Community-Driven and Knowledge-Guided Optimization
of AI Applications Across the Whole SW/HW Stack
or how to adapt to a Cambrian explosion in AI / SW / HW …
ARM Research Summit
Cambridge, September 2017
Grigori Fursin
CTO and co-founder, dividiti, UK
Chief Scientist, cTuning foundation
… with cKnowledge.org and open co-design competitions
cKnowledge.org : helping industry and academia adapt to a Cambrian AI/SW/HW explosion via open co-design competitions (2 of 24)
A race to develop innovative AI products and systems (SW & HW) …
Various form factors:
IoT, mobile, data centers, supercomputers
Various constraints:
speed, energy, accuracy, size, resiliency, costs
… leads to a Cambrian AI/SW/HW explosion and technological chaos
Which AI/SW/HW solutions will survive?
AI users
We at dividiti.com perform competitive analysis and optimization of the whole AI/SW/HW stack for various realistic scenarios (object detection, image classification, etc.)
Scenario: image classification on mobile devices
800+ distinct mobile devices
mobile CPUs and GPUs
Caffe, TensorFlow
OpenBLAS, CLBlast, ViennaCL, Eigen
AlexNet, GoogleNet, SqueezeNet
ImageNet and user images
Requirement: speed vs cost
(vs energy vs accuracy
vs model size
vs memory usage
vs reliability…)
[Figure: execution time (sec) vs. price (euros) across mobile devices]
Just a few winning "AI+SW+HW species" must be optimized further or may go "extinct"
Obtained using our CK-based Android app to crowdsource experiments
across devices provided by volunteers (later in the talk)
cKnowledge.org/repo cKnowledge.org/ai
Optimization is ad-hoc, tedious, expensive and time-consuming
Mobile device, Server
Data centers
Available libraries / skeletons
Compilers
Binary or byte code
Hardware, simulators
Run-time environment
Run-time state of the system
Inputs
Existing frameworks / algorithms
Various models
User front-end (cloud, GRID, supercomputer, etc.)
Algorithm / source code
Microsoft Azure, AWS, Google Cloud, XSEDE, PRACE, Watson …
100s of models for TensorFlow, Caffe, Torch, Theano, MXNet, CNTK
CUDA, MPI, OpenMP, TBB, OpenCL, StarPU, OmpSs …
C, C++, Fortran, Java, Python, byte code, assembler …
LLVM, GCC, ICC, Rose, PGI, Lift, functional programming …
cuBLAS, BLAS, MAGMA, ViennaCL, CLBlast, cuDNN, OpenBLAS, clBLAS, libDNN, tinyDNN, ARM compute lib, libxsmm, skeletons
Diverse hardware: heterogeneous, out-of-order, caches (ARM, x86, CUDA, Mali, Adreno, Power, TPU, FPGA, MIPS, AVX, NEON)
Linux (CentOS, Ubuntu, RedHat, SUSE, Debian), Android, Windows, BSD, iOS, MacOS …
Too many design and optimization choices at each level of the continuously changing SW/HW stack!
Time to reinvent computer engineering
and enable open, collaborative and reproducible AI/SW/HW co-design!
cKnowledge.org: plugin-based workflow framework to co-design the AI/SW/HW stack
Grigori Fursin, Anton Lokhmotov, Ed Plowman, "Collective Knowledge: towards R&D sustainability", DATE'16
Available libraries / skeletons
Compilers
Binary or byte code
Hardware, simulators
Run-time environment
Run-time state of the system
Inputs, various models
Algorithm / source code
AI framework
Common JSON API
Initial funding (2015)
Common experimental framework
for computer engineering and AI research
https://github.com/ctuning/ck
Repositories with reusable and customizable artifacts (JSON API and meta info)
Unified models (CK JSON API): MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet … (each with CK meta)
AI frameworks (CK JSON API): TensorFlow, Caffe, Caffe2, CNTK, MXNet … (each with CK meta)
Repositories with reusable and customizable workflows (JSON API)
Unified models and AI frameworks (CK JSON API) …
Image classification, object detection, emotion analysis (each behind a CK API)
Continuous competition of various AI/SW/HW combinations (species)
Crowdsource AI experiments across diverse platforms provided by volunteers
Everyone is on the same page: fair and reproducible competitions
cKnowledge.org/repo
CK concepts: convert your artifacts into reusable and customizable components
setup soft, find, extract features, dataset, compile, run, add, replay, experiment, autotune, program
TensorFlow, Caffe2, ARM compute lib, image classification, object detection, ImageNet, car video stream, real surveillance camera, GEMM OpenCL, convolution CPU, performance results, training / accuracy, bugs (each with some description)
Ad-hoc scripts to perform some actions on some artifacts
In CK, each of these artifacts gets a JSON file with its meta description:
/ 1st-level directory: CK modules / 2nd-level directory: CK entries / CK meta info
Python module, JSON API, holder for original artifact, CK meta
Collective Knowledge (github.com/ctuning/ck) assists you in unifying, executing, sharing and reusing your artifacts:
$ sudo pip install ck
$ ck pull repo:ck-autotuning
$ ck add dataset:my-new-dataset (UID will be automatically generated)
$ ck compile program:cbench-automotive-susan
$ ck run program:cbench-automotive-susan
https://github.com/ctuning/ck/wiki/Shared-modules
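The two-level layout described above (1st-level directories for CK modules, 2nd-level directories for CK entries, each carrying its meta info) can be sketched in plain Python. This is a minimal illustration, not CK's own implementation: it assumes each entry keeps its meta in `.cm/meta.json`, as the CK program entries later in this talk do, and the helper names `add_entry` and `list_entries` are hypothetical.

```python
import json
import os
import tempfile


def add_entry(repo, module, entry, meta):
    """Create repo/<module>/<entry>/.cm/meta.json (hypothetical helper)."""
    cm_dir = os.path.join(repo, module, entry, ".cm")
    os.makedirs(cm_dir, exist_ok=True)
    with open(os.path.join(cm_dir, "meta.json"), "w") as f:
        json.dump(meta, f, indent=2)
    # The entry directory itself holds the original artifact files.
    return os.path.join(repo, module, entry)


def list_entries(repo, module):
    """Return {entry_name: meta} for every entry stored under a module."""
    module_dir = os.path.join(repo, module)
    result = {}
    if not os.path.isdir(module_dir):
        return result
    for name in sorted(os.listdir(module_dir)):
        meta_path = os.path.join(module_dir, name, ".cm", "meta.json")
        if os.path.isfile(meta_path):
            with open(meta_path) as f:
                result[name] = json.load(f)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        add_entry(repo, "dataset", "my-new-dataset", {"tags": ["dataset", "image"]})
        add_entry(repo, "program", "cbench-automotive-susan", {"tags": ["program", "cbench"]})
        print(list_entries(repo, "program"))
```

Because every object is just a directory plus a JSON meta file, entries can be copied between repositories or shared via Git without any database.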
We have already converted multiple AI frameworks, artifacts and workflows to CK
ICC 17.0, CUDA 8.0, GCC 7.0, LLVM 4.0
Databases, local repositories
Ad-hoc init scripts
Ad-hoc scripts to process CSV, XLS, TXT, etc.
Ad-hoc experimental workflows
Program, CK program, CK pipeline, CK compiler, CK AI framework, CK math library, CK experiment
Caffe OpenCL, Caffe CUDA, TensorFlow CPU/CUDA, MAGMA, cuBLAS, OpenBLAS, ViennaCL, CLBlast
Stat. analysis, predictive analytics, visualization
• github.com/dividiti/ck-caffe
• github.com/ctuning/ck-caffe2
• github.com/ctuning/ck-tensorflow
$ ck pull repo --url=https://github.com/dividiti/ck-caffe
$ ck compile program:caffe-classification
$ ck run program:caffe-classification
https://github.com/ctuning/ck/wiki/Shared-repos
CK program module: a unified workflow that can automatically adapt to the underlying environment via dependencies
Unified API (input) → read program meta → detect all software dependencies (ask the user if multiple versions exist) → prepare environment → compile program → run program → unified API (output, JSON)
CK program entry (native directory): source files and auxiliary scripts, plus .cm/meta.json, which describes software dependencies, data sets, and how to compile and run this program
CK entries associated with a given module describe a given object using meta.json while storing all necessary files and sub-directories
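The program pipeline above (read program meta, detect software dependencies, ask the user when several versions exist, prepare the environment, compile, run) can be sketched as a toy Python function. Everything here is an assumption for illustration: the meta layout (`deps` mapping a name to required `tags`), the in-memory list of detected environments, and the `choose` callback standing in for the interactive question. Compilation and execution are stubs, not real tool invocations.

```python
def resolve_deps(meta, envs, choose):
    """Map each dependency in meta['deps'] to one detected environment.

    meta   -- e.g. {"deps": {"compiler": {"tags": ["compiler", "gcc"]}}} (assumed layout)
    envs   -- detected software, e.g. {"uid": ..., "tags": [...], "version": "7.0"}
    choose -- callback that picks one env when several versions match
    """
    resolved = {}
    for name, dep in meta.get("deps", {}).items():
        wanted = set(dep["tags"])
        matches = [e for e in envs if wanted.issubset(e["tags"])]
        if not matches:
            raise LookupError(f"no installed software matches tags {sorted(wanted)}")
        resolved[name] = matches[0] if len(matches) == 1 else choose(name, matches)
    return resolved


def newest_version(name, matches):
    """Hypothetical non-interactive policy: always pick the highest version."""
    return max(matches, key=lambda e: tuple(int(x) for x in e["version"].split(".")))


def run_pipeline(meta, envs, choose=newest_version):
    """Unified input (meta) -> resolve deps -> prepare env -> compile -> run -> output."""
    deps = resolve_deps(meta, envs, choose)
    # Prepare environment: here we only record which env entries were selected.
    environment = {name: env["uid"] for name, env in deps.items()}
    # Compile and run are stubs in this sketch.
    return {"return": 0, "deps": environment, "compiled": True, "ran": True}
```

Swapping `newest_version` for a function that prompts the user reproduces the "ask user if multiple versions exist" step.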
Automatically adapting workflow to any underlying software and hardware
local / env / 03ca0be16962f471 / env.sh (tags: compiler,cuda,v8.0)
local / env / 0a5ba198d48e3af3 / env.bat (tags: lib,blas,cublas,v8.0)
Soft entries in CK describe how to detect whether a given piece of software is already installed, how to set up its environment including all paths (to binaries, libraries, include files, auxiliary tools, etc.), and how to detect its version
$ ck detect soft --tags=compiler,cuda
$ ck detect soft:compiler.gcc
$ ck detect soft:compiler.llvm
$ ck list soft:compiler*
$ ck detect soft:lib.cublas
Env entries are created in the local CK repo for all found software instances, together with their meta and an auto-generated environment script env.sh (on Linux) or env.bat (on Windows)
Package entries describe how to install a given piece of software if it is not already installed (using a CK Python plugin together with an install.sh script on a Linux host or install.bat on a Windows host)
$ ck install package:caffemodel-bvlc-googlenet
$ ck install package:imagenet-2012-val
$ ck install package:lib-tensorflow-cuda
$ ck list package:*caffemodel*
Local CK repo
$ ck search soft --tags=blas
$ ck show env
$ ck show env --tags=cublas
$ ck rm env:* --tags=cublas
$ ck search package --tags=caffe
$ ck list package:*tensorflow*
$ ck install package:lib-caffe-bvlc-master-cuda-universal
https://github.com/ctuning/ck/wiki/Portable-workflows
Multiple versions of tools can easily co-exist and be plugged into CK workflows!
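To make the env-entry idea concrete, here is a small sketch that renders an environment script from a dict of variables: `export` lines for env.sh on Linux and `set` lines for env.bat on Windows. The function name and the variable names used with it (such as `CK_ENV_LIB_CUBLAS`) are hypothetical; this is not the script CK actually generates.

```python
def render_env_script(variables, windows=False, path_prepend=None):
    """Return the text of env.sh (POSIX shell) or env.bat (Windows cmd).

    variables    -- mapping of environment variable names to values
    path_prepend -- optional directory to put in front of PATH
    """
    lines = [] if windows else ["#! /bin/bash"]
    for name, value in variables.items():
        if windows:
            lines.append(f"set {name}={value}")
        else:
            lines.append(f'export {name}="{value}"')
    if path_prepend:
        if windows:
            lines.append(f"set PATH={path_prepend};%PATH%")
        else:
            lines.append(f'export PATH="{path_prepend}:$PATH"')
    return "\n".join(lines) + "\n"
```

Because each detected version gets its own entry and its own script, several versions can co-exist on one machine, and a workflow sources only the script of the version it selected.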
Applying methodology from natural sciences to optimize computer systems
https://github.com/ctuning/ck/wiki/Autotuning
CK Python modules (wrappers) with a unified JSON API
CK input (JSON/dict) → CK output (JSON/dict)
Unified input: behavior, choices, features, state, action
Unified output: behavior, choices, features, state
b = B(c, f, s)
Formalized function B of the behavior b of any CK object, given its choices c, features f and state s
Flattened CK JSON vectors (dict converted to vector) to simplify statistical analysis, machine learning and data mining
Some actions; tools (compilers, profilers, etc.); generated files
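The flattening step can be sketched as follows: nested JSON-like dicts and lists become a single-level mapping from key paths to leaf values, and several such records are then aligned into vectors over a common set of keys, ready for statistical analysis or machine learning (choices c, features f and state s as input columns, behavior b as output columns). The `#`/`@` key format is an assumption for illustration, only loosely modeled on CK's flattened keys.

```python
def flatten(obj, prefix="#"):
    """Flatten nested dicts/lists into {key_path: leaf_value}.

    Dict keys are appended as '#key', list indices as '@index'.
    """
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}#{key}"))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}@{i}"))
    else:
        flat[prefix] = obj
    return flat


def to_vectors(records):
    """Align flattened records on the sorted union of their keys (missing -> None)."""
    flats = [flatten(r) for r in records]
    keys = sorted(set().union(*flats))
    return keys, [[f.get(k) for k in keys] for f in flats]
```

Missing values are left as `None` so that a later step can decide whether to impute or drop them.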
Chain CK modules to implement research workflows such as multi-objective autotuning and co-design exploration:
Choose exploration strategy
Perform SW/HW design space exploration (DSE): math transforms, skeleton params, compiler flags, transformations …
Perform stat. analysis
Detect (Pareto) frontier
Model optimizations
Model behavior, predict optimizations
Reduce complexity
Set environment for a given tool version
CK program module with pipeline function: compile program, run code
First expose coarse-grain, high-level choices, features, system state and behavior characteristics
Crowdsource benchmarking and random exploration across diverse inputs and devices;
Keep best species (AI/SW/HW choices); model behavior; predict better optimizations and designs
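Detecting the (Pareto) frontier, i.e. keeping only the "species" that no other configuration beats on every objective at once, can be sketched in a few lines of Python. This is a generic illustration assuming all objectives are minimized (for example execution time and device cost); the quadratic scan is adequate for the hundreds of points a crowdsourcing round produces and is not necessarily how CK implements it.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_frontier(points: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of non-dominated points, keeping input order."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]
```

For instance, with hypothetical (time in seconds, cost in euros) points `{"A": (0.2, 300), "B": (0.5, 100), "C": (0.6, 150), "D": (0.2, 350)}`, C is dominated by B and D by A, so the frontier is `["A", "B"]`.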
Prepare first proof-of-concept community experiments
Algorithms: object classification, object detection
AI frameworks:
Caffe CPU, Caffe OpenCL, TensorFlow CPU
Math libraries:
OpenBLAS, ViennaCL, clBLAS, CLBlast, cuBLAS, cuDNN,
Eigen, gemmlowp
Compilers: GCC 5+
Models:
AlexNet, GoogleNet, VGG, ResNet,
SqueezeNet, SqueezeDet, SSD
Datasets: KITTI, COCO, VOC, ImageNet
Optimization choices: batch size, number of CPU threads
Characteristics:
total execution time (including OpenCL overheads),
top1/top5 model accuracy, static model size (MB),
device cost, max power consumption (if available)
System state: CPU/GPU frequency, memory
cKnowledge.org/repo
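The top-1/top-5 accuracy characteristic listed above is the fraction of images whose correct label appears among the k highest-scoring predictions. A minimal sketch with plain Python lists follows; the input layout (one list of class scores per image plus one integer label per image) is an assumption, since in practice the scores come from the framework's output layer.

```python
from typing import List, Sequence


def top_k_accuracy(scores: List[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        return 0.0
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices of the k largest scores for this sample.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top
    return hits / len(labels)
```

Calling it with `k=1` and `k=5` on the same scores gives the top-1 and top-5 figures reported alongside execution time and model size.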
Crowdsource benchmarking across Android devices provided by volunteers
Continuously collect statistics, bugs and misclassifications at cKnowledge.org/repo
Number of distinct participating platforms: 800+
Number of distinct CPUs: 260+
Number of distinct GPUs: 110+
Number of distinct OSes: 280+
Power range: 1-10W
No need for a dedicated and expensive cloud: volunteers help us validate research ideas, similar to SETI@home
(Also collecting real misclassified images from users to build an open and continuously updated training set!)
[Figure: winning solutions on various frontiers; time per image (seconds) vs. cost (euros)]
[Figure: the same plot with Firefly-RK3399 highlighted among the winning solutions]
Let's dig further: (crowdsourced) BLAS autotuning in Caffe on Firefly-RK3399
Collaboration between Marco Cianfriglia (Roma Tre University), Cedric Nugteren (TomTom),
Flavio Vella, Anton Lokhmotov and Grigori Fursin (dividiti)
Name    Description                                       Ranges
KWG     2D tiling at workgroup level                      {32, 64}
KWI     KWG kernel-loop can be unrolled by a factor KWI   {1}
MDIMA   Local Memory Re-shape                             {4, 8}
MDIMC   Local Memory Re-shape                             {8, 16, 32}
MWG     2D tiling at workgroup level                      {32, 64, 128}
NDIMB   Local Memory Re-shape                             {8, 16, 32}
NDIMC   Local Memory Re-shape                             {8, 16, 32}
NWG     2D tiling at workgroup level                      {16, 32}
SA      Manual caching using the local memory             {0, 1}
SB      Manual caching using the local memory             {0, 1}
STRM    Striding within single thread for matrix A and C  {0, 1}
STRN    Striding within single thread for matrix B        {0, 1}
VWM     Vector width for loading A and C                  {8, 16}
VWN     Vector width for loading B                        {0, 1}
Tunable parameters of OpenCL-based BLAS (github.com/CNugteren/CLBlast)
For now only two data sets (small and large)
Some extra constraints to avoid illegal combinations
Use different autotuners under CK to speed up design space exploration based on probabilistic focused search, genetic algorithms, deep learning, SVM, KNN, MARS, decision trees …
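The exploration of this parameter space can be illustrated with a plain random search that discards illegal combinations before measuring anything. This is only a sketch: it uses a subset of the parameters and ranges from the table, the legality rule (MWG divisible by MDIMC times VWM) is an illustrative assumption rather than CLBlast's full rule set, and `measure` is a hypothetical synthetic cost standing in for compiling and timing the OpenCL kernel on the device. The smarter autotuners listed above would replace the uniform sampling.

```python
import itertools
import random

# Subset of the tunable CLBlast GEMM parameters and ranges from the table.
SPACE = {
    "MWG": [32, 64, 128],
    "NWG": [16, 32],
    "MDIMC": [8, 16, 32],
    "NDIMC": [8, 16, 32],
    "VWM": [8, 16],
    "SA": [0, 1],
    "SB": [0, 1],
}


def all_configs(space):
    """Enumerate every combination (exhaustive search, for comparison)."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def is_legal(cfg):
    # Illustrative constraint (assumption): the workgroup tile along M must be
    # divisible by the number of threads along M times the vector width.
    return cfg["MWG"] % (cfg["MDIMC"] * cfg["VWM"]) == 0


def measure(cfg):
    # Hypothetical synthetic cost in place of a real kernel timing: it favors
    # larger tiles, fewer threads per dimension and local-memory caching.
    return (1.0 / (cfg["MWG"] * cfg["NWG"])
            + 0.001 * (cfg["MDIMC"] + cfg["NDIMC"])
            - 0.0005 * (cfg["SA"] + cfg["SB"]))


def random_search(space, budget, seed=0):
    """Sample `budget` random configurations; return the best legal one."""
    rng = random.Random(seed)
    keys = list(space)
    best_cfg, best_time = None, float("inf")
    for _ in range(budget):
        cfg = {k: rng.choice(space[k]) for k in keys}
        if not is_legal(cfg):
            continue  # skip illegal combinations instead of wasting a run
        t = measure(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time
```

Filtering with `is_legal` before `measure` matters on a real board, where each measurement means compiling and running an OpenCL kernel.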
Let's dig further: autotuning BLAS (CLBlast) in Caffe on Firefly-RK3399
• Caffe with autotuned OpenBLAS (threads and batches) is the fastest
• Caffe with autotuned CLBlast is 6-7x faster than the default version and competitive with the OpenBLAS-based version, so it is now worth selecting between them adaptively at run time.
Sharing results in a reproducible way with the community for validation and improvement:
https://nbviewer.jupyter.org/github/dividiti/ck-caffe-firefly-rk3399/blob/master/script/batch_size-libs-models/analysis.20170531.ipynb
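The suggestion to select the BLAS library adaptively at run time can be sketched as a small selector keyed by workload (here, batch size). It first tries every backend once for a given key, then keeps using whichever has the lowest mean measured time. The class name, backend names and the choice of batch size as the key are assumptions for illustration; the timings must come from real measurements, and none are implied here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


class AdaptiveSelector:
    """Pick the backend with the lowest mean measured time per workload key."""

    def __init__(self, backends: List[str]):
        self.backends = list(backends)
        self.times: Dict[tuple, List[float]] = defaultdict(list)

    def record(self, key, backend: str, seconds: float) -> None:
        """Store one measured execution time for (key, backend)."""
        self.times[(key, backend)].append(seconds)

    def choose(self, key) -> str:
        # Exploration: try any backend not yet measured for this key.
        for b in self.backends:
            if not self.times[(key, b)]:
                return b
        # Exploitation: the backend with the lowest mean time so far.
        return min(self.backends, key=lambda b: mean(self.times[(key, b)]))
```

A production version would also re-measure occasionally, since the system state (for example CPU/GPU frequency) changes over time.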
• Bring together industry and academia to participate in open and reproducible AI/SW/HW co-design competitions using the CK framework
• Share more artifacts, workflows and results in a reusable
and customizable CK format (common JSON API and meta description)
• Collaboratively improve models and find missing features
• Gradually expose more design and optimization knobs at all AI/SW/HW levels
• Enable distributed on-line learning for self-optimizing and self-learning systems
http://cKnowledge.org/partners http://cKnowledge.org/publications
Join the growing Collective Knowledge community!

Linux and Open Source in Math, Science and Engineering
 
OpenACC and Open Hackathons Monthly Highlights: September 2022.pptx
OpenACC and Open Hackathons Monthly Highlights: September 2022.pptxOpenACC and Open Hackathons Monthly Highlights: September 2022.pptx
OpenACC and Open Hackathons Monthly Highlights: September 2022.pptx
 
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
 
Scientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchScientific Computing @ Fred Hutch
Scientific Computing @ Fred Hutch
 
Elastic r sc10-tutorial
Elastic r sc10-tutorialElastic r sc10-tutorial
Elastic r sc10-tutorial
 
Inria - Software assets - Aerospace
Inria - Software assets - AerospaceInria - Software assets - Aerospace
Inria - Software assets - Aerospace
 
The future of AI is hybrid
The future of AI is hybridThe future of AI is hybrid
The future of AI is hybrid
 
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...Tutorial at the European Nanoelectronics Applications, Design & Technology Co...
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...
 
Software used in Electronics and Communication
Software used in Electronics and CommunicationSoftware used in Electronics and Communication
Software used in Electronics and Communication
 
"Imaging + AI: Opportunities Inside the Car and Beyond," a Presentation from ...
"Imaging + AI: Opportunities Inside the Car and Beyond," a Presentation from ..."Imaging + AI: Opportunities Inside the Car and Beyond," a Presentation from ...
"Imaging + AI: Opportunities Inside the Car and Beyond," a Presentation from ...
 
AI in Finance: Moving forward!
AI in Finance: Moving forward!AI in Finance: Moving forward!
AI in Finance: Moving forward!
 
cc23
cc23cc23
cc23
 
Cluster Tutorial
Cluster TutorialCluster Tutorial
Cluster Tutorial
 
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
 
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
 
20130503 iCore at calipso workshop fia dublin
20130503 iCore at calipso workshop fia dublin20130503 iCore at calipso workshop fia dublin
20130503 iCore at calipso workshop fia dublin
 
e-Clouds A Platform and Marketplace to Access and Publish Scientific Applicat...
e-Clouds A Platform and Marketplace to Access and Publish Scientific Applicat...e-Clouds A Platform and Marketplace to Access and Publish Scientific Applicat...
e-Clouds A Platform and Marketplace to Access and Publish Scientific Applicat...
 
Omkar revankar
Omkar revankarOmkar revankar
Omkar revankar
 

Plus de Grigori Fursin

CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...
Grigori Fursin
 
Collective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the massesCollective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the masses
Grigori Fursin
 
Collective Mind: a collaborative curation tool for program optimization
Collective Mind: a collaborative curation tool for program optimizationCollective Mind: a collaborative curation tool for program optimization
Collective Mind: a collaborative curation tool for program optimization
Grigori Fursin
 
Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...
Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...
Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...
Grigori Fursin
 

Plus de Grigori Fursin (8)

CGO/PPoPP'17 Artifact Evaluation Discussion (enabling open and reproducible r...
CGO/PPoPP'17 Artifact Evaluation Discussion (enabling open and reproducible r...CGO/PPoPP'17 Artifact Evaluation Discussion (enabling open and reproducible r...
CGO/PPoPP'17 Artifact Evaluation Discussion (enabling open and reproducible r...
 
CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...
 
Collective Knowledge: python and scikit-learn based open research SDK for col...
Collective Knowledge: python and scikit-learn based open research SDK for col...Collective Knowledge: python and scikit-learn based open research SDK for col...
Collective Knowledge: python and scikit-learn based open research SDK for col...
 
Artifact Evaluation Experience CGO'15 / PPoPP'15
Artifact Evaluation Experience CGO'15 / PPoPP'15Artifact Evaluation Experience CGO'15 / PPoPP'15
Artifact Evaluation Experience CGO'15 / PPoPP'15
 
Collective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the massesCollective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the masses
 
Panel at acm_sigplan_trust2014
Panel at acm_sigplan_trust2014Panel at acm_sigplan_trust2014
Panel at acm_sigplan_trust2014
 
Collective Mind: a collaborative curation tool for program optimization
Collective Mind: a collaborative curation tool for program optimizationCollective Mind: a collaborative curation tool for program optimization
Collective Mind: a collaborative curation tool for program optimization
 
Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...
Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...
Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...
 

Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions and Collective Knowledge

  • 1. Community-Driven and Knowledge-Guided Optimization of AI Applications Across the Whole SW/HW Stack, or how to adapt to a Cambrian explosion in AI/SW/HW… ARM Research Summit, Cambridge, September 2017. Grigori Fursin, CTO and co-founder, dividiti, UK; Chief Scientist, cTuning foundation. … with cKnowledge.org and open co-design competitions
  • 2. cKnowledge.org: helping industry and academia adapt to a Cambrian AI/SW/HW explosion via open co-design competitions (2 of 24). A race to develop innovative AI products and systems (SW & HW)… Various form factors: IoT, mobile, data centers, supercomputers. Various constraints: speed, energy, accuracy, size, resiliency, costs.
  • 3. … leads to a Cambrian AI/SW/HW explosion and technological chaos
  • 4. Which AI/SW/HW solutions will survive? AI users. We at dividiti.com perform competitive analysis and optimization of the whole AI/SW/HW stack for various realistic scenarios (object detection, image classification, etc.).
  • 5. Scenario: image classification on mobile devices. 800+ distinct mobile devices; mobile CPUs and GPUs; Caffe, TensorFlow; OpenBLAS, CLBlast, ViennaCL, Eigen; AlexNet, GoogleNet, SqueezeNet; ImageNet and user images. Requirement: speed vs. cost (vs. energy vs. accuracy vs. model size vs. memory usage vs. reliability…). (Plot: execution time in seconds vs. device price in euros.) Just a few winning "AI+SW+HW species" must be optimized further or may go "extinct". Obtained using our CK-based Android app to crowdsource experiments across devices provided by volunteers (described later in the talk). cKnowledge.org/repo, cKnowledge.org/ai
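Picking the few "winning species" from a speed vs. cost cloud amounts to keeping the points that no other point beats on both axes at once, i.e. a Pareto frontier. The sketch below is an illustration under that assumption, not the analysis dividiti actually ran; the `Result` record, its field names and the sample numbers are hypothetical.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    """One measured AI/SW/HW combination (hypothetical record layout)."""
    name: str
    price_eur: float  # device price in euros
    time_s: float     # execution time per image in seconds


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    return (a.price_eur <= b.price_eur and a.time_s <= b.time_s
            and (a.price_eur < b.price_eur or a.time_s < b.time_s))


def pareto_front(results: List[Result]) -> List[Result]:
    """Keep only non-dominated results, sorted by price (O(n^2), fine for a few hundred points)."""
    front = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(front, key=lambda r: r.price_eur)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measurements from the talk.
    sample = [
        Result("combo-a", 100.0, 2.0),
        Result("combo-b", 200.0, 1.0),
        Result("combo-c", 250.0, 1.5),
        Result("combo-d", 150.0, 2.5),
    ]
    for r in pareto_front(sample):
        print(r.name, r.price_eur, r.time_s)
```

Adding more objectives (energy, accuracy, model size) only means extending `dominates` to compare more fields; the frontier usually grows as objectives are added.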
  • 6. Optimization is ad-hoc, tedious, expensive and time-consuming. Stack layers: mobile device, server, data centers; user front-end (cloud, GRID, supercomputer, etc.); algorithm / source code; existing frameworks / algorithms; various models; available libraries / skeletons; compilers; binary or byte code; hardware, simulators; run-time environment; run-time state of the system; inputs. Examples: Microsoft Azure, AWS, Google Cloud, XSEDE, PRACE, Watson…; 100s of models for TensorFlow, Caffe, Torch, Theano, MxNet, CNTK; CUDA, MPI, OpenMP, TBB, OpenCL, StarPU, OmpSs…; C, C++, Fortran, Java, Python, byte code, assembler…; LLVM, GCC, ICC, Rose, PGI, Lift, functional programming…; cuBLAS, BLAS, MAGMA, ViennaCL, CLBlast, cuDNN, OpenBLAS, clBLAS, libDNN, tinyDNN, ARM compute lib, libxsmm, skeletons; diverse hardware: heterogeneous, out-of-order, caches (ARM, x86, CUDA, Mali, Adreno, Power, TPU, FPGA, MIPS, AVX, NEON); Linux (CentOS, Ubuntu, RedHat, SUSE, Debian), Android, Windows, BSD, iOS, MacOS… Too many design and optimization choices at each level of a continuously changing SW/HW stack!
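To make "too many choices" concrete: the number of stack configurations is the product of the option counts at each layer. The sketch below takes a small subset of the options listed on the slide; the layer names are hypothetical, and the count is only an upper bound because it ignores incompatible pairs (e.g. cuBLAS on iOS).

```python
import itertools
from math import prod

# A handful of options per layer, copied from the slide's lists (a small subset).
CHOICES = {
    "framework": ["TensorFlow", "Caffe", "Torch", "Theano", "MxNet", "CNTK"],
    "library": ["cuBLAS", "MAGMA", "ViennaCL", "CLBlast", "OpenBLAS", "clBLAS"],
    "compiler": ["LLVM", "GCC", "ICC", "Rose", "PGI"],
    "os": ["Linux", "Android", "Windows", "BSD", "iOS", "MacOS"],
}


def design_space_size(choices: dict) -> int:
    """Upper bound on configurations: the product of option counts per layer."""
    return prod(len(options) for options in choices.values())


def configurations(choices: dict):
    """Lazily yield every combination as a {layer: option} dict."""
    layers = list(choices)
    for combo in itertools.product(*(choices[layer] for layer in layers)):
        yield dict(zip(layers, combo))
```

With just these four layers, `design_space_size(CHOICES)` is 6 × 6 × 5 × 6 = 1080; adding compiler flags, model variants, hardware and inputs multiplies it further, which is why exhaustive manual tuning does not scale.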
  • 7. (Same stack layers and examples as slide 6, with "Hundreds of models for TF, Caffe, Torch, Theano, MxNet, CNTK".) Optimization is ad-hoc, tedious, expensive and time-consuming; there are too many design and optimization choices at each level of a continuously changing SW/HW stack! Time to reinvent computer engineering and enable open, collaborative and reproducible AI/SW/HW co-design!
  • 8. cKnowledge.org: plugin-based workflow framework to co-design the AI/SW/HW stack. Grigori Fursin, Anton Lokhmotov, Ed Plowman, "Collective Knowledge: towards R&D sustainability", DATE'16. Stack layers (algorithm / source code, AI framework, various models, available libraries / skeletons, compilers, binary or byte code, hardware, simulators, run-time environment, run-time state of the system, inputs) connected through a common JSON API. Initial funding (2015). Common experimental framework for computer engineering and AI research: https://github.com/ctuning/ck
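A plugin framework with a common JSON API can be pictured as a registry of functions that all take a dictionary and return a dictionary carrying an integer `return` code (0 for success, non-zero plus an `error` message otherwise). The sketch below is a standalone toy loosely modelled on that convention as CK uses it; `action`, `dispatch`, `_REGISTRY` and the `compile_program` plugin are hypothetical names, not CK's real internals.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], dict]
_REGISTRY: Dict[Tuple[str, str], Handler] = {}


def action(module: str, name: str):
    """Decorator that registers a plugin function under (module, action)."""
    def register(fn: Handler) -> Handler:
        _REGISTRY[(module, name)] = fn
        return fn
    return register


def dispatch(request: dict) -> dict:
    """Route a JSON-style request; every reply carries an integer 'return' code (0 = success)."""
    key = (request.get("module_uoa", ""), request.get("action", ""))
    handler = _REGISTRY.get(key)
    if handler is None:
        return {"return": 1, "error": f"action {key[1]!r} not found in module {key[0]!r}"}
    return handler(request)


@action("program", "compile")
def compile_program(request: dict) -> dict:
    # Placeholder plugin: a real one would call a compiler chosen via the program's meta.
    return {"return": 0, "compiled": request.get("data_uoa")}
```

Because every layer speaks plain dictionaries, the same request can come from a CLI, a web service or another plugin, and results serialize directly to JSON for sharing.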
  • 9. Repositories with reusable and customizable artifacts (JSON API and meta info). Unified models, each with a CK JSON API and CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, … AI frameworks, each with a CK JSON API and CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet, … (Same layered stack with a common JSON API as on slide 8.)
  • 10. Repositories with reusable and customizable workflows (JSON API): image classification, object detection and emotion analysis, each with a CK API, built on unified models (MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, …) and AI frameworks (TensorFlow, Caffe, Caffe2, CNTK, MxNet, …) that carry a CK JSON API and CK meta. (Same layered stack with a common JSON API.)
  • 11. Crowdsource AI experiments across diverse platforms provided by volunteers: a continuous competition of various AI/SW/HW combinations ("species"), cKnowledge.org/repo. Everyone is on the same page: fair and reproducible competitions. Workflows (image classification, object detection, emotion analysis) expose a CK API on top of unified models (MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, …) and AI frameworks (TensorFlow, Caffe, Caffe2, CNTK, MxNet, …), each with a CK JSON API and CK meta.
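Results crowdsourced from volunteer devices are noisy (background load, thermal throttling), so a fair comparison needs each AI/SW/HW combination summarized over repeated reports. The sketch below groups reports by (model, framework, device) and keeps the median, which is robust to outliers; the report keys and the choice of median are assumptions for illustration, not a description of how cKnowledge.org/repo aggregates data.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (model, framework, device)


def aggregate(reports: Iterable[dict]) -> Dict[Key, dict]:
    """Group volunteer reports per (model, framework, device) and summarise their timings."""
    groups: Dict[Key, list] = defaultdict(list)
    for r in reports:
        groups[(r["model"], r["framework"], r["device"])].append(r["time_s"])
    return {
        key: {"n": len(times), "median_s": median(times), "min_s": min(times)}
        for key, times in groups.items()
    }
```

Keeping the sample count `n` alongside the median lets a leaderboard flag combinations backed by only one or two runs.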
  • 12. CK concepts: convert your artifacts into reusable and customizable components. setup, soft, find, extract features, dataset, compile, run, add, replay, experiment, autotune, program. Artifacts, each with only some description: TensorFlow, Caffe2, ARM compute lib, image classification, object detection, ImageNet, car video stream, real surveillance camera, GEMM OpenCL, convolution CPU, performance results, training / accuracy, bugs. Ad-hoc scripts perform some actions on some artifacts.
  • 13. CK concepts: convert your artifacts into reusable and customizable components. (Same actions and artifacts as slide 12.) Each artifact is described by a JSON file (CK meta). Layout: / 1st-level directory: CK modules (Python module with a JSON API) / 2nd-level directory: CK entries (holder for the original artifact) / CK meta info.
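The two-level layout can be sketched with the standard library: a module directory, an entry directory that holds the original artifact, and a `.cm/meta.json` file inside the entry (the `.cm/meta.json` name comes from slide 16). The helper functions below are hypothetical and only illustrate the idea; they are not CK's own code, and real CK entries may carry additional files.

```python
import json
import tempfile
from pathlib import Path


def add_entry(repo: Path, module: str, entry: str, meta: dict) -> Path:
    """Create <repo>/<module>/<entry>/.cm/meta.json and return the entry directory.

    The entry directory is also where the original artifact's files would live.
    """
    cm_dir = repo / module / entry / ".cm"
    cm_dir.mkdir(parents=True, exist_ok=True)
    (cm_dir / "meta.json").write_text(json.dumps(meta, indent=2, sort_keys=True))
    return cm_dir.parent


def load_meta(repo: Path, module: str, entry: str) -> dict:
    """Read the JSON meta description of one entry."""
    return json.loads((repo / module / entry / ".cm" / "meta.json").read_text())


def list_entries(repo: Path, module: str) -> list:
    """Names of all entries under a module directory that carry a meta.json."""
    module_dir = repo / module
    if not module_dir.is_dir():
        return []
    return sorted(p.name for p in module_dir.iterdir() if (p / ".cm" / "meta.json").is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        add_entry(repo, "dataset", "my-new-dataset", {"tags": ["images"]})
        print(list_entries(repo, "dataset"), load_meta(repo, "dataset", "my-new-dataset"))
```

Keeping meta in a hidden sub-directory leaves the artifact's own files untouched, so an existing dataset or program can be wrapped without restructuring it.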
  • 14. (Same actions, artifacts and directory layout as slide 13.) Collective Knowledge (github.com/ctuning/ck) assists you in unifying, executing, sharing and reusing your artifacts:
    $ sudo pip install ck
    $ ck pull repo:ck-autotuning
    $ ck add dataset:my-new-dataset   (UID will be automatically generated)
    $ ck compile program:cbench-automotive-susan
    $ ck run program:cbench-automotive-susan
    https://github.com/ctuning/ck/wiki/Shared-modules
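Each command above follows the same shape: an action, then a `module:entry` pair, then optional `--key=value` flags, which can be turned into a JSON-style request for a dispatcher. The parser below is a simplified, hypothetical sketch of that mapping (the `module_uoa`/`data_uoa` key names follow CK's terminology, but this is not CK's actual CLI code); `new_uid` assumes a 16-hex-digit identifier purely for illustration, since the slide does not show CK's real UID format.

```python
import shlex
import uuid


def parse_cli(line: str) -> dict:
    """Map 'ck <action> <module>[:<entry>] --key=value ...' onto a request dict (simplified)."""
    tokens = shlex.split(line)
    if tokens and tokens[0] == "$":  # tolerate a copied shell prompt
        tokens = tokens[1:]
    if len(tokens) < 2 or tokens[0] != "ck":
        raise ValueError("expected: ck <action> [<module>[:<entry>]] [--key=value ...]")
    request = {"action": tokens[1]}
    for tok in tokens[2:]:
        if tok.startswith("--"):
            key, sep, value = tok[2:].partition("=")
            request[key] = value if sep else True  # a bare --flag becomes True
        elif "module_uoa" not in request:
            module, _, entry = tok.partition(":")
            request["module_uoa"] = module
            if entry:
                request["data_uoa"] = entry
        else:
            request.setdefault("extra", []).append(tok)
    return request


def new_uid() -> str:
    """Hypothetical UID generator: 16 random lowercase hex digits."""
    return uuid.uuid4().hex[:16]
```

For example, `parse_cli("ck compile program:cbench-automotive-susan")` yields `{"action": "compile", "module_uoa": "program", "data_uoa": "cbench-automotive-susan"}`.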
  • 15. We have already converted multiple AI frameworks, artifacts and workflows to CK.
    Ad-hoc experimental workflows (ICC 17.0, CUDA 8.0, GCC 7.0, LLVM 4.0; databases and local repositories; ad-hoc init scripts; ad-hoc scripts to process CSV, XLS, TXT, etc.) become CK components: CK program, CK pipeline, CK compiler, CK AI framework, CK math library and CK experiment, covering Caffe OpenCL, Caffe CUDA, TensorFlow CPU/CUDA, MAGMA, cuBLAS, OpenBLAS, ViennaCL and CLBlast, plus statistical analysis, predictive analytics and visualization.
    Repositories: github.com/dividiti/ck-caffe • github.com/ctuning/ck-caffe2 • github.com/ctuning/ck-tensorflow
      $ ck pull repo --url=github.com/dividiti/ck-caffe
      $ ck compile program:caffe-classification
      $ ck run program:caffe-classification
    https://github.com/ctuning/ck/wiki/Shared-repos
  • 16. Same conversion overview as slide 15 (including the repositories and commands), now showing how the CK program pipeline works:
    Unified API (input) → read program meta → detect all software dependencies (ask the user if multiple versions exist) → prepare environment → compile program → run program → unified API (output, JSON).
    The CK program module can automatically adapt to the underlying environment via dependencies. A CK program entry is a native directory holding source files and auxiliary scripts; its .cm/meta.json describes software dependencies, datasets, and how to compile and run the program. In general, CK entries associated with a given module describe a given object using meta.json while storing all necessary files and sub-directories.
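The dependency step of this pipeline (find matching installed software, then choose when several versions coexist) can be sketched as follows. The dictionary shapes are assumptions loosely modelled on a program's .cm/meta.json: each dependency lists comma-separated `tags`, and each detected environment carries hypothetical `uid`, `tags` and `version` fields. The `choose` callback stands in for asking the user.

```python
def version_key(env: dict) -> tuple:
    """Sort key for dotted numeric versions such as '7.1' or '0.2.19'."""
    return tuple(int(part) for part in env["version"].split("."))


def highest_version(name: str, candidates: list) -> dict:
    """Default policy when several versions match: take the newest one."""
    return max(candidates, key=version_key)


def find_envs(envs: list, tags: str) -> list:
    """Return detected environments whose tags include every comma-separated tag."""
    wanted = set(tags.split(","))
    return [e for e in envs if wanted <= set(e["tags"])]


def resolve_deps(meta: dict, envs: list, choose=highest_version) -> dict:
    """Map each dependency in meta['deps'] to exactly one detected environment."""
    resolved = {}
    for name, dep in meta.get("deps", {}).items():
        candidates = find_envs(envs, dep["tags"])
        if not candidates:
            raise LookupError(f"no installed software matches {dep['tags']!r} for {name!r}")
        resolved[name] = candidates[0] if len(candidates) == 1 else choose(name, candidates)
    return resolved


# Hypothetical program meta and detected environments.
meta = {"deps": {"compiler": {"tags": "compiler,lang-c"}, "blas": {"tags": "lib,blas"}}}
envs = [
    {"uid": "env-gcc-5", "tags": ["compiler", "lang-c", "gcc"], "version": "5.4"},
    {"uid": "env-gcc-7", "tags": ["compiler", "lang-c", "gcc"], "version": "7.1"},
    {"uid": "env-openblas", "tags": ["lib", "blas", "openblas"], "version": "0.2.19"},
]
print({name: env["uid"] for name, env in resolve_deps(meta, envs).items()})
# {'compiler': 'env-gcc-7', 'blas': 'env-openblas'}
```

A missing dependency raises an error here; in the workflow described on the slides, that is the point where a package entry would be installed instead.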
  • 17. Automatically adapting workflows to any underlying software and hardware.
    Soft entries in CK describe how to detect whether a given piece of software is already installed, how to set up its whole environment including all paths (to binaries, libraries, includes, auxiliary tools, etc.), and how to detect its version:
      $ ck detect soft --tags=compiler,cuda
      $ ck detect soft:compiler.gcc
      $ ck detect soft:compiler.llvm
      $ ck list soft:compiler*
      $ ck detect soft:lib.cublas
      $ ck search soft --tags=blas
    Env entries are created in the local CK repo for all software instances found, together with their meta and an auto-generated environment script: env.sh on Linux or env.bat on Windows. For example:
      local / env / 03ca0be16962f471 / env.sh   (tags: compiler,cuda,v8.0)
      local / env / 0a5ba198d48e3af3 / env.bat  (tags: lib,blas,cublas,v8.0)
      $ ck show env
      $ ck show env --tags=cublas
      $ ck rm env:* --tags=cublas
    Package entries describe how to install a given piece of software if it is not already installed (using a CK Python plugin together with an install.sh script on a Linux host or install.bat on a Windows host):
      $ ck install package:caffemodel-bvlc-googlenet
      $ ck install package:imagenet-2012-val
      $ ck install package:lib-tensorflow-cuda
      $ ck install package:lib-caffe-bvlc-master-cuda-universal
      $ ck list package:*caffemodel*
      $ ck list package:*tensorflow*
      $ ck search package --tags=caffe
    Multiple versions of tools can easily coexist and be plugged into CK workflows!
    https://github.com/ctuning/ck/wiki/Portable-workflows
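Generating the per-instance environment script can be sketched as below. This is a minimal illustration, not CK's generator: the function `make_env_script`, the variable name `MY_CUDA_ROOT` and the install paths are hypothetical, and values are written without escaping, so they are assumed not to contain quotes.

```python
def make_env_script(variables: dict, path_prepend: list, windows: bool = False) -> str:
    """Render an env.sh-like (Linux) or env.bat-like (Windows) script for one tool.

    Values are not escaped: they are assumed to contain no quotes.
    """
    if windows:
        lines = ["@echo off"]
        lines += [f"set {name}={value}" for name, value in variables.items()]
        if path_prepend:
            lines.append("set PATH=" + ";".join(path_prepend) + ";%PATH%")
    else:
        lines = ["#! /bin/bash"]
        lines += [f'export {name}="{value}"' for name, value in variables.items()]
        if path_prepend:
            lines.append('export PATH="' + ":".join(path_prepend) + ':$PATH"')
    return "\n".join(lines) + "\n"


cuda_vars = {"MY_CUDA_ROOT": "/opt/cuda-8.0"}  # hypothetical variable name and path
print(make_env_script(cuda_vars, ["/opt/cuda-8.0/bin"]))
print(make_env_script(cuda_vars, [r"C:\CUDA\v8.0\bin"], windows=True))
```

Because every detected instance gets its own script in its own env directory, two CUDA versions can coexist; a workflow simply sources whichever script the dependency resolution selected.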
  • 18. Applying methodology from natural sciences to optimize computer systems.
    CK Python modules (wrappers) with a unified JSON API take a CK input (JSON/dict) and return a CK output (JSON/dict). Unified input: behavior, choices, features, state, action. Unified output: behavior, choices, features, state.
    The behavior of any CK object is formalized as a function b = B(c, f, s) of its choices c, features f and state s. CK JSON is flattened into vectors (dict converted to vector) to simplify statistical analysis, machine learning and data mining.
    Chain CK modules to implement research workflows such as multi-objective autotuning and co-design exploration: choose an exploration strategy; perform SW/HW design space exploration (math transforms, skeleton parameters, compiler flags, transformations, ...); set the environment for a given tool version; compile the program and run the code (CK program module with a pipeline function, driving tools such as compilers and profilers and producing generated files); perform statistical analysis; detect the (Pareto) frontier; model behavior and predict optimizations; reduce complexity.
    First expose coarse-grain, high-level choices, features, system state and behavior characteristics. Then crowdsource benchmarking and random exploration across diverse inputs and devices; keep the best species (AI/SW/HW choices); model behavior; predict better optimizations and designs.
    https://github.com/ctuning/ck/wiki/Autotuning
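Flattening a nested CK-style record into a vector, so that many experiments become rows of one matrix, can be sketched as follows. The `#` (dict key) and `@` (list index) separators, the record layout and the helper names are assumptions for illustration only.

```python
def flatten(obj, prefix: str = "") -> dict:
    """Flatten nested dicts/lists into {'#a#b@0': value} style keys.

    '#' marks a dict key and '@' a list index (separator choice is an assumption).
    Empty dicts and lists contribute no keys.
    """
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}#{key}"))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}@{index}"))
    else:
        flat[prefix] = obj
    return flat


def to_matrix(records: list):
    """Turn records into (column names, rows); missing values become None."""
    flats = [flatten(r) for r in records]
    columns = sorted(set().union(*flats))
    rows = [[f.get(c) for c in columns] for f in flats]
    return columns, rows


# One hypothetical experiment: behavior b as a function of choices c, features f, state s.
record = {
    "choices": {"batch_size": 2, "cpu_threads": 4},
    "features": {"model": "googlenet"},
    "state": {"cpu_freq_mhz": 1800},
    "behavior": {"time_s": None},  # to be filled in by measurement
}
print(flatten(record))
```

Once every experiment is a flat row, standard statistical and machine-learning tools can model b from (c, f, s) without knowing the nested JSON schema.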
  • 19. Prepare first proof-of-concept community experiments.
    Layers of the stack: algorithm / source code, AI framework, various models, available libraries / skeletons, compilers, binary or byte code, run-time environment, run-time state of the system, inputs, hardware and simulators.
      Algorithms: object classification, object detection
      AI frameworks: Caffe CPU, Caffe OpenCL, TensorFlow CPU
      Math libraries: OpenBLAS, ViennaCL, clBLAS, CLBlast, cuBLAS, cuDNN, Eigen, gemmlowp
      Compilers: GCC 5+
      Models: AlexNet, GoogleNet, VGG, ResNet, SqueezeNet, SqueezeDet, SSD
      Datasets: KITTI, COCO, VOC, ImageNet
      Optimization choices: batch size, number of CPU threads
      Characteristics: total execution time (including OpenCL overheads), top-1/top-5 model accuracy, static model size (MB), device cost, max power consumption (if available)
      System state: CPU/GPU frequency, memory
    Results: cKnowledge.org/repo
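Even a few of these choices multiply into a sizeable design space, which is why random exploration spread over many volunteer devices is attractive. A minimal sketch, assuming a subset of the slide's frameworks and classification models; the identifiers are hypothetical labels, and the batch sizes and thread counts are made-up values not given on the slide. A real space would also need constraints for combinations that do not exist.

```python
import itertools
import random

# Subset of the slide's choices; batch sizes and thread counts are hypothetical.
SPACE = {
    "framework": ["caffe-cpu", "caffe-opencl", "tensorflow-cpu"],
    "model": ["alexnet", "googlenet", "vgg", "resnet", "squeezenet"],
    "batch_size": [1, 2, 4, 8],
    "cpu_threads": [1, 2, 4],
}


def all_points(space: dict):
    """Yield every combination of choices as a dict (the full design space)."""
    names = list(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def sample_points(space: dict, n: int, seed: int = 0) -> list:
    """Pick up to n distinct random points, e.g. one experiment per volunteer device."""
    points = list(all_points(space))
    return random.Random(seed).sample(points, min(n, len(points)))


print(sum(1 for _ in all_points(SPACE)))  # 3 * 5 * 4 * 3 = 180
print(sample_points(SPACE, 2))
```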
  • 20. Crowdsource benchmarking across Android devices provided by volunteers. Continuously collect statistics, bugs and misclassifications at cKnowledge.org/repo.
      Distinct participating platforms: 800+
      Distinct CPUs: 260+
      Distinct GPUs: 110+
      Distinct OSes: 280+
      Power range: 1-10 W
    No need for a dedicated and expensive cloud: volunteers help us validate research ideas, similar to SETI@home. We also collect real images from users for misclassifications to build an open and continuously updated training set!
    [Figure: winning solutions on various frontiers; time per image (seconds) vs. cost (euros)]
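"Winning solutions on various frontiers" are the points on a Pareto frontier: no other solution is both faster and cheaper. A minimal two-objective sketch follows; the device names and numbers are placeholders for illustration, not measurements from the slides.

```python
def pareto_front(points: list) -> list:
    """Return the points not dominated on (time_s, cost_eur); lower is better for both.

    points: list of (name, time_s, cost_eur). O(n^2), fine for a few hundred devices.
    """
    def dominates(a, b):
        # a dominates b if it is no worse on both objectives and strictly better on one.
        return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])

    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p[1])


# Placeholder numbers for illustration only.
devices = [("A", 1.0, 300), ("B", 2.0, 100), ("C", 2.5, 150), ("D", 0.5, 500)]
print([name for name, _, _ in pareto_front(devices)])  # ['D', 'A', 'B']
```

Here C is dropped because B is both faster and cheaper; the same check extends to more objectives (accuracy, model size, power) by comparing more tuple fields.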
  • 21. Same crowd-benchmarking overview as slide 20 (here: 790+ distinct participating platforms, 260+ CPUs, 110+ GPUs, 280+ OSes, power range 1-10 W), with the Firefly-RK3399 highlighted among the winning solutions on various frontiers.
    [Figure: winning solutions on various frontiers; time per image (seconds) vs. cost (euros); Firefly-RK3399 highlighted]
  • 22. Let's dig further: (crowdsourced) BLAS autotuning in Caffe on Firefly-RK3399.
    Collaboration between Marco Cianfriglia (Roma Tre University), Cedric Nugteren (TomTom), Flavio Vella, Anton Lokhmotov and Grigori Fursin (dividiti).
    Tunable parameters of OpenCL-based BLAS (github.com/CNugteren/CLBlast):
      Name    Description                                          Range
      KWG     2D tiling at workgroup level                         {32, 64}
      KWI     Unroll factor of the KWG kernel loop                 {1}
      MDIMA   Local memory re-shape                                {4, 8}
      MDIMC   Local memory re-shape                                {8, 16, 32}
      MWG     2D tiling at workgroup level                         {32, 64, 128}
      NDIMB   Local memory re-shape                                {8, 16, 32}
      NDIMC   Local memory re-shape                                {8, 16, 32}
      NWG     2D tiling at workgroup level                         {16, 32}
      SA      Manual caching of A using local memory               {0, 1}
      SB      Manual caching of B using local memory               {0, 1}
      STRM    Striding within a single thread for matrices A, C    {0, 1}
      STRN    Striding within a single thread for matrix B         {0, 1}
      VWM     Vector width for loading A and C                     {8, 16}
      VWN     Vector width for loading B                           {0, 1}
    For now only two datasets (small and large). Some extra constraints avoid illegal combinations. Different autotuners are used under CK to speed up design space exploration, based on probabilistic focused search, genetic algorithms, deep learning, SVM, KNN, MARS, decision trees, ...
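The "extra constraints to avoid illegal combinations" can be illustrated with the ranges from the table. The divisibility rules below are an assumption modelled on the kind of tiling constraints a GEMM tuner such as CLBlast's applies (check CLBlast's tuner sources for the exact set). Plain random search stands in for the autotuners named above, and `measure` is a hypothetical callback that would build and time the kernel. The table lists VWN ∈ {0, 1}; a vector width of 0 cannot satisfy a divisibility rule, so the sketch treats it as illegal.

```python
import itertools
import random

# Ranges copied from the table above.
PARAMS = {
    "KWG": [32, 64], "KWI": [1], "MDIMA": [4, 8], "MDIMC": [8, 16, 32],
    "MWG": [32, 64, 128], "NDIMB": [8, 16, 32], "NDIMC": [8, 16, 32],
    "NWG": [16, 32], "SA": [0, 1], "SB": [0, 1], "STRM": [0, 1],
    "STRN": [0, 1], "VWM": [8, 16], "VWN": [0, 1],
}


def divides(a: int, b: int) -> bool:
    """True if a is positive and divides b exactly."""
    return a > 0 and b % a == 0


def is_legal(p: dict) -> bool:
    """Assumed tiling constraints (illustrative, CLBlast-style GEMM rules)."""
    return (
        divides(p["KWI"], p["KWG"])
        and divides(p["MDIMC"] * p["VWM"], p["MWG"])
        and divides(p["MDIMA"] * p["VWM"], p["MWG"])
        and divides(p["NDIMC"] * p["VWN"], p["NWG"])
        and divides(p["NDIMB"] * p["VWN"], p["NWG"])
    )


def configs(params: dict):
    """Yield every combination of parameter values as a dict."""
    names = list(params)
    for values in itertools.product(*(params[n] for n in names)):
        yield dict(zip(names, values))


def random_search(legal: list, measure, budget: int, seed: int = 0) -> dict:
    """Evaluate up to `budget` random legal configurations; return the lowest-cost one."""
    rng = random.Random(seed)
    trials = rng.sample(legal, min(budget, len(legal)))
    return min(trials, key=measure)


all_configs = list(configs(PARAMS))
legal = [p for p in all_configs if is_legal(p)]
print(len(all_configs), len(legal))  # 41472 3328
```

Filtering first means that only about 8% of the raw space is ever handed to the (expensive) measurement step; a smarter search strategy would replace `random_search` while keeping the same `measure` interface.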
  • 23. Let's dig further: autotuning BLAS (CLBlast) in Caffe on Firefly-RK3399.
      • Caffe with autotuned OpenBLAS (threads and batches) is the fastest.
      • Caffe with autotuned CLBlast is 6-7x faster than the default version and competitive with the OpenBLAS-based version; it is now worth making an adaptive selection at run time.
    Sharing results in a reproducible way with the community for validation and improvement: https://nbviewer.jupyter.org/github/dividiti/ck-caffe-firefly-rk3399/blob/master/script/batch_size-libs-models/analysis.20170531.ipynb
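Adaptive selection at run time can be illustrated with a small lookup built from recorded measurements. A minimal sketch: the `(model, batch_size)` key, the `build_selector` helper and the fallback to OpenBLAS (chosen because the slide reports it as fastest overall) are assumptions, and the timings in the example are placeholders, not results from the linked notebook.

```python
def build_selector(measurements, default: str = "openblas"):
    """Build select(model, batch_size) -> fastest library seen in the measurements.

    measurements: iterable of (model, batch_size, library, seconds_per_image).
    Unseen (model, batch_size) pairs fall back to `default`.
    """
    best = {}
    for model, batch_size, library, seconds in measurements:
        key = (model, batch_size)
        if key not in best or seconds < best[key][1]:
            best[key] = (library, seconds)

    def select(model: str, batch_size: int) -> str:
        entry = best.get((model, batch_size))
        return entry[0] if entry else default

    return select


# Placeholder timings for illustration only.
select = build_selector([
    ("alexnet", 2, "openblas", 0.30),
    ("alexnet", 2, "clblast", 0.25),
    ("googlenet", 2, "openblas", 0.20),
    ("googlenet", 2, "clblast", 0.40),
])
print(select("alexnet", 2), select("googlenet", 2), select("vgg", 1))  # clblast openblas openblas
```

With crowdsourced data the table keeps growing, so the selector can be rebuilt periodically instead of hard-coding one library.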
  • 24.
      • Bring together industry and academia to participate in open and reproducible AI/SW/HW co-design competitions using the CK framework
      • Share more artifacts, workflows and results in a reusable and customizable CK format (common JSON API and meta description)
      • Collaboratively improve models and find missing features
      • Gradually expose more design and optimization knobs at all AI/SW/HW levels
      • Enable distributed online learning for self-optimizing and self-learning systems
    Join the growing Collective Knowledge community! http://cKnowledge.org/partners http://cKnowledge.org/publications