Slides from ARM's Research Summit'17 about "Community-Driven and Knowledge-Guided Optimization of AI Applications Across the Whole SW/HW Stack" (http://cKnowledge.org/repo , http://cKnowledge.org/ai , http://tinyurl.com/zlbxvmw , https://developer.arm.com/research/summit )
Co-designing the whole AI/SW/HW stack in terms of speed, accuracy, energy consumption, size, costs and other metrics has become extremely complex, long and costly. With no rigorous methodology for analyzing performance and accumulating optimisation knowledge, we are simply destined to drown in the ever-growing number of design choices, system features and conflicting optimisation goals.
We present our novel community-driven approach to solve the above problems. Originating from natural sciences, this approach is embodied in Collective Knowledge (CK), our open-source cross-platform workflow framework and repository for automatic, collaborative and reproducible experimentation. CK helps organize, unify and share representative workloads, data sets, AI frameworks, libraries, compilers, scripts, models and other artifacts as customizable and reusable components with a common JSON API.
CK helps bring academia, industry and end-users together to gradually expose optimisation choices at all levels (e.g. from parameterized models and algorithmic skeletons to compiler flags and hardware configurations) and autotune them across diverse inputs and platforms. Optimization knowledge gets continuously aggregated in public or private repositories such as cKnowledge.org/repo in a reproducible way, and can then be mined and extrapolated to predict better AI algorithm choices, compiler transformations and hardware designs.
We also demonstrate how we use this approach in practice together with ARM and other companies to adapt to a Cambrian AI/SW/HW explosion by creating an open repository of reusable AI artifacts, and then collaboratively optimising and co-designing the whole deep learning stack (software, hardware and models).
Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions and Collective Knowledge
1. Community-Driven and Knowledge-Guided Optimization of AI Applications Across the Whole SW/HW Stack
or how to adapt to a Cambrian explosion in AI / SW / HW …
… with cKnowledge.org and open co-design competitions
ARM Research Summit
Cambridge, September 2017
Grigori Fursin
CTO and co-founder, dividiti, UK
Chief Scientist, cTuning foundation
2. cKnowledge.org: helping industry and academia adapt to a Cambrian AI/SW/HW explosion via open co-design competitions (2 of 24)
A race to develop innovative AI products and systems (SW & HW) …
Various form factors:
IoT, mobile, data centers, supercomputers
Various constraints:
speed, energy, accuracy, size, resiliency, costs
3. … leads to a Cambrian AI/SW/HW explosion and technological chaos
4. Which AI/SW/HW solutions will survive?
AI users
We at dividiti.com perform competitive analysis and optimization of the whole AI/SW/HW stack for various realistic scenarios (object detection, image classification, etc.)
5. Scenario: image classification on mobile devices
800+ distinct mobile devices
mobile CPUs and GPUs
Caffe, TensorFlow
OpenBLAS, CLBlast, ViennaCL, Eigen
AlexNet, GoogleNet, SqueezeNet
ImageNet and user images
Requirement: speed vs cost (vs energy vs accuracy vs model size vs memory usage vs reliability…)
[Plot: execution time (sec) vs. price (euros)]
Just a few winning "AI+SW+HW species"; the rest must be optimized further or may go "extinct"
Obtained using our CK-based Android app to crowdsource experiments across devices provided by volunteers (later in the talk)
cKnowledge.org/repo cKnowledge.org/ai
6. Optimization is ad-hoc, tedious, expensive and time consuming
Too many design and optimization choices at each level of the continuously changing SW/HW stack!
Levels of the stack:
Mobile device, server, data centers
Available libraries / skeletons
Compilers
Binary or byte code
Hardware, simulators
Run-time environment
Run-time state of the system
Inputs
Existing frameworks / algorithms
Various models
User front-end (cloud, GRID, supercomputer, etc)
Algorithm / source code
Examples of choices at these levels:
Microsoft Azure, AWS, Google Cloud, XSEDE, PRACE, Watson …
100s of models for TensorFlow, Caffe, Torch, Theano, MxNet, CNTK
CUDA, MPI, OpenMP, TBB, OpenCL, StarPU, OmpSs …
C, C++, Fortran, Java, Python, byte code, assembler …
LLVM, GCC, ICC, Rose, PGI, Lift, functional programming …
cuBLAS, BLAS, MAGMA, ViennaCL, CLBlast, cuDNN, OpenBLAS, clBLAS, libDNN, tinyDNN, ARM compute lib, libxsmm, skeletons
diverse hardware: heterogeneous, out-of-order, caches (ARM, x86, CUDA, Mali, Adreno, Power, TPU, FPGA, MIPS, AVX, neon)
Linux (CentOS, Ubuntu, RedHat, SUSE, Debian), Android, Windows, BSD, iOS, MacOS …
7. Optimization is ad-hoc, tedious, expensive and time consuming
Too many design and optimization choices at each level of the continuously changing SW/HW stack!
Time to reinvent computer engineering and enable open, collaborative and reproducible AI/SW/HW co-design!
8. cKnowledge.org: plugin-based workflow framework to co-design AI/SW/HW stack
Grigori Fursin, Anton Lokhmotov, Ed Plowman, "Collective Knowledge: towards R&D sustainability", DATE'16
Available libraries / skeletons
Compilers
Binary or byte code
Hardware, simulators
Run-time environment
Run-time state of the system
Inputs
Various models
Algorithm / source code
AI framework
Common JSON API
Initial funding (2015)
Common experimental framework
for computer engineering and AI research
https://github.com/ctuning/ck
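As a rough illustration of what a common JSON API can look like, the sketch below routes every request through one entry point that takes a dictionary and returns a dictionary. The `access` function, the `REGISTRY` table, the `register` decorator and the `return`/`error` keys are assumptions made for this example; the code does not import CK and should not be read as CK's implementation.

```python
import json

# Hypothetical registry mapping (module, action) to a Python function.
REGISTRY = {}


def register(module, action):
    """Decorator that exposes a function under a (module, action) pair."""
    def wrap(func):
        REGISTRY[(module, action)] = func
        return func
    return wrap


@register("dataset", "add")
def add_dataset(request):
    # Illustrative action: validate the input dict and echo it back.
    name = request.get("name")
    if not name:
        return {"return": 1, "error": "dataset name is not specified"}
    return {"return": 0, "module": "dataset", "name": name}


def access(request):
    """Single entry point: dict in, dict out, errors reported as data."""
    key = (request.get("module"), request.get("action"))
    func = REGISTRY.get(key)
    if func is None:
        return {"return": 1, "error": "unknown module/action: %s/%s" % key}
    return func(request)


if __name__ == "__main__":
    request = {"module": "dataset", "action": "add", "name": "my-new-dataset"}
    print(json.dumps(access(request)))
```

Because inputs and outputs are plain dictionaries, the same call can be serialized to JSON, logged, replayed or sent over a network without changing the functions behind it.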
9. Repositories with reusable and customizable artifacts (JSON API and meta info)
Unified models (CK JSON API), each with CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, …
AI frameworks (CK JSON API), each with CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet, …
Stack components behind the common JSON API: available libraries / skeletons, compilers, binary or byte code, hardware and simulators, run-time environment, run-time state of the system, inputs, various models, algorithm / source code, AI framework
10. Repositories with reusable and customizable workflows (JSON API)
Workflows (CK API): image classification, object detection, emotion analysis, …
Unified models (CK JSON API), each with CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, …
AI frameworks (CK JSON API), each with CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet, …
Stack components behind the common JSON API: available libraries / skeletons, compilers, binary or byte code, hardware and simulators, run-time environment, run-time state of the system, inputs, various models, algorithm / source code, AI framework
11. Continuous competition of various AI/SW/HW combinations (species)
Crowdsource AI experiments across diverse platforms provided by volunteers
cKnowledge.org/repo
Everyone is on the same page: fair and reproducible competitions
Workflows (CK API): image classification, object detection, emotion analysis, …
Unified models (CK JSON API), each with CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, …
AI frameworks (CK JSON API), each with CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet, …
Stack components behind the common JSON API: available libraries / skeletons, compilers, binary or byte code, hardware and simulators, run-time environment, run-time state of the system, inputs, various models, algorithm / source code, AI framework
12. CK concepts: convert your artifacts into reusable and customizable components
Ad-hoc scripts to perform some actions on some artifacts
Actions and modules: setup, soft, find, extract features, dataset, compile, run, add, replay, experiment, autotune, program
Artifacts (each with some description): TensorFlow, Caffe2, ARM compute lib, image classification, object detection, ImageNet, car video stream, real surveillance camera, GEMM OpenCL, convolution CPU, performance results, training / accuracy, bugs
13. CK concepts: convert your artifacts into reusable and customizable components
Actions and modules: setup, soft, find, extract features, dataset, compile, run, add, replay, experiment, autotune, program
Artifacts (each described by a JSON file): TensorFlow, Caffe2, ARM compute lib, image classification, object detection, ImageNet, car video stream, real surveillance camera, GEMM OpenCL, convolution CPU, performance results, training / accuracy, bugs
Repository layout: 1st level directory: CK modules / 2nd level directory: CK entries / CK meta info
Python module with JSON API / holder for original artifact / CK meta
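The two-level layout above (modules at the first level, entries at the second, each entry carrying its meta info) can be sketched with ordinary directories and JSON files. This is a minimal sketch under assumptions: the helper names `add_entry`, `list_entries` and `demo` and the `tags` key are hypothetical, and the `.cm/meta.json` location follows the description given later in the talk rather than any inspection of CK internals.

```python
import json
import os
import tempfile


def add_entry(repo, module, entry, meta):
    """Create <repo>/<module>/<entry>/.cm/meta.json and return the entry path."""
    entry_dir = os.path.join(repo, module, entry)
    os.makedirs(os.path.join(entry_dir, ".cm"), exist_ok=True)
    with open(os.path.join(entry_dir, ".cm", "meta.json"), "w") as f:
        json.dump(meta, f, indent=2)
    return entry_dir


def list_entries(repo):
    """Yield (module, entry, meta) for every entry that has a meta file."""
    for module in sorted(os.listdir(repo)):
        module_dir = os.path.join(repo, module)
        if not os.path.isdir(module_dir):
            continue
        for entry in sorted(os.listdir(module_dir)):
            meta_path = os.path.join(module_dir, entry, ".cm", "meta.json")
            if os.path.isfile(meta_path):
                with open(meta_path) as f:
                    yield module, entry, json.load(f)


def demo():
    """Build a throwaway repository and list what it contains."""
    with tempfile.TemporaryDirectory() as repo:
        add_entry(repo, "dataset", "my-new-dataset", {"tags": ["image"]})
        add_entry(repo, "program", "cbench-automotive-susan", {"tags": ["benchmark"]})
        return list(list_entries(repo))


if __name__ == "__main__":
    for module, entry, meta in demo():
        print(module, entry, meta)
```

The entry directory stays a native directory, so the original artifact files can sit next to the `.cm` folder untouched.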
14. CK concepts: convert your artifacts into reusable and customizable components
Collective Knowledge (github.com/ctuning/ck) assists you in unifying, executing, sharing and reusing your artifacts:
$ sudo pip install ck
$ ck pull repo:ck-autotuning
$ ck add dataset:my-new-dataset (UID will be automatically generated)
$ ck compile program:cbench-automotive-susan
$ ck run program:cbench-automotive-susan
https://github.com/ctuning/ck/wiki/Shared-modules
15. We already converted multiple AI frameworks, artifacts and workflows to the CK
Ad-hoc experimental workflows:
ICC 17.0, CUDA 8.0, GCC 7.0, LLVM 4.0
Databases, local repositories
Ad-hoc init scripts
Ad-hoc scripts to process CSV, XLS, TXT, etc.
Program
CK components: CK program, CK pipeline, CK compiler, CK AI framework, CK math library, CK experiment
AI frameworks: Caffe OpenCL, Caffe CUDA, TensorFlow CPU/CUDA
Math libraries: MAGMA, cuBLAS, OpenBLAS, ViennaCL, CLBlast
Stat. analysis, predictive analytics, visualization
• github.com/dividiti/ck-caffe
• github.com/ctuning/ck-caffe2
• github.com/ctuning/ck-tensorflow
$ ck pull repo --url=https://github.com/dividiti/ck-caffe
$ ck compile program:caffe-classification
$ ck run program:caffe-classification
https://github.com/ctuning/ck/wiki/Shared-repos
16. We've already converted multiple AI frameworks, artifacts and workflows to the CK
Ad-hoc experimental workflows:
ICC 17.0, CUDA 8.0, GCC 7.0, LLVM 4.0
Databases, local repositories
Ad-hoc init scripts
Ad-hoc scripts to process CSV, XLS, TXT, etc.
Program
CK components: CK program, CK pipeline, CK compiler, CK AI framework, CK math library, CK experiment
CK pipeline (JSON): Unified API (input) -> read program meta -> detect all software dependencies, asking the user if multiple versions exist -> prepare environment -> compile program -> run program -> Unified API (output)
CK program module can automatically adapt to underlying environment via dependencies
CK program entry (native directory): source files and auxiliary scripts
.cm/meta.json: describes soft dependencies, data sets, and how to compile and run this program
CK entries associated with a given module describe a given object using meta.json while storing all necessary files and sub-directories
AI frameworks: Caffe OpenCL, Caffe CUDA, TensorFlow CPU/CUDA
Math libraries: MAGMA, cuBLAS, OpenBLAS, ViennaCL, CLBlast
Stat. analysis, predictive analytics, visualization
• github.com/dividiti/ck-caffe
• github.com/ctuning/ck-caffe2
• github.com/ctuning/ck-tensorflow
$ ck pull repo --url=https://github.com/dividiti/ck-caffe
$ ck compile program:caffe-classification
$ ck run program:caffe-classification
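The pipeline steps on this slide (read program meta, detect dependencies, prepare environment, compile, run) can be sketched as a function that only plans the commands rather than executing them. Everything named here is hypothetical: the meta keys `deps`, `compile_cmd` and `run_cmd`, the `choose` callback that stands in for asking the user, and the sample `META` and `INSTALLED` data. It is a sketch of the idea, not CK's program module.

```python
def resolve_dependencies(meta, installed, choose=lambda matches: matches[0]):
    """Pick one installed software instance for every dependency in the meta."""
    resolved = {}
    for name, dep in meta.get("deps", {}).items():
        wanted = set(dep["tags"])
        matches = [s for s in installed if wanted <= set(s["tags"])]
        if not matches:
            raise LookupError("no installed software matches tags %s" % sorted(wanted))
        # With several matching versions a real tool would ask the user;
        # here the decision is delegated to the 'choose' callback.
        resolved[name] = choose(matches) if len(matches) > 1 else matches[0]
    return resolved


def pipeline(meta, installed):
    """Dict in, dict out: plan the environment and the compile/run commands."""
    deps = resolve_dependencies(meta, installed)
    env = {}
    for soft in deps.values():
        env.update(soft["env"])
    return {
        "return": 0,
        "env": env,
        "compile_cmd": meta["compile_cmd"].format(**env),
        "run_cmd": meta["run_cmd"],
    }


# Hypothetical program meta and detected software, for illustration only.
META = {
    "deps": {"compiler": {"tags": ["compiler", "gcc"]}},
    "compile_cmd": "{CC} susan.c -o susan",
    "run_cmd": "./susan",
}
INSTALLED = [
    {"tags": ["compiler", "gcc", "v7.0"], "env": {"CC": "gcc"}},
    {"tags": ["compiler", "llvm", "v4.0"], "env": {"CC": "clang"}},
]

if __name__ == "__main__":
    print(pipeline(META, INSTALLED))
```

Changing the dependency tags in `META` from `gcc` to `llvm` switches the planned compile command to `clang` without touching the pipeline code, which is the kind of adaptation the slide describes.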
17. Automatically adapting workflow to any underlying software and hardware
Soft entries in CK describe how to detect if a given software is already installed, how to set up all its environment including all paths (to binaries, libraries, include, aux tools, etc), and how to detect its version
$ ck detect soft --tags=compiler,cuda
$ ck detect soft:compiler.gcc
$ ck detect soft:compiler.llvm
$ ck list soft:compiler*
$ ck detect soft:lib.cublas
$ ck search soft --tags=blas
Env entries are created in CK local repo for all found software instances together with their meta and an auto-generated environment script env.sh (on Linux) or env.bat (on Windows)
Local CK repo:
local / env / 03ca0be16962f471 / env.sh
Tags: compiler,cuda,v8.0
local / env / 0a5ba198d48e3af3 / env.bat
Tags: lib,blas,cublas,v8.0
$ ck show env
$ ck show env --tags=cublas
$ ck rm env:* --tags=cublas
Package entries describe how to install a given software if it is not already installed (using CK Python plugin together with install.sh script on Linux host or install.bat on Windows host)
$ ck install package:caffemodel-bvlc-googlenet
$ ck install package:imagenet-2012-val
$ ck install package:lib-tensorflow-cuda
$ ck install package:lib-caffe-bvlc-master-cuda-universal
$ ck list package:*caffemodel*
$ ck list package:*tensorflow*
$ ck search package --tags=caffe
https://github.com/ctuning/ck/wiki/Portable-workflows
Multiple versions of tools can easily co-exist and be plugged into CK workflows!
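To make the idea of tagged env entries concrete, here is a small sketch that looks entries up by tags and renders an environment script for Linux or Windows. The two UIDs and their tags are taken from the slide; the variable names, the paths, and the helpers `find_env` and `render_env_script` are assumptions for illustration and do not reflect the scripts CK actually generates.

```python
# Env entries keyed by UID; tags come from the slide, 'env' values are made up.
ENV_ENTRIES = {
    "03ca0be16962f471": {
        "tags": ["compiler", "cuda", "v8.0"],
        "env": {"CUDA_HOME": "/usr/local/cuda-8.0"},
    },
    "0a5ba198d48e3af3": {
        "tags": ["lib", "blas", "cublas", "v8.0"],
        "env": {"CUBLAS_LIB": "/usr/local/cuda-8.0/lib64"},
    },
}


def find_env(entries, tags):
    """Return the UIDs of all entries that carry every requested tag."""
    wanted = set(tags)
    return sorted(uid for uid, e in entries.items() if wanted <= set(e["tags"]))


def render_env_script(env, windows=False):
    """Render env.bat-style lines on Windows, env.sh-style lines otherwise."""
    if windows:
        lines = ["set %s=%s" % (k, v) for k, v in sorted(env.items())]
    else:
        lines = ['export %s="%s"' % (k, v) for k, v in sorted(env.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for uid in find_env(ENV_ENTRIES, ["v8.0"]):
        print(uid)
        print(render_env_script(ENV_ENTRIES[uid]["env"]))
```

Since every installed instance gets its own entry and script, several versions of the same tool can sit side by side and a workflow selects one by tags.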
18. Applying methodology from natural sciences to optimize computer systems
https://github.com/ctuning/ck/wiki/Autotuning
CK Python modules (wrappers) with a unified JSON API
CK input (JSON/dict): unified input (behavior, choices, features, state) and action
CK output (JSON/dict): unified output (behavior, choices, features, state)
Some actions: tools (compilers, profilers, etc), generated files
b = B( c , f , s )
Formalized function B of a behavior of any CK object
Flattened CK JSON vectors (dict converted to vector) to simplify statistical analysis, machine learning and data mining
Chain CK modules to implement research workflows such as multi-objective autotuning and co-design:
Choose exploration strategy -> perform SW/HW DSE (math transforms, skeleton params, compiler flags, transformations …) -> perform stat. analysis -> detect (Pareto) frontier -> model behavior, predict optimizations -> reduce complexity
CK program module with pipeline function: set environment for a given tool version -> compile program -> run code
First expose coarse grain high-level choices, features, system state and behavior characteristics
Crowdsource benchmarking and random exploration across diverse inputs and devices;
Keep best species (AI/SW/HW choices); model behavior; predict better optimizations and designs
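The "dict converted to vector" step can be sketched as a recursive flattening of a nested record of choices, features and state into a flat mapping from key paths to scalars. The key syntax (`#` before dictionary keys, `@` before list indices), the `flatten` and `to_vector` helpers and the sample values are assumptions chosen for this example and may differ from the format CK uses.

```python
def flatten(obj, prefix=""):
    """Convert a nested dict/list into a flat {key_path: scalar} dict."""
    flat = {}
    if isinstance(obj, dict):
        for key in sorted(obj):
            flat.update(flatten(obj[key], prefix + "#" + str(key)))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, prefix + "@" + str(index)))
    else:
        flat[prefix] = obj
    return flat


def to_vector(flat, keys):
    """Order the flat values by a fixed key list so rows line up across runs."""
    return [flat.get(k) for k in keys]


# Hypothetical experiment point: choices c, features f and state s.
POINT = {
    "choices": {"batch_size": 4, "cpu_threads": 2},
    "features": {"model": "squeezenet"},
    "state": {"cpu_freq": [1416, 1800]},
}

if __name__ == "__main__":
    flat = flatten(POINT)
    for key in sorted(flat):
        print(key, "=", flat[key])
```

Once every experiment is a row with the same ordered keys, the table can be handed to standard statistical or machine-learning tools to model b = B(c, f, s).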
19. Prepare first proof-of-concept community experiments
Stack components: available libraries / skeletons, compilers, binary or byte code, hardware and simulators, run-time environment, run-time state of the system, inputs, various models, algorithm / source code, AI framework
Algorithms: object classification, object detection
AI frameworks: Caffe CPU, Caffe OpenCL, TensorFlow CPU
Math libraries: OpenBLAS, ViennaCL, clBLAS, CLBlast, cuBLAS, cuDNN, Eigen, gemmlowp
Compilers: GCC 5+
Models: AlexNet, GoogleNet, VGG, ResNet, SqueezeNet, SqueezeDet, SSD
Datasets: KITTI, COCO, VOC, ImageNet
Optimization choices: batch size, number of CPU threads
Characteristics: total execution time (including OpenCL overheads), top1/top5 model accuracy, static model size (MB), device cost, max power consumption (if available)
System state: CPU/GPU frequency, memory
cKnowledge.org/repo
20. Crowdsource benchmarking across Android devices provided by volunteers
Continuously collect statistics, bugs and misclassifications at cKnowledge.org/repo
Number of distinct participating platforms: 800+
Number of distinct CPUs: 260+
Number of distinct GPUs: 110+
Number of distinct OS: 280+
Power range: 1-10W
No need for a dedicated and expensive cloud: volunteers help us validate research ideas, similar to SETI@HOME
Also collecting real images from users for misclassifications, to build an open and continuously updated training set!
[Plot: time per image (seconds) vs. cost (euros), with winning solutions on various frontiers]
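Picking "winning solutions" on a time-versus-cost plot amounts to finding the Pareto frontier: the points for which no other point is both cheaper and faster. The sketch below shows one common way to compute it for two objectives that are both minimized. The `pareto_frontier` helper and the sample points are hypothetical and are not measurements from the crowdsourced repository.

```python
def pareto_frontier(points):
    """Return the (cost, time, label) points not dominated by any other point.

    Lower is better on both axes. After sorting by cost, a point is kept
    only if it is strictly faster than every cheaper point seen so far.
    """
    frontier = []
    best_time = float("inf")
    for cost, time, label in sorted(points):
        if time < best_time:
            frontier.append((cost, time, label))
            best_time = time
    return frontier


# Hypothetical (cost in euros, time per image in seconds, device) samples.
SAMPLES = [
    (50, 2.0, "A"),
    (100, 1.0, "B"),
    (120, 1.5, "C"),
    (300, 0.4, "D"),
    (80, 2.5, "E"),
]

if __name__ == "__main__":
    for cost, time, label in pareto_frontier(SAMPLES):
        print(label, cost, time)
```

In this made-up data, devices C and E are dominated (something cheaper is also faster), so only A, B and D remain on the frontier.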
21. Crowdsource benchmarking across Android devices provided by volunteers
Continuously collect statistics, bugs and misclassifications at cKnowledge.org/repo
Number of distinct participating platforms: 790+
Number of distinct CPUs: 260+
Number of distinct GPUs: 110+
Number of distinct OS: 280+
Power range: 1-10W
No need for a dedicated and expensive cloud: volunteers help us validate research ideas, similar to SETI@HOME
Also collecting real images from users for misclassifications, to build an open and continuously updated training set!
[Plot: time per image (seconds) vs. cost (euros), with winning solutions on various frontiers; Firefly-RK3399 highlighted]
22. Let's dig further: (crowdsource) BLAS autotuning in Caffe on Firefly-RK3399
Collaboration between Marco Cianfriglia (Roma Tre University), Cedric Nugteren (TomTom),
Flavio Vella, Anton Lokhmotov and Grigori Fursin (dividiti)
Name    Description                                         Ranges
KWG     2D tiling at workgroup level                        {32, 64}
KWI     KWG kernel-loop can be unrolled by a factor KWI     {1}
MDIMA   Local memory re-shape                               {4, 8}
MDIMC   Local memory re-shape                               {8, 16, 32}
MWG     2D tiling at workgroup level                        {32, 64, 128}
NDIMB   Local memory re-shape                               {8, 16, 32}
NDIMC   Local memory re-shape                               {8, 16, 32}
NWG     2D tiling at workgroup level                        {16, 32}
SA      Manual caching using the local memory               {0, 1}
SB      Manual caching using the local memory               {0, 1}
STRM    Striding within single thread for matrix A and C    {0, 1}
STRN    Striding within single thread for matrix B          {0, 1}
VWM     Vector width for loading A and C                    {8, 16}
VWN     Vector width for loading B                          {0, 1}
Tunable parameters of OpenCL-based BLAS (github.com/CNugteren/CLBlast)
For now only two data sets (small & large)
Some extra constraints to avoid illegal combinations
Use different autotuners under CK to speed up design space exploration based on probabilistic focused search, genetic algorithms, deep learning, SVM, KNN, MARS, decision trees …
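The parameter table above defines a discrete design space, and the simplest exploration strategy is random sampling with a legality check. The sketch below encodes the ranges from the table and searches them at random. The constraint in `is_legal` is a single illustrative stand-in rather than CLBlast's full rule set, and `toy_cost` is a placeholder objective, since a real autotuner would compile and time the kernel on the device.

```python
import random

# Ranges copied from the table of tunable parameters.
SPACE = {
    "KWG": [32, 64], "KWI": [1], "MDIMA": [4, 8], "MDIMC": [8, 16, 32],
    "MWG": [32, 64, 128], "NDIMB": [8, 16, 32], "NDIMC": [8, 16, 32],
    "NWG": [16, 32], "SA": [0, 1], "SB": [0, 1], "STRM": [0, 1],
    "STRN": [0, 1], "VWM": [8, 16], "VWN": [0, 1],
}


def space_size(space):
    """Number of combinations before any constraint is applied."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


def is_legal(cfg):
    # Illustrative constraint (an assumption): the workgroup tile must be
    # divisible by the per-thread tile in the M dimension.
    return cfg["MWG"] % (cfg["MDIMC"] * cfg["VWM"]) == 0


def toy_cost(cfg):
    # Placeholder objective, NOT a performance model of any device.
    return abs(cfg["MWG"] - 64) + abs(cfg["NWG"] - 32)


def random_search(space, cost, legal, trials, seed=0):
    """Sample configurations at random and keep the cheapest legal one."""
    rng = random.Random(seed)
    best_cfg, best_cost = None, float("inf")
    for _ in range(trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        if not legal(cfg):
            continue
        current = cost(cfg)
        if current < best_cost:
            best_cfg, best_cost = cfg, current
    return best_cfg, best_cost


if __name__ == "__main__":
    print("combinations:", space_size(SPACE))
    print(random_search(SPACE, toy_cost, is_legal, trials=200))
```

Swapping `random_search` for a smarter strategy, such as one of the search or learning methods listed on the slide, changes only how the next configuration is proposed; the space definition and the legality check stay the same.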
23. Let's dig further: autotuning BLAS (CLBlast) in Caffe on Firefly-RK3399
• Caffe with autotuned OpenBLAS (threads and batches) is the fastest
• Caffe with autotuned CLBlast is 6-7x faster than the default version and competitive with the OpenBLAS-based version, so adaptive selection at run-time is now worthwhile.
Sharing results in a reproducible way with the community for validation and improvement:
https://nbviewer.jupyter.org/github/dividiti/ck-caffe-firefly-rk3399/blob/master/script/batch_size-libs-models/analysis.20170531.ipynb
24. cKnowledge.org: helping industry and academia adapt to a Cambrian AI/SW/HW explosion via open co-design competitions (24 of 24)
• Bring together industry and academia to participate in open and reproducible AI/SW/HW co-design competitions using the CK framework
• Share more artifacts, workflows and results in a reusable and customizable CK format (common JSON API and meta description)
• Collaboratively improve models and find missing features
• Gradually expose more design and optimization knobs at all AI/SW/HW levels
• Enable distributed on-line learning for self-optimizing and self-learning systems
http://cKnowledge.org/partners http://cKnowledge.org/publications
Join the growing Collective Knowledge community!