Community-Driven and Knowledge-Guided Optimization
of AI Applications Across the Whole SW/HW Stack
or how to adapt to a Ca...

Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions and Collective Knowledge

Slides from ARM's Research Summit'17 about "Community-Driven and Knowledge-Guided Optimization of AI Applications Across the Whole SW/HW Stack" (http://cKnowledge.org/repo , http://cKnowledge.org/ai , http://tinyurl.com/zlbxvmw , https://developer.arm.com/research/summit )

Co-designing the whole AI/SW/HW stack in terms of speed, accuracy, energy consumption, size, costs and other metrics has become extremely complex, long and costly. With no rigorous methodology for analyzing performance and accumulating optimisation knowledge, we are simply destined to drown in the ever growing number of design choices, system features and conflicting optimisation goals.

We present our novel community-driven approach to solve the above problems. Originating from natural sciences, this approach is embodied in Collective Knowledge (CK), our open-source cross-platform workflow framework and repository for automatic, collaborative and reproducible experimentation. CK helps organize, unify and share representative workloads, data sets, AI frameworks, libraries, compilers, scripts, models and other artifacts as customizable and reusable components with a common JSON API.
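
As a rough illustration of what "components with a common JSON API" can mean in practice, the sketch below routes every action through a single entry point that takes one JSON-serializable dict and returns one. It is not CK's implementation: the registry, the `access` function, the `program`/`run` names and the `return`/`error` fields are assumptions chosen for this example.

```python
import json

# Hypothetical registry: (module, action) -> Python function.
ACTIONS = {}

def action(module, name):
    """Register a function as an action of a module (illustrative only)."""
    def register(func):
        ACTIONS[(module, name)] = func
        return func
    return register

@action("program", "run")
def run_program(i):
    # A real wrapper would launch a tool here; this stub only echoes its input.
    return {"return": 0, "program": i["data"], "status": "finished"}

def access(i):
    """Single entry point: dict in, dict out, errors reported inside the dict."""
    key = (i.get("module"), i.get("action"))
    if key not in ACTIONS:
        return {"return": 1, "error": "unknown action %s:%s" % key}
    out = ACTIONS[key](i)
    json.dumps(out)  # fail early if the output is not JSON-serializable
    return out

result = access({"module": "program", "action": "run", "data": "my-benchmark"})
print(json.dumps(result, sort_keys=True))
```

Because inputs and outputs are plain dicts, the same call can be issued from a command line, a script or a web service without changing the component.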

CK helps bring academia, industry and end-users together to gradually expose optimisation choices at all levels (e.g. from parameterized models and algorithmic skeletons to compiler flags and hardware configurations) and autotune them across diverse inputs and platforms. Optimization knowledge gets continuously aggregated in public or private repositories such as cKnowledge.org/repo in a reproducible way, and can be then mined and extrapolated to predict better AI algorithm choices, compiler transformations and hardware designs.
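
The loop described above (expose choices, explore them, aggregate the results) can be sketched in a few lines of Python. Everything here is an assumption for illustration: the three choices, their ranges and the synthetic `measure` function stand in for real experiments, and the record format is not CK's.

```python
import json
import random

# Hypothetical choices exposed at different levels of the stack.
CHOICES = {
    "batch_size": [1, 2, 4, 8, 16],
    "cpu_threads": [1, 2, 4],
    "compiler_flags": ["-O2", "-O3"],
}

def measure(choice):
    """Stand-in for a real experiment: returns a synthetic time per image."""
    base = 1.0 / choice["cpu_threads"] + 0.5 / choice["batch_size"]
    return base * (0.9 if choice["compiler_flags"] == "-O3" else 1.0)

def autotune(iterations, seed=0):
    """Random exploration: sample choices, measure, keep every record."""
    rng = random.Random(seed)
    records = []
    for _ in range(iterations):
        choice = {name: rng.choice(values) for name, values in CHOICES.items()}
        records.append({"choices": choice, "time_per_image": measure(choice)})
    return records

records = autotune(20)
best = min(records, key=lambda r: r["time_per_image"])
print(json.dumps(best, sort_keys=True))
```

Keeping every record, not only the best one, is what allows the results to be mined later for models and predictions.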

We also demonstrate how we use this approach in practice together with ARM and other companies to adapt to a Cambrian AI/SW/HW explosion by creating an open repository of reusable AI artifacts, and then collaboratively optimising and co-designing the whole deep learning stack (software, hardware and models).



  1. Community-Driven and Knowledge-Guided Optimization of AI Applications Across the Whole SW/HW Stack, or how to adapt to a Cambrian explosion in AI / SW / HW … with cKnowledge.org and open co-design competitions. ARM Research Summit, Cambridge, September 2017. Grigori Fursin: CTO and co-founder, dividiti, UK; Chief Scientist, cTuning foundation.
  2. cKnowledge.org : helping industry and academia adapt to a Cambrian AI/SW/HW explosion via open co-design competitions. A race to develop innovative AI products and systems (SW & HW) …
      • Various form factors: IoT, mobile, data centers, supercomputers
      • Various constraints: speed, energy, accuracy, size, resiliency, costs
  3. … leads to a Cambrian AI/SW/HW explosion and technological chaos.
  4. Which AI/SW/HW solutions will survive? [Figure label: AI users.] We at dividiti.com perform competitive analysis and optimization of the whole AI/SW/HW stack for various realistic scenarios (object detection, image classification, etc.).
  5. Scenario: image classification on mobile devices.
      • 800+ distinct mobile devices
      • mobile CPUs and GPUs
      • Caffe, TensorFlow
      • OpenBLAS, CLBlast, ViennaCL, Eigen
      • AlexNet, GoogleNet, SqueezeNet
      • ImageNet and user images
      Requirement: speed vs cost (vs energy vs accuracy vs model size vs memory usage vs reliability …). [Plot: execution time (sec) vs price (euros).] Just a few winning "AI+SW+HW species" must be optimized further or may go extinct. Obtained using our CK-based Android app to crowdsource experiments across devices provided by volunteers (later in the talk). cKnowledge.org/repo , cKnowledge.org/ai
  6. Optimization is ad-hoc, tedious, expensive and time consuming.
      Stack layers shown in the diagram: user front-end (cloud, GRID, supercomputer, etc.); mobile device, server, data centers; existing frameworks / algorithms; various models; algorithm / source code; available libraries / skeletons; compilers; binary or byte code; run-time environment; run-time state of the system; inputs; hardware, simulators.
      Examples of choices listed next to the layers:
      • Microsoft Azure, AWS, Google Cloud, XSEDE, PRACE, Watson …
      • 100s of models for TensorFlow, Caffe, Torch, Theano, MxNet, CNTK
      • CUDA, MPI, OpenMP, TBB, OpenCL, StarPU, OmpSs …
      • C, C++, Fortran, Java, Python, byte code, assembler …
      • LLVM, GCC, ICC, Rose, PGI, Lift, functional programming …
      • cuBLAS, BLAS, MAGMA, ViennaCL, CLBlast, cuDNN, openBLAS, clBLAS, libDNN, tinyDNN, ARM compute lib, libxsmm, skeletons
      • diverse hardware: heterogeneous, out-of-order, caches (ARM, x86, CUDA, Mali, Adreno, Power, TPU, FPGA, MIPS, AVX, neon)
      • Linux (CentOS, Ubuntu, RedHat, SUSE, Debian), Android, Windows, BSD, iOS, MacOS …
      Too many design and optimization choices at each level of the continuously changing SW/HW stack!
  7. [Same stack diagram and examples as on slide 6.] Optimization is ad-hoc, tedious, expensive and time consuming: too many design and optimization choices at each level of the continuously changing SW/HW stack! Time to reinvent computer engineering and enable open, collaborative and reproducible AI/SW/HW co-design!
  8. cKnowledge.org: plugin-based workflow framework to co-design AI/SW/HW stack. Grigori Fursin, Anton Lokhmotov, Ed Plowman, "Collective Knowledge: towards R&D sustainability", DATE'16. Initial funding (2015). Common experimental framework for computer engineering and AI research ( https://github.com/ctuning/ck ). [Diagram: the stack layers (AI framework; various models; algorithm / source code; available libraries / skeletons; compilers; binary or byte code; run-time environment; run-time state of the system; inputs; hardware, simulators) wrapped by a common JSON API.]
  9. Repositories with reusable and customizable artifacts (JSON API and meta info).
      • Unified models (CK JSON API), each with its CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, …
      • AI frameworks (CK JSON API), each with its CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet, …
      [Stack diagram with common JSON API, as on slide 8.]
  10. Repositories with reusable and customizable workflows (JSON API).
      • Workflows exposed through the CK API: image classification, object detection, emotion analysis, …
      • Unified models (CK JSON API), each with its CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, …
      • AI frameworks (CK JSON API), each with its CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet, …
      [Stack diagram with common JSON API, as on slide 8.]
  11. Crowdsource AI experiments across diverse platforms provided by volunteers. Continuous competition of various AI/SW/HW combinations (species) at cKnowledge.org/repo. Everyone is on the same page: fair and reproducible competitions. [Diagram as on slide 10: workflows (image classification, object detection, emotion analysis), unified models and AI frameworks, all accessed through the CK JSON API.]
  12. CK concepts: convert your artifacts into reusable and customizable components. Starting point: ad-hoc scripts that perform some actions on some artifacts, each artifact coming with some description.
      • Labels on the diagram (actions and artifact types): setup, soft, find, extract features, dataset, compile, run, add, replay, experiment, autotune, program
      • Example artifacts: TensorFlow, Caffe2, ARM compute lib; image classification, object detection; ImageNet, car video stream, real surveillance camera; GEMM OpenCL, convolution CPU; performance results, training / accuracy, bugs
  13. CK concepts: convert your artifacts into reusable and customizable components. Same actions and artifacts as on slide 12, but each artifact is now described by a JSON file. Layout: 1st level directory: CK modules (Python module, JSON API) / 2nd level directory: CK entries (holder for original artifact) / CK meta info (CK meta).
  14. CK concepts: convert your artifacts into reusable and customizable components. [Diagram as on slide 13.] Collective Knowledge (github.com/ctuning/ck) assists you in unifying, executing, sharing and reusing your artifacts:
        $ sudo pip install ck
        $ ck pull repo:ck-autotuning
        $ ck add dataset:my-new-dataset        (UID will be automatically generated)
        $ ck compile program:cbench-automotive-susan
        $ ck run program:cbench-automotive-susan
      https://github.com/ctuning/ck/wiki/Shared-modules
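
The two-level layout from slides 13 and 14 (module directory, then entry directory, then meta info) can be mimicked with ordinary files. The sketch below is a simplified stand-in, not CK's code: the helper names are hypothetical, the entries and their meta fields are invented, and the `.cm/meta.json` location is borrowed from the description on slide 16.

```python
import json
import os
import tempfile

def add_entry(repo, module, entry, meta):
    """Create <repo>/<module>/<entry>/.cm/meta.json and return the entry path."""
    path = os.path.join(repo, module, entry)
    os.makedirs(os.path.join(path, ".cm"), exist_ok=True)
    with open(os.path.join(path, ".cm", "meta.json"), "w") as f:
        json.dump(meta, f, indent=2, sort_keys=True)
    return path

def load_meta(repo, module, entry):
    """Read the meta description of one entry."""
    with open(os.path.join(repo, module, entry, ".cm", "meta.json")) as f:
        return json.load(f)

def list_entries(repo, module):
    """Entries are simply the sub-directories of a module directory."""
    return sorted(os.listdir(os.path.join(repo, module)))

repo = tempfile.mkdtemp()  # throw-away repository for the demonstration
add_entry(repo, "dataset", "my-new-dataset", {"tags": ["dataset", "image"]})
add_entry(repo, "program", "my-classifier", {"tags": ["program"], "deps": ["compiler"]})
print(list_entries(repo, "dataset"))
print(load_meta(repo, "program", "my-classifier")["deps"])
```

Since an entry is a native directory, the original artifact files can sit next to the `.cm` folder untouched, and the whole repository can be shared through any version-control system.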
  15. We already converted multiple AI frameworks, artifacts and workflows to the CK.
      • Diagram elements: ad-hoc init scripts; ad-hoc scripts to process CSV, XLS, TXT, etc.; ad-hoc experimental workflows; databases, local repositories; stat. analysis, predictive analytics, visualization
      • CK components: CK program, CK pipeline, CK compiler, CK AI framework, CK math library, CK experiment
      • Compilers: ICC 17.0, CUDA 8.0, GCC 7.0, LLVM 4.0
      • AI frameworks: Caffe OpenCL, Caffe CUDA, TensorFlow CPU/CUDA
      • Math libraries: MAGMA, cuBLAS, OpenBLAS, ViennaCL, CLBlast
      • Repositories: github.com/dividiti/ck-caffe , github.com/ctuning/ck-caffe2 , github.com/ctuning/ck-tensorflow
        $ ck pull repo --url=https://github.com/dividiti/ck-caffe
        $ ck compile program:caffe-classification
        $ ck run program:caffe-classification
      https://github.com/ctuning/ck/wiki/Shared-repos
  16. We've already converted multiple AI frameworks, artifacts and workflows to the CK. [Diagram elements, repositories and commands as on slide 15.]
      CK program pipeline: unified API (input, JSON) → read program meta → detect all software dependencies (ask the user if multiple versions exist) → prepare environment → compile program → run program → unified API (output, JSON).
      The CK program module can automatically adapt to the underlying environment via dependencies.
      CK program entry (native directory): source files and auxiliary scripts, plus .cm/meta.json, which describes soft dependencies, data sets, and how to compile and run this program.
      CK entries associated with a given module describe a given object using meta.json while storing all necessary files and sub-directories.
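
The pipeline on slide 16 can be read as a chain of stages that all take and return the same dict. The following is a minimal sketch under that reading, not the CK program module: the stage functions, the `INSTALLED` table, the `preferred` input and the `CK_*` variable names are all hypothetical, and no compiler or program is actually invoked.

```python
# Hypothetical table of detected software: dependency name -> installed versions.
INSTALLED = {"compiler": ["gcc-7.0", "llvm-4.0"], "lib-blas": ["openblas"]}

def read_meta(state):
    # Stand-in for reading the entry's .cm/meta.json.
    state["meta"] = {"deps": ["compiler", "lib-blas"]}
    return state

def resolve_deps(state):
    resolved = {}
    for dep in state["meta"]["deps"]:
        versions = INSTALLED.get(dep, [])
        if not versions:
            return {"return": 1, "error": "missing dependency: " + dep}
        # Several versions found: take the caller's preference, else the first.
        resolved[dep] = state.get("preferred", {}).get(dep, versions[0])
    state["deps"] = resolved
    return state

def prepare_env(state):
    state["env"] = {"CK_" + k.upper().replace("-", "_"): v
                    for k, v in state["deps"].items()}
    return state

def compile_program(state):
    state["binary"] = "a.out"  # placeholder: no compiler is invoked here
    return state

def run_program(state):
    state["characteristics"] = {"exit_code": 0}  # placeholder result
    return state

PIPELINE = [read_meta, resolve_deps, prepare_env, compile_program, run_program]

def run_pipeline(i):
    """Run all stages in order; stop at the first stage that reports an error."""
    state = dict(i)
    state["return"] = 0
    for stage in PIPELINE:
        state = stage(state)
        if state.get("return", 0) > 0:
            break
    return state

out = run_pipeline({"preferred": {"compiler": "llvm-4.0"}})
print(out["deps"], out["env"])
```

Because each stage only reads and extends the shared dict, stages can be swapped or new ones inserted (for example a profiling step) without touching the others.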
  17. Automatically adapting workflow to any underlying software and hardware.
      Soft entries in CK describe how to detect if a given software is already installed, how to set up all its environment including all paths (to binaries, libraries, include, aux tools, etc), and how to detect its version:
        $ ck detect soft --tags=compiler,cuda
        $ ck detect soft:compiler.gcc
        $ ck detect soft:compiler.llvm
        $ ck list soft:compiler*
        $ ck detect soft:lib.cublas
        $ ck search soft --tags=blas
      Env entries are created in the CK local repo for all found software instances together with their meta and an auto-generated environment script env.sh (on Linux) or env.bat (on Windows):
        local / env / 03ca0be16962f471 / env.sh     Tags: compiler,cuda,v8.0
        local / env / 0a5ba198d48e3af3 / env.bat    Tags: lib,blas,cublas,v8.0
        $ ck show env
        $ ck show env --tags=cublas
        $ ck rm env:* --tags=cublas
      Package entries describe how to install a given software if it is not already installed (using a CK Python plugin together with an install.sh script on a Linux host or install.bat on a Windows host):
        $ ck install package:caffemodel-bvlc-googlenet
        $ ck install package:imagenet-2012-val
        $ ck install package:lib-tensorflow-cuda
        $ ck list package:*caffemodel*
        $ ck search package --tags=caffe
        $ ck list package:*tensorflow*
        $ ck install package:lib-caffe-bvlc-master-cuda-universal
      Multiple versions of tools can easily co-exist and be plugged into CK workflows!
      https://github.com/ctuning/ck/wiki/Portable-workflows
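
The tag-based lookup that lets several tool versions co-exist can be sketched as a subset test over tag sets. This is an illustration only, not how CK stores or searches env entries: the first two records reuse the UIDs and tags shown on the slide, the third record and both helper functions are hypothetical.

```python
# In-memory stand-in for env entries in a local repository.
ENV_ENTRIES = [
    {"uid": "03ca0be16962f471", "script": "env.sh",
     "tags": {"compiler", "cuda", "v8.0"}},
    {"uid": "0a5ba198d48e3af3", "script": "env.bat",
     "tags": {"lib", "blas", "cublas", "v8.0"}},
    # Hypothetical extra entry, added to show two compilers side by side.
    {"uid": "illustrative-uid-1", "script": "env.sh",
     "tags": {"compiler", "gcc", "v7.0"}},
]

def search_env(tags):
    """Return every entry whose tag set contains all requested tags."""
    wanted = set(tags.split(","))
    return [e for e in ENV_ENTRIES if wanted <= e["tags"]]

def resolve_env(tags, choose=lambda matches: matches[0]):
    """Pick one entry; 'choose' stands in for asking the user when ambiguous."""
    matches = search_env(tags)
    if not matches:
        raise LookupError("no env entry with tags: " + tags)
    return matches[0] if len(matches) == 1 else choose(matches)

print([e["uid"] for e in search_env("compiler")])
print(resolve_env("lib,cublas")["script"])
```

A workflow that asks for `compiler` gets a choice between the registered compilers, while one that asks for `compiler,cuda` is pinned to a single entry.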
  18. Applying methodology from natural sciences to optimize computer systems ( https://github.com/ctuning/ck/wiki/Autotuning ).
      CK Python modules (wrappers) with a unified JSON API wrap some actions, tools (compilers, profilers, etc) and generated files: CK input (JSON/dict) → action → CK output (JSON/dict). Both the unified input and the unified output carry behavior, choices, features and state.
      Formalized function B of a behavior of any CK object: b = B( c , f , s ), where c are choices, f are features and s is the state.
      Flattened CK JSON vectors (dict converted to vector) simplify statistical analysis, machine learning and data mining.
      Chain CK modules to implement research workflows such as multi-objective autotuning and co-design exploration: choose exploration strategy → perform SW/HW DSE (math transforms, skeleton params, compiler flags, transformations …) → perform stat. analysis → detect (Pareto) frontier → model behavior, predict optimizations → reduce complexity.
      CK program module with pipeline function: set environment for a given tool version → compile program → run code.
      First expose coarse-grain high-level choices, features, system state and behavior characteristics. Crowdsource benchmarking and random exploration across diverse inputs and devices; keep best species (AI/SW/HW choices); model behavior; predict better optimizations and designs.
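
Converting a nested dict into a flat vector, as mentioned on slide 18, is straightforward to sketch. The key format used here (`#` before dict keys, `@` before list indices) and the sample experiment are assumptions for illustration; the slide does not specify the exact format CK uses.

```python
def flatten(obj, prefix=""):
    """Convert a nested dict/list into a flat {key_path: scalar} dict."""
    flat = {}
    if isinstance(obj, dict):
        for key in sorted(obj):
            flat.update(flatten(obj[key], prefix + "#" + str(key)))
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            flat.update(flatten(item, prefix + "@" + str(index)))
    else:
        flat[prefix] = obj  # scalar leaf: record it under its full path
    return flat

# Synthetic record with the four groups from b = B(c, f, s).
experiment = {
    "choices": {"batch_size": 4, "compiler_flags": ["-O3", "-fno-inline"]},
    "features": {"image": {"width": 224, "height": 224}},
    "state": {"cpu_freq_mhz": 1800},
    "behavior": {"time_per_image": 0.12},
}
vector = flatten(experiment)
for key in sorted(vector):
    print(key, "=", vector[key])
```

Once every experiment is a flat set of named scalars, records from different workflows can be stacked into one table and passed to standard statistics or machine-learning tools.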
  19. Prepare first proof-of-concept community experiments (cKnowledge.org/repo):
      • Algorithms: object classification, object detection
      • AI frameworks: Caffe CPU, Caffe OpenCL, TensorFlow CPU
      • Math libraries: OpenBLAS, ViennaCL, clBLAS, CLBlast, cuBLAS, cuDNN, Eigen, gemmlowp
      • Compilers: GCC 5+
      • Models: AlexNet, GoogleNet, VGG, ResNet, SqueezeNet, SqueezeDet, SSD
      • Datasets: KITTI, COCO, VOC, ImageNet
      • Optimization choices: batch size, number of CPU threads
      • Characteristics: total execution time (including OpenCL overheads), top1/top5 model accuracy, static model size (MB), device cost, max power consumption (if available)
      • System state: CPU/GPU frequency, memory
  20. Crowdsource benchmarking across Android devices provided by volunteers. Continuously collect statistics, bugs and misclassifications at cKnowledge.org/repo.
      • Number of distinct participating platforms: 800+
      • Number of distinct CPUs: 260+
      • Number of distinct GPUs: 110+
      • Number of distinct OS: 280+
      • Power range: 1-10W
      No need for a dedicated and expensive cloud: volunteers help us validate research ideas, similar to SETI@HOME. We are also collecting real images from users for misclassifications to build an open and continuously updated training set! [Plot: winning solutions on various frontiers; time per image (seconds) vs cost (euros).]
  21. Crowdsource benchmarking across Android devices provided by volunteers. Continuously collect statistics, bugs and misclassifications at cKnowledge.org/repo.
      • Number of distinct participating platforms: 790+
      • Number of distinct CPUs: 260+
      • Number of distinct GPUs: 110+
      • Number of distinct OS: 280+
      • Power range: 1-10W
      No need for a dedicated and expensive cloud: volunteers help us validate research ideas, similar to SETI@HOME. We are also collecting real images from users for misclassifications to build an open and continuously updated training set! [Plot: winning solutions on various frontiers, with Firefly-RK3399 highlighted; time per image (seconds) vs cost (euros).]
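
The "winning solutions on various frontiers" in these plots correspond to a Pareto frontier over time per image and cost. A minimal way to compute such a frontier for two objectives is sketched below; the device names and numbers are synthetic, not measurements from the talk, and the function is an illustration, not the CK implementation.

```python
def pareto_frontier(points):
    """Return points not dominated by any other (lower is better on both axes)."""
    frontier = []
    best_time = float("inf")
    # Sort by cost, then time; keep a point only if it is strictly faster
    # than every cheaper (or equally cheap) point seen so far.
    for p in sorted(points, key=lambda p: (p["cost"], p["time"])):
        if p["time"] < best_time:
            frontier.append(p)
            best_time = p["time"]
    return frontier

# Synthetic records: cost in euros, time in seconds per image.
results = [
    {"device": "A", "cost": 100, "time": 2.0},
    {"device": "B", "cost": 150, "time": 1.2},
    {"device": "C", "cost": 150, "time": 1.5},
    {"device": "D", "cost": 300, "time": 1.3},
    {"device": "E", "cost": 400, "time": 0.4},
]
print([p["device"] for p in pareto_frontier(results)])
```

Devices C and D drop out because B is at least as cheap and faster than both; A, B and E remain since each offers a trade-off that no other point beats on both axes.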
  22. Let's dig further: (crowdsource) BLAS autotuning in Caffe on Firefly-RK3399. Collaboration between Marco Cianfriglia (Roma Tre University), Cedric Nugteren (TomTom), Flavio Vella, Anton Lokhmotov and Grigori Fursin (dividiti).
      Tunable parameters of OpenCL-based BLAS ( github.com/CNugteren/CLBlast ):

        Name    Description                                        Ranges
        KWG     2D tiling at workgroup level                       {32, 64}
        KWI     KWG kernel-loop can be unrolled by a factor KWI    {1}
        MDIMA   Local memory re-shape                              {4, 8}
        MDIMC   Local memory re-shape                              {8, 16, 32}
        MWG     2D tiling at workgroup level                       {32, 64, 128}
        NDIMB   Local memory re-shape                              {8, 16, 32}
        NDIMC   Local memory re-shape                              {8, 16, 32}
        NWG     2D tiling at workgroup level                       {16, 32}
        SA      Manual caching using the local memory              {0, 1}
        SB      Manual caching using the local memory              {0, 1}
        STRM    Striding within single thread for matrix A and C   {0, 1}
        STRN    Striding within single thread for matrix B         {0, 1}
        VWM     Vector width for loading A and C                   {8, 16}
        VWN     Vector width for loading B                         {0, 1}

      For now only two data sets (small & large). Some extra constraints avoid illegal combinations. Use different autotuners under CK to speed up design space exploration based on probabilistic focused search, genetic algorithms, deep learning, SVM, KNN, MARS, decision trees …
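
To get a feel for the size of this design space, the sketch below enumerates the parameter ranges listed on slide 22, filters them with one constraint and draws a random sample of legal points. The ranges are copied from the slide; the constraint in `is_legal` is an illustrative assumption (the slide only says that extra constraints exist), and the code does not call CLBlast or CK.

```python
import itertools
import random

# Parameter ranges copied from slide 22.
RANGES = {
    "KWG": [32, 64], "KWI": [1], "MDIMA": [4, 8], "MDIMC": [8, 16, 32],
    "MWG": [32, 64, 128], "NDIMB": [8, 16, 32], "NDIMC": [8, 16, 32],
    "NWG": [16, 32], "SA": [0, 1], "SB": [0, 1], "STRM": [0, 1],
    "STRN": [0, 1], "VWM": [8, 16], "VWN": [0, 1],
}

def is_legal(cfg):
    # Illustrative constraint (an assumption, not taken from the slides):
    # the workgroup tile must split evenly across threads and vector lanes.
    return cfg["MWG"] % (cfg["MDIMC"] * cfg["VWM"]) == 0

def all_configs():
    """Yield every combination of parameter values as a dict."""
    names = sorted(RANGES)
    for values in itertools.product(*(RANGES[n] for n in names)):
        yield dict(zip(names, values))

total = sum(1 for _ in all_configs())
legal = [cfg for cfg in all_configs() if is_legal(cfg)]
sample = random.Random(0).sample(legal, 5)  # random exploration of legal points
print(total, len(legal))
```

Even these small ranges give 41472 raw combinations, which is why exhaustive search quickly becomes impractical on a real device and smarter exploration strategies are worth plugging in.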
  23. Let's dig further: autotuning BLAS (CLBlast) in Caffe on Firefly-RK3399.
      • Caffe with autotuned OpenBLAS (threads and batches) is the fastest.
      • Caffe with autotuned CLBlast is 6 to 7x faster than the default version and competitive with the OpenBLAS-based version, so adaptive selection at run-time is now worth implementing.
      Sharing results in a reproducible way with the community for validation and improvement: https://nbviewer.jupyter.org/github/dividiti/ck-caffe-firefly-rk3399/blob/master/script/batch_size-libs-models/analysis.20170531.ipynb
  24. Join the growing Collective Knowledge community!
      • Bring together industry and academia to participate in open and reproducible AI/SW/HW co-design competitions using the CK framework
      • Share more artifacts, workflows and results in a reusable and customizable CK format (common JSON API and meta description)
      • Collaboratively improve models and find missing features
      • Gradually expose more design and optimization knobs at all AI/SW/HW levels
      • Enable distributed on-line learning for self-optimizing and self-learning systems
      http://cKnowledge.org/partners , http://cKnowledge.org/publications
