"SoCs for Computer Vision-enabled IoT Devices," a March 2019 Silicon Valley Meetup Presentation from MediaTek
2. 2018 Copyright © MediaTek Inc. All rights reserved.
NeuroPilot & Platform-aware ML Kits
3. NeuroPilot Platform-aware ML Kits
[Diagram: MediaTek Platform-aware ML Kits]
• Example applications: Super-Resolution, Depth Estimation, Segmentation
• Input from Application Developers: User-defined NN Structure
• ML Kit stages: Network Reduction, Network Architecture Search, Network Quantization, Network Deep Fusion (Tiling + Fusion)
• Output for the MediaTek Platform: Platform-friendly NN Structure
• Figures shown: BW Req.: 2.0 GB/s; HW Util.: 80%; FPS: 100 FPS; Power: < 40 mW
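The slide names Network Deep Fusion (Tiling + Fusion) without showing how it works. As a rough illustration of the general idea only, and not MediaTek's implementation, the sketch below runs two stacked 1-D convolutions either layer by layer (the whole intermediate result is materialized) or fused tile by tile (only a small intermediate buffer exists at any time). The function names, kernels, and tile size are all assumptions made for this example.

```python
def conv1d_valid(x, k):
    """'Valid' 1-D convolution (correlation form): no padding."""
    n = len(x) - len(k) + 1
    return [sum(x[i + j] * k[j] for j in range(len(k))) for i in range(n)]


def layer_by_layer(x, k1, k2):
    """Run layer 1 over the whole input, then layer 2 over its output."""
    mid = conv1d_valid(x, k1)              # full intermediate tensor
    return conv1d_valid(mid, k2), len(mid)


def fused_tiled(x, k1, k2, tile):
    """Run both layers on one output tile at a time.

    Each output tile needs `halo` extra input samples, because each
    'valid' convolution consumes len(kernel) - 1 samples.
    """
    halo = (len(k1) - 1) + (len(k2) - 1)
    out_len = len(x) - halo
    out, peak_mid = [], 0
    for start in range(0, out_len, tile):
        stop = min(start + tile, out_len)
        x_tile = x[start:stop + halo]      # input slice including the halo
        mid = conv1d_valid(x_tile, k1)     # small, tile-sized intermediate
        peak_mid = max(peak_mid, len(mid))
        out.extend(conv1d_valid(mid, k2))
    return out, peak_mid


# Illustrative data (integers keep the comparison exact).
x = list(range(32))
k1, k2 = [1, 2, 1], [1, 0, -1]

ref, full_len = layer_by_layer(x, k1, k2)
fused, peak_len = fused_tiled(x, k1, k2, tile=8)

print("outputs match:", fused == ref)
print("intermediate elements, layer-by-layer:", full_len)
print("peak intermediate elements, fused tiles:", peak_len)
```

The smaller intermediate buffer is what could let such data stay in on-chip memory instead of travelling to and from DRAM, which is the usual motivation for fusion as a bandwidth-reduction technique. The cost is that overlapping halo samples are read again for every tile.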
4. NeuroPilot for Developers
• Highly integrated with Android Neural Network
• Supports TensorFlow as well as Caffe and ONNX
• Adds useful tools/utilities for developers
[Diagram: NeuroPilot software stack]
Offline Tool (NeuroPilot specified):
• TensorFlow Model: TOCO, .tflite format
• Caffe / ONNX Model: Model Convertor
• Quantization
On Device:
• Interpreter
• ANN API
• ANN Runtime
• ANN HAL
• NN HAL implementations: CPU NN HAL impl., GPU NN HAL impl., VPU NN HAL impl., AIA NN HAL impl.
• Hardware: CPU, GPU, VPU, AIA
MTK Ext. API (for Developers):
1. Bind Op with HW
2. Profiler
3. Debugger (Log)
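The stack above routes operations to CPU, GPU, VPU, or AIA back ends, and the MTK Ext. API lists "Bind Op with HW". The slide does not show that API, so the sketch below is purely illustrative: the op names, the per-back-end support table, and the `assign_backends` function are all hypothetical and are not taken from NeuroPilot or Android NN documentation. It shows one plausible policy: honor an explicit binding when one is given, otherwise pick the first preferred back end that supports the op, and fall back to the CPU.

```python
# Hypothetical support table: which op types each accelerator can run.
# These sets are invented for illustration only.
SUPPORTED = {
    "AIA": {"CONV_2D", "RELU", "MAX_POOL_2D"},
    "VPU": {"CONV_2D", "RELU", "RESIZE_BILINEAR"},
    "GPU": {"CONV_2D", "RELU", "RESIZE_BILINEAR"},
}


def assign_backends(ops, preference, bindings=None):
    """Return a list of (op, backend) pairs.

    ops        -- op type names in execution order
    preference -- accelerators to try, most preferred first
    bindings   -- optional {op: backend} overrides ("bind op with HW")
    """
    bindings = bindings or {}
    plan = []
    for op in ops:
        forced = bindings.get(op)
        if forced is not None:
            # An explicit binding must name a back end that supports the op.
            if forced != "CPU" and op not in SUPPORTED.get(forced, set()):
                raise ValueError(f"{forced} cannot run {op}")
            plan.append((op, forced))
            continue
        # First preferred accelerator that supports the op, else CPU.
        target = next(
            (b for b in preference if op in SUPPORTED.get(b, set())), "CPU"
        )
        plan.append((op, target))
    return plan


model_ops = ["CONV_2D", "RELU", "RESIZE_BILINEAR", "SOFTMAX"]
order = ["AIA", "VPU", "GPU"]

default_plan = assign_backends(model_ops, order)
bound_plan = assign_backends(model_ops, order, bindings={"RELU": "VPU"})

print(default_plan)
print(bound_plan)
```

In this toy setup `SOFTMAX` is not supported by any accelerator and therefore lands on the CPU, while the explicit binding moves `RELU` from AIA to VPU.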
5. MediaTek NeuroPilot Toolkit: utility and debug tools
NeuroPilot Toolkit components: NN Utility, Debugger, Profiling
• Model Convertor (TensorFlow/Caffe/ONNX)
• Quantization
• Power API
• Performance
• Memory
• System Crash
• Mobilelog
6. AIA - AI HW Accelerator
7. AIA Key Features
▪ Bandwidth reduction techniques
- Tile-based layer fusion
- TCM for data exchange
- Sparsity compression
▪ Performance Engine
- MDLA: 806 GMAC/s @ 788 MHz
▪ Flexible quantization scheme
- Asymmetric or symmetric quantization
- Per-layer or per-channel quantization
- No extra performance overhead
▪ Power Efficient
- >1 TMACs/W (2x better than VPU) @ 12FFC
▪ Bandwidth-Aware Design
▪ Dual AXI Port for high BW
▪ High Throughput Load/Store
▪ Simultaneous execution of OPs (CONV/ACT/POOL)
▪ Support INT8/INT16 or FP16
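The quantization options listed above (asymmetric or symmetric, per-layer or per-channel) can be illustrated with a small, generic INT8 sketch. This is the textbook affine scheme, not MediaTek's implementation; the helper names and the sample values are assumptions chosen for this example.

```python
def int_range(bits=8):
    """Signed integer range, e.g. (-128, 127) for 8 bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def quant_params(values, symmetric, bits=8):
    """Return (scale, zero_point) covering `values` (range always includes 0)."""
    qmin, qmax = int_range(bits)
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if symmetric:
        # Range is centered on zero, so the zero point is fixed at 0.
        bound = max(-lo, hi)
        return (bound / qmax if bound > 0 else 1.0), 0
    # Asymmetric: map [lo, hi] onto the full [qmin, qmax] range.
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    return scale, round(qmin - lo / scale)


def quantize(values, scale, zero_point, bits=8):
    qmin, qmax = int_range(bits)
    # Python's round() rounds halves to the nearest even integer.
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q, scale, zero_point):
    return [(v - zero_point) * scale for v in q]


def max_error(values, scale, zero_point):
    back = dequantize(quantize(values, scale, zero_point), scale, zero_point)
    return max(abs(a - b) for a, b in zip(values, back))


# Asymmetric vs symmetric on non-negative data (e.g. post-ReLU activations).
acts = [0.0, 1.5, 6.0]
s_sym, zp_sym = quant_params(acts, symmetric=True)
s_asym, zp_asym = quant_params(acts, symmetric=False)
print("symmetric  scale, zero point:", s_sym, zp_sym)
print("asymmetric scale, zero point:", s_asym, zp_asym)

# Per-layer vs per-channel on two channels with very different ranges.
weights = [[0.5, -0.25, 0.1], [20.0, -10.0, 5.0]]
flat = [v for row in weights for v in row]
s_layer, zp_layer = quant_params(flat, symmetric=True)
err0_layer = max_error(weights[0], s_layer, zp_layer)
err0_channel = max_error(weights[0], *quant_params(weights[0], symmetric=True))
print("channel 0 max error, per-layer scale:  ", err0_layer)
print("channel 0 max error, per-channel scale:", err0_channel)
```

With non-negative inputs the asymmetric scheme uses the whole integer range and so gets a finer step than the symmetric one, which leaves the negative half unused. With a single per-layer scale, the wide-range channel dictates the step size and the small-range channel loses precision; a per-channel scale avoids that.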