3. Who am I?
Kazuyuki Miyazawa
Work Experience
• April 2019 - Present
AI Research Engineer @DeNA Co., Ltd.
• April 2010 - March 2019
Research Scientist @Mitsubishi Electric Corp.
Education
• PhD in Information Science @Tohoku Univ.
@kzykmyzw
4. Background
•Maps are an essential ingredient for every mobility service
•Ever-higher map quality is demanded to enable advanced services
(e.g., autonomous vehicles)
[Figure: timeline with eras -1980s, 1980s-20XXs, 20XXs-]
5. Problems for Current Map Creation/Maintenance
•Manual processes are labor-intensive and time-consuming
•Using a special measurement system (e.g., mobile mapping system) is costly and
difficult to scale to achieve high coverage for various types of mobility services
https://www.infradoctor.jp/details/detail20190313.pdf
https://www.google.com/streetview/explore/
6. What Can DeNA Do About It?
•Dashcams are becoming popular and capture a lot of information useful for maps
•Current AI performs remarkably well at image/video analysis
•We are developing low-cost, rapid map creation (and maintenance) technology
using dashcam videos collected via cloud servers
[Figure: bar chart of dashcam sales volume in Japan, 2014-2018; vertical axis 0-160, in units of 10,000]
Source: GfK Japan, “2018年ドライブレコーダーの販売動向” (Dashcam sales trends in 2018), 2019
https://www.gfk.com/fileadmin/user_upload/dyna_content/JP/20190328_drivinngrecorders.pdf
9. How Do We Know the 3D Position from a 2D Image?
From a single 2D image, we cannot
determine the 3D position of the object:
every point along the camera ray projects
to the same pixel
10. How Do We Know the 3D Position from 2D Images?
If we have two (or more) views, we can
decide the 3D object position as the
intersection of camera rays
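As a rough illustration of the ray-intersection idea, here is a minimal Python sketch. It assumes each view is already reduced to a ray (a camera center plus a direction) in a shared world frame. Because noisy rays rarely meet exactly, it returns the midpoint of the shortest segment between them. The function name and the example coordinates are made up for illustration and are not taken from the talk.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _point_on_ray(origin: Vec3, direction: Vec3, s: float) -> Vec3:
    return (origin[0] + s * direction[0],
            origin[1] + s * direction[1],
            origin[2] + s * direction[2])


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w = (c1[0] - c2[0], c1[1] - c2[1], c1[2] - c2[2])
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays never intersect, so no depth can be recovered.
        raise ValueError("rays are parallel")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = _point_on_ray(c1, d1, s)
    p2 = _point_on_ray(c2, d2, t)
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)


# Hypothetical example: two cameras 2 m apart, both looking at one object.
point = triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 2.0, 10.0),
                             (2.0, 0.0, 0.0), (-1.0, 2.0, 10.0))
```

With more than two views, the same idea generalizes to a least-squares point closest to all rays.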
11. Dashcam Video = Multi-View Images
time: t1
time: t2
time: t3
Dashcam video can be seen as a set of
multi-view images because the vehicle
moves while capturing
12. Dashcam Video = Multi-View Images
Camera pose for each frame is
necessary to calculate the 3D
object position
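To make the role of the camera pose concrete, the sketch below turns a pixel in one frame into a ray in world coordinates. It assumes a pinhole camera with intrinsics fx, fy, cx, cy and a pose given as a world-to-camera rotation R and translation t (x_cam = R x_world + t). These conventions and all names are assumptions chosen for illustration; the talk does not specify them.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_to_world_ray(u: float, v: float,
                       fx: float, fy: float, cx: float, cy: float,
                       R: Sequence[Sequence[float]],
                       t: Sequence[float]) -> Tuple[Vec3, Vec3]:
    """Return (camera center, ray direction) in world coordinates.

    Assumes x_cam = R @ x_world + t, so the camera center is -R^T t
    and a camera-frame direction d maps to R^T d in the world frame.
    """
    # Back-project the pixel to a direction in the camera frame.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Multiply by R^T: entry j is column j of R dotted with the vector.
    direction = tuple(sum(R[i][j] * d_cam[i] for i in range(3)) for j in range(3))
    center = tuple(-sum(R[i][j] * t[i] for i in range(3)) for j in range(3))
    return center, direction


# Hypothetical frame: camera shifted 2 m along x, no rotation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
center, direction = pixel_to_world_ray(540.0, 560.0, 1000.0, 1000.0, 640.0, 360.0,
                                       identity, (-2.0, 0.0, 0.0))
```

Without R and t for each frame, rays from different frames would live in different coordinate systems and could not be intersected.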
13. Camera Pose Estimation from Video
•SfM (Structure from Motion) or Visual SLAM (Simultaneous Localization and Mapping) is used as the core technology
•Estimate the camera poses by tracking salient points in the video
15. Dataset Creation for Accuracy Evaluation
•Built our own dataset of dashcam videos and corresponding highly accurate 3D data
as ground truth for evaluation purposes
•Manually annotated various objects (e.g., traffic signs and lanes)
[Figures: Videos from Dashcams; 3D Point Clouds from LiDAR]
16. Sample Results
[Figure: dashcam video and estimated position, showing estimated camera positions, estimated object position, and ground-truth object position]
Error: 0.20m
17. Sample Results
[Figure: dashcam video and estimated position, showing estimated camera positions, estimated object position, and ground-truth object position]
Error: 1.2m
18. Results Summary
[Figure: histogram of object position error; horizontal axis: Error [m], 0 to 2.5; vertical axis: Frequency]
Average Error: 0.74m
Average error of object position estimation is below 1m!
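A summary like this could be computed along the following lines. The sketch assumes the error is the Euclidean distance between estimated and ground-truth positions, which the slides do not state explicitly. The coordinates are invented sample values, not data from the evaluation.

```python
import math
from statistics import mean
from typing import List, Sequence


def position_errors(estimated: Sequence[Sequence[float]],
                    ground_truth: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance (in meters) for each estimated/ground-truth pair."""
    return [math.dist(e, g) for e, g in zip(estimated, ground_truth)]


def error_histogram(errors: Sequence[float],
                    bin_width: float = 0.5,
                    max_error: float = 2.5) -> List[int]:
    """Count errors per bin; errors at or above max_error fall in the last bin."""
    n_bins = int(round(max_error / bin_width))
    counts = [0] * n_bins
    for err in errors:
        idx = min(int(err // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


# Invented sample positions (x, y, z) in meters, for illustration only.
estimated = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
ground_truth = [(0.2, 0.0, 0.0), (1.0, 1.0, 2.2)]

errors = position_errors(estimated, ground_truth)
average_error = mean(errors)
histogram = error_histogram(errors)
```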
20. Of Course, Deep Learning!
R-FCN: Object Detection via Region-based Fully Convolutional Networks
https://arxiv.org/pdf/1605.06409v2.pdf
OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
https://arxiv.org/pdf/1812.08008.pdf
Panoptic Segmentation
https://arxiv.org/pdf/1801.00868.pdf
21. Traffic Light/Sign Detection using CNN
• Use Faster R-CNN to detect traffic lights/signs in each frame of dashcam videos
• Faster R-CNN, proposed in 2015, is one of the most successful object detection methods
• Its main drawback is speed, which is acceptable for offline applications
[Figure: Faster R-CNN pipeline: CNN and Region Proposals, followed by Classification (Traffic light, Stop, Speed limit, No right turn, …) and Regression (Position)]
24. Q. Is It Easy to Achieve This? A. NO!
[Diagram: iterative cycle of Data Preparation, Model Training, Parameter Tuning, Model Verification, Deploy, Monitoring, Data Analysis, and Model Development]
Need to iterate again and again
25. Q. Is It Easy to Achieve This? A. NO!
Rapid iteration is the key
26. Who am I?
Profile
• Kosuke Kuzuoka (23)
• Love Tesla, Elon Musk and cats
Experience
• February 2020 - Present
Software Engineer, ML @Mercari, Inc.
• June 2018 – February 2020
AI Research Engineer @DeNA Co., Ltd.
• March 2017 – June 2018
R&D Manager @Photoruction, inc.
27. Brief Intro to Object Detection
• An active research area in the
computer vision community
• The task is to detect objects
(like cats) in an image
• Modern algorithms rely heavily
on deep learning
• Training a model takes hours
Photo by Paul Hanaoka on Unsplash
28. Photo by Paul Hanaoka on Unsplash
A cat detected as a cat
is a true positive.
Regions wrongly detected as cats
are false positives.
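One common way to decide automatically whether a detection counts as a true or false positive is to compare it with annotated boxes using intersection over union (IoU). The sketch below is a minimal version of that idea. The 0.5 threshold is a widely used convention, not something stated in the talk, and the function names and box coordinates are made up for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_detections(detections: Sequence[Box],
                     ground_truths: Sequence[Box],
                     iou_threshold: float = 0.5) -> List[str]:
    """Label each detection "TP" or "FP".

    Detections are assumed to be sorted by confidence, highest first.
    Each ground-truth box can be matched by at most one detection.
    """
    matched = set()
    labels = []
    for det in detections:
        best_iou, best_idx = 0.0, None
        for i, gt in enumerate(ground_truths):
            if i in matched:
                continue
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, i
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            labels.append("TP")
        else:
            labels.append("FP")
    return labels


# Made-up boxes: one annotated cat, one good detection, one spurious detection.
cats = [(10.0, 10.0, 50.0, 50.0)]
detected = [(12.0, 12.0, 48.0, 52.0), (100.0, 100.0, 140.0, 140.0)]
labels = label_detections(detected, cats)
```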
29. Problems in Development Processes
1. Train, validate and test models (computationally expensive)
2. Evaluate, visualize and analyze models (time-consuming)
3. Adjust hyperparameters, then go back to step 1
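The three-step loop above can be sketched in a few lines. This version uses an exhaustive grid search, which is only one possible way to do step 3 and is not claimed to be what the team used. The train_and_evaluate function is a hypothetical stand-in that returns a made-up score instead of training a real model.

```python
from itertools import product
from typing import Iterable, Optional, Tuple


def train_and_evaluate(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for steps 1 and 2; returns a made-up score."""
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 16) / 100


def grid_search(learning_rates: Iterable[float],
                batch_sizes: Iterable[int]) -> Tuple[Optional[Tuple[float, int]], float]:
    """Try every combination (step 3) and keep the best-scoring one."""
    best_params, best_score = None, float("-inf")
    for lr, bs in product(learning_rates, batch_sizes):
        score = train_and_evaluate(lr, bs)  # one full pass through steps 1-2
        if score > best_score:
            best_params, best_score = (lr, bs), score
    return best_params, best_score


best_params, best_score = grid_search([0.001, 0.01, 0.1], [8, 16, 32])
```

In a real project each call to train_and_evaluate can take hours, which is why every extra iteration of the loop is costly.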
33. Problems in Development Processes
1. Train, validate and test models (computationally expensive)
2. Evaluate, visualize and analyze models (time-consuming)
3. Adjust hyperparameters, then go back to step 1
Not essential, yet
very important...
34. Some of the Problems Are:
• Error-prone process (mistyped commands, etc.)
• Going back and forth between EC2 instances…
• Inefficient steps, such as drawing boxes and uploading
to a third-party app for visualization
• Researchers unable to focus on essential
work (developing models, etc.)
35. Solutions!
• Work harder and harder...
• Automating tasks via workflow engine
• Flexible internal tool to evaluate,
visualize and analyze models
36. Solutions!
But I’m busy with AI dev...
37. What We Wanted...
• A system that automatically evaluates,
visualizes and analyzes models and datasets.
• A tool that lets researchers focus on
essential work (parameter tuning etc.)
• User-friendly web app
38. Yet, We Want It to Be:
• Easy to develop
• Easy to collaborate
• Good performance
• AI engineer friendly (Python…)
40. • Easy to deploy and maintain
• Collaborations made easy
• Cost effective, yet performant
• You can use Python
Image source: https://serverless.com/
41. Serverless Computing
• No need to manage servers,
cloud providers do it for you!
• Consists of small deployable
unit of functions
• Scales as your app grows
• No idle fee, pay as you go
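To show what a "small deployable unit of function" can look like, here is a minimal sketch written in the style of an AWS Lambda Python handler behind an HTTP endpoint. The handler signature and the statusCode/body response shape follow the common Lambda proxy convention. The payload fields (true_positives, false_positives, false_negatives) are hypothetical and are not taken from the tool described in the talk.

```python
import json
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Compute precision and recall from counts sent in the request body.

    `event` carries the request, `context` carries runtime information
    (unused here). The field names in the payload are made up.
    """
    body = json.loads(event.get("body") or "{}")
    tp = int(body.get("true_positives", 0))
    fp = int(body.get("false_positives", 0))
    fn = int(body.get("false_negatives", 0))

    # Guard against division by zero when there are no detections or labels.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"precision": precision, "recall": recall}),
    }


# Local invocation with a made-up request, no cloud account needed.
request = {"body": json.dumps({"true_positives": 8,
                               "false_positives": 2,
                               "false_negatives": 8})}
response = handler(request, None)
result = json.loads(response["body"])
```

Because the function holds no state and needs no running server, the provider can start it on demand and bill only for the time it runs.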
42. Serverless Computing
Image source: https://aws.amazon.com/
58. More Functionality on the Way...
• Model version control
• Dataset analysis and version control
• Automating training and testing
59. Summing It Up
• Speed is important. You don’t want to
spend too much time on an internal tool
• Collaboration should be easy. Every
engineer should be able to contribute
• With little effort, researchers can focus
on more essential work
60. Wrap Up
AI Technologies for Map Creation/Maintenance
• Dashcam videos contain a lot of useful information for maps
• Develop computer vision technology to estimate objects’ positions
• Experimental evaluation shows an average estimation error below 1m (0.74m)
Engineering for Continuous Improvement
• Rapid development cycle is important
• Serverless architecture is a cost-effective choice to develop and maintain
support tools for continuous improvement of AI