cvsaisentan20141004 kanezaki

1
MIL
RGBD画像処理と
三次元物体認識
Machine Intelligence Lab.
2014/10/04
第25回コンピュータビジョン勉強会＠関東
原田研究室助教金崎朝子

2
MIL
本日の発表
• 3D系～RGBD系の研究を
• 古いのから最近のまで
• 浅く広くラーニングする
1. 2000年前後の3D特徴量
2. Kinectブーム時代
3. それ以降のRGBD・3D系研究
4. 本当に3Dは必要かな？
5. RGBDデータセット
cf.) ディープラーニング
図はGoogLeNet

3
MIL
SHREC
• Shape Retrieval Contest http://www.aimatshape.net/event/SHREC/
• 3D物体認識の研究分野を支えてきたコンペティション（2006~）
• SHREC ’14の内訳
1. Automatic Location of Landmarks used in Manual Anthropometry
2. Shape Retrieval of Non-Rigid 3D Human Models
3. Retrieval and Classification on Textured 3D Models
4. Extended Large Scale Sketch-Based 3D Shape Retrieval
5. Large Scale Comprehensive 3D Shape Retrieval
of Landmarks used in Manual Anthropometry
http://www.andreagiachetti.it/shrec14/

4
MIL
SHREC
2. Shape Retrieval of of Non-Non-Rigid Rigid 3D 3D Human Human Models
Models
非剛体物体
ポーズが変わっても
同じ物体であることを
どう認識するか？
http://www.cs.cf.ac.uk/shaperetrieval/shrec14/index.html

5
MIL
SHREC
3. Retrieval and and Classification Classification on on Textured Textured 3D 3D Models
Models
http://saturno.ge.imati.cnr.it/ima/smg/Shrec2013/texture.pdf

6
MIL
SHREC
Scale Sketch-Based 3D Shape Retrieval スケッチを
入力して
3D物体検索
http://www.itl.nist.gov/iad/vug/sharp/contest/2014/SBR/

7
MIL
SHREC
5.. LLaarrggee SSccaallee CCoommpprreehheennssiivvee 3 3DD S Shhaappee R Reettrrieievvaal l
8,987 models
categorized into 171 classes
（画像に比べてだいぶまだ
小規模である）
http://www.itl.nist.gov/iad/vug/sharp/contest/2014/SBR/

8
MIL
3D物体認識の手法– 古典なもの
• ヒストグラムベース
• 関数ベース
• 2Dベース
3D Shape Histograms
Ankerst, M., Kastenmüller, G., Kriegel, H. P., & Seidl, T. (1999, January). 3D shape histograms
for similarity search and classification in spatial databases. In Advances in Spatial
Databases (pp. 207-226), 1999.

9
MIL
• 関数ベース
• 2Dベース


l
m
l f   c Y  
 
m
l
( , ) ( , )

m 
l

0
l
Spherical Harmonic Representations
Kazhdan, M., Funkhouser, T., & Rusinkiewicz, S. Rotation
invariant spherical harmonic representation of 3D shape
descriptors. In Proceedings of the Eurographics/ACM SIGGRAPH
symposium on Geometry processing (pp. 156-164), 2003.

10
MIL
ぐるぐるいっぱい回転させて比較する
• 関数ベース
• 2Dベース
Light Field Descriptor
Ding-Yun Chen, Xiao-Pei Tian, Yu-Te Shen and Ming Ouhyoung, "On Visual Similarity
Based 3D Model Retrieval", Computer Graphics Forum (EUROGRAPHICS'03), Vol. 22,
No. 3, pp. 223-232, Sept. 2003.

11
MIL
実世界向け3D物体認識の手法– 古典なもの
Spin Image 超有名特徴量
Johnson, Andrew E., and Martial Hebert. "Using spin
images for efficient object recognition in cluttered 3D
scenes." Pattern Analysis and Machine Intelligence,
IEEE Transactions on 21.5 (1999): 433-449.
局所記述子・回転不変

12
MIL
2009年当時の私の研究＠國吉・原田研
• こんなのやってました。
• Color-CHLAC特徴
カラーボクセルデータを扱い、各ボクセルのRGB値の相関を一定領域内で積分
=
Object
x72
72
13
11
21
18
x13
x11
x21
x18

13
MESA SR-4000 TOF sensor
PointGray Flea2 camera
176×144 pixel、100万円くらい
もっと良いセンサはないか…?
光切断法
・精度は良いが
・部屋暗くする
・時間かかる
MIL
2009年当時の私の研究＠國吉・原田研

14
MIL
2010年11月Kinect登場

15
MIL
Kinectショック① ～つくってみた～
• 世界各地でボーントラッキング試したとか
アプリ作ったとかで動画がうpされる
• 特にロボット研究界隈で盛り上がりを見せる
– ROS 3D Contest とか
http://wiki.ros.org/openni/Contests/ROS%203D
1st place $3000
“Customizable Buttons”
http://wiki.ros.org/openni/Contests/ROS%203D/Customizable%20Buttons
Most Useful 1st place $2000
“RGBD-6D-SLAM”
http://wiki.ros.org/openni/Contests/ROS%203D/RGBD-6D-SLAM
• KinectFusion [Izadi et al., 2011]
– Microsoft発、SLAMのすごいやつ。ICP+GPU。すぐ使える？
世界中が
お祭り騒ぎに。

16
• NIST and Willow Garage: Solutions in Perception Challenge
MIL
Kinectショック➁ ～ICRA 2011～
up to $10,000 dollars will be awarded
exponentially starting at $3.50 for a first prize
winner who achieves 80% recognition and
increasing exponentially from there.
その後続かなかった…
• Best Vision Paper: Sparse Distance
Learning for Object Recognition
Combining RGB and Depth Information
Kevin Lai, Liefeng Bo, Xiaofeng Ren, and Dieter Fox
Intelとの共同研究で、RGB-D Object Dataset公開
初のKinect論文
• Fast Object Detection for Robots in a Cluttered Indoor Environment Using Integral
3D Feature Table. Asako Kanezaki, Takahiro Suzuki, Tatsuya Harada, and Yasuo Kuniyoshi.

17
MIL
Kinectショック③ ～RGBDが流行語に？～
• タイトルにRGB-Dが入っている論文の数（※金崎調べ/アバウト）
2011 2012 2013 2014
CVPR 0 1 7 7
ICCV 0 - 10 -
ICRA 3 8 7 8
ビジョン系
ビジョン系
ロボット系
キーワードにRGBD Perceptionが
入っている論文が43件
• 純粋なビジョン系でも受け入れられるようになったし、
ロボットビジョンではもうRGBDがデフォと言ってもいいくらい

18
単眼カメラ
Visual SLAMの
プロ
PTAM、DTAM
MIL
RGBD：SLAM
KinectFusion
要チェック研究機関
＋
Andrew Davison
Faculty of
Engineering, Department of
Computing
Professor of Robot Vision
TUM RGB-D
F. Steinbruecker J. Sturm D. Cremers
↑さっきのRGBD-6D-SLAMをやってた人

19
MIL
RGBD：三次元物体（環境）認識
Assistant Professor
Silvio Savarese
Computer Science
Department
Kevin Lai
RGBDデータセット
作ったり、いろいろ。
Assistant Professor
Jianxiong Xiao
Radu B. Rusu
CEO and Co-Founder at Fyusion, Inc
President and CEO - Open Perception, Inc.
要チェック研究機関
Professor
Dieter Fox
Spin Image、spectral graph matching
等、3Dやる上で重要な論文を
たくさん書いている
セマンティックマッピング等

20
MIL
物体認識の対象とするデータ
色なし
色あり
2次元2.5次元3次元
線画距離データ
形状データ
RGB画像RGBD画像テクスチャ付形状データ
http://nicolas.burrus.name/uploads/Research/viewer_output_view3d_triangles.png
Vision
Robot
Graphics
D次元が
増えたPartial
data
色が
ついた
http://shape.cs.princeton.edu/
benchmark/

21
D次元が
増えた
MIL
RGB
画像
マルチモーダルフュージョン的なアプローチ
• Bar-Hillel, Aharon, Dmitri Hanukaev, and Dan Levi.
"Fusing visual and range imaging for object class recognition."
IEEE ICCV, 2011.
• We have presented a system fusing depth and visual information which
reduces classification error by more than 60% compared to using the visual
image alone, and close to 20% compared to depth alone.

22
D次元が
増えた
MIL
RGB
画像
LINE-MOD
Multimodal templates for real-time detection of texture-less
objects in heavily cluttered scenes
Stefan Hinterstoisser, Stefan Holzer, Cedric Cagniart, Slobodan Ilic,
Kurt Konolige, Nassir Navab, Vincent Lepetit. IEEE ICCV, 2011.
Ｓｔｅｆａｎの考えた最強のテンプレートマッチング
輝度勾配in 色画像法線in 距離画像テンプレート
テクスチャレスな物体を表現するのに
表面形状の情報（法線）で補おう
can parse a VGA
image with over
3000 templates
with about 10 fps
on the CPU
テンプレートマッチングのプロ
動画

23
D次元が
増えた
MIL
RGB
画像
Learning 6D Object Pose Estimation using 3D Object Coordinates
Eric Brachmann, Alexander Krull, Frank Michel, Stefan Gumhold, Jamie Shotton,
and Carsten Rother.
←LINE-MOD
ECCV, 2014.
TUD
あっ・・・
• Decision forest
• Jointly predicts both 3D object coordinates and object instance probabilities
• RANSAC based optimization scheme

24
距離
画像
MIL
（カラー）点群特徴量
• SHOT記述子[Federico Tombari et al., ECCV2010]
– 今の世代のSpin Image（と私は思っている）
– デファクトスタンダードな3D点群記述子
– PCLにも入っているよ
• CSHOT記述子[Federico Tombari et al., ICIP2011]
– SHOTのカラー版
色が
ついた
Federico氏：
他にも3Dキーポイント
徹底比較論文とか
いろいろやられている
各点まわりの局所領域を8 x 2 x 2 に分割
各領域の法線ベクトル풏푣と
푖
点の法線ベクトル풏푢の内積푐표푠휃푖 = 풏푣푖 ∙ 풏푢
のヒストグラム
F. Tombari, S. Salti, L. Di Stefano.
"Unique signatures of Histograms
for local surface description",
ECCV 2010.
F. Tombari, S. Salti, L. Di Stefano.
"A combined texture-shape
descriptor for enhanced 3D
feature matching", ICIP 2010.

25
MIL
←ＨＫＳも
Partialな3Dモデル特徴量
The Partial View Heat Kernel Descriptor for 3D Object
Representation [Brandao et al., ICRA2014]
– Heat Kernel Signature (HKS) 記述子[Bronstein and Kokkinos, CVPR2010]
を、Partial Viewなデータの記述向けに拡張した。
Partial
data
３D
モデル
＋テクスチャも考慮
cf.) Heat Kernel Signature (HKS)
全周モデル向け。（non-rigidにもrigidにも使える）
各点の記述子は、物体全体の表面形状から計算される。
⇒ 視点が変わると見えてる部分が変わるので、HKSも変わる
푘 푣푗 , 푣푠, 푡 =
푁
푖=1
푒−휆푖푡휙푖,푗휙푖,푠
物体全体表面のLaplace-Beltrami作用素の
固有値と固有ベクトル←partial viewになると変化。
超有名特徴量
双子のBronstein先生s

26
IEEE CVPR, 2011
– jointly estimating 3D objects, 3D points and camera poses from multiple images
• S. Yingze Bao, M. Bagra, Y. Chao, and S. Savarese, Semantic Structure from
MIL
3Dセマンティックマッピング(1/3)
Semantic Structure From Motion (SSFM)
• S. Yingze Bao and S. Savarese, Semantic Structure from Motion,
Motion with Points, Regions, and Objects, IEEE CVPR, 2012
3D 2D

27
MIL
Dense 3D Semantic Mapping of Indoor Scenes from RGB-D Images
[Hermans et al., ICRA 2014] Best Vision Paper! (ICRA 2014)
• 2D-3D label transfer
• 3D refinementは毎フレーム
やる必要ない⇒ 4Ｈｚ出た
Ren et al. [7] obtain better results for all
classes, but their complex
approach takes over a minute per image.
[7] X. Ren, L. Bo, and D. Fox, “RGB-(D)
scene labeling: Features and
algorithms,” in CVPR, 2012.
動画

28
MIL
ECCV 2014 オーラルセッション”Context and 3D Scenes”
４本中の２本
Jianxiong Xiao
Sliding Shapes for 3D Object Detection in Depth Images
[Song et al., ECCV2014]
PanoContext: A Whole-room 3D Context Model for Panoramic
Scene Understanding [Zhang et al., ECCV2014]

29
Large-Scale Multi-Resolution Surface Reconstruction from RGB-D Sequences
(F. Steinbruecker, C. Kerl, J. Sturm, D. Cremers), In IEEE ICCV, 2013.
動画
MIL
RGBDリアルタイム処理の時代
• RGBD SLAMの三年後を見てみましょう
computed from more than 24.000 RGB-D images.
The reconstruction run at more than 200 Hz on a
GTX680. The finest resolution was 5mm and the
entire scene fit into approximately 2.5 GB of GPU
RAM, including color.
Volumetric 3D Mapping in Real-Time on a CPU
(F. Steinbruecker, J. Sturm, D. Cremers), In IEEE ICRA, 2014.
This work is inspired by our recent finding [16] that data fusion in an octree runs extremely
fast on a GPU (>200 Hz), so that real-time processing on CPU comes back into reach.
GPUで200Hzは、やり過ぎました。CPUで実装します。
Our method fuses incoming RGB-D images in real-time
at 45 Hz and outputs up-to-date triangle meshes at
approximately 1 Hz at 5 mm resolution at the finest level.
TUMの例のグループ。
毎年RGBD SLAMの
すごい論文を出している

30
Geometric Generative Gaze Estimation (G3E) for Remote RGB-D
Cameras. Kenneth Funes Mora and Jean-Marc Odobez, In IEEE CVPR, 2014
• ICPを使って顔の3D姿勢推定＞正面顔をレンダリング
• 眼画像を切り出して視線方向推定
• 視線方向をワールド座標系に戻す
MIL
Gaze EstimationにまでRGBDが使われる時代
彼らのCVPR Workshop on Gesture Recognition, 2012の動画

31
以上、RGBD系の研究を浅く広く紹介しました。
RGBDのディープラーニングもあるよ！
Convolutional-Recursive Deep Learning for 3D Object Classification
R. Socher, B. Huval, B. Bhat, C. D. Manning, A. Y. Ng. In NIPS, 2012.
MIL
3D ShapeNets for 2.5D Object Recognition and
Next-Best-View Prediction.
Z. Wu, S. Song, A. Khosla, X. Tang, J. Xiao
In arXiv, 2014.

32
MIL
残りの発表
• 3D系～RGBD系の研究を
• 古いのから最近のまで
• 浅く広くラーニングする
1. 2000年前後の3D特徴量
2. Kinectブーム時代
3. それ以降のRGBD・3D系研究
4. 本当に3Dは必要かな？
5. RGBDデータセット
cf.) ディープラーニング
図はGoogLeNet

33
MIL
学習時に3Dがあれば
本当に3Dは必要かな？(1/4)
Single Image 3D Object Detection and Pose Estimation for Grasping
[Zhu et al., ICRA2014]
• 普通に一枚の画像があれば3D姿勢推定までできてロボットが物体把持できる
普通の画像→ シルエットからDPMで物体検出→ superpixels segmentation → 姿勢推定

34
MIL
使用時に3Dがあれば
Size Matters: Metric Visual Search Constraints from Monocular Metadata.
Fritz, Mario, Kate Saenko, and Trevor Darrell. NIPS, 2010.
– “2.1D” local feature
– combines traditional appearance gradient statistics with an estimate of
average absolute depth within the local window
– EXIFタグから物体の大きさが分かるよ
カメラの内部パラメタの他、しばしばFocus distanceまで分かる
→ 物体の深度が推定できる！
学習時：Ｗｅｂ画像だけ
使用時：一応距離センサ使ってる

35
使用時に連続画像があれば
MIL
3D Reconstruction from Accidental Motion
Fisher Yu and David Gallup. CVPR, 2014.
撮影者の偶発的な動き
から3D復元ができる。
動画

36
学習時に大量画像があれば
Reconstructing PASCAL VOC
Sara Vicente, Joao Carreira, Lourdes Agapito and Jorge Batista. CVPR, 2014.
MIL
• PASCAL VOC データセットを使って入力画像の物体を3D復元する。
1. Viewpoint Estimation (Rigid Structure from Motion)
2. 3D Reconstruction (Visual Hull Sampling)
3. Reconstruction Ranking
コードもありますhttp://www2.isr.uc.pt/~joaoluis/carvi/index.html
PASCAL VOC データセット
動画

37
MIL
RGBDデータセット
• Kevin’s RGB-D Object Dataset
http://rgbd-dataset.cs.washington.edu/
• B3DO: Berkeley 3-D Object Dataset
http://kinectdata.com/
• NYU Depth Dataset V2
http://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html
• Hinterstoisser’s ACCV database
http://campar.in.tum.de/Main/StefanHinterstoisser
• RGBD Salient Object Detection Dataset
https://sites.google.com/site/rgbdsaliency/home
• SUN3D database
http://sun3d.cs.princeton.edu/
• Kanezaki’s color & depth dataset
（100個）http://www.mi.t.u-tokyo.ac.jp/kanezaki/color_depth_dataset_100.html
（12個）http://www.isi.imi.i.u-tokyo.ac.jp/software/color_depth_dataset_with_labels.zip

cvsaisentan20141004 kanezaki

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à cvsaisentan20141004 kanezaki

Similaire à cvsaisentan20141004 kanezaki (20)

Dernier

Dernier (12)

cvsaisentan20141004 kanezaki