A Report
On
3D OBJECT RECOGNITION
USING POINT CLOUD LIBRARY
(PCL)
prepared by:
Rishikesh Bagwe (2012A8PS401G)
Mentor:
Imran Syed (Sc ‘C’)
Centre for Artificial Intelligence and Robotics, Bangalore
A Practice School – II station of
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
(May, 2016)
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
(RAJASTHAN)
Practice School Division
Station: Centre for Artificial Intelligence and Robotics, Bangalore
Duration: 5 months and 7 days          Date of Start: 12th Jan 2016
Date of Submission: 31st May 2016
Title of the Project: 3D Object Recognition using Point Cloud Library (PCL)
Students:
Name ID numbers Discipline
Rishikesh Bagwe 2012A8PS401G Electronics and Instrumentation
Station Experts:
Name Designation
Imran Syed Scientist ‘C’
PS Faculty: K. Pradheep Kumar
Siju C. R.
Keywords:
Project Area: Artificial Intelligence, Object Recognition
Abstract:
This report gives the detailed steps required for 3D object recognition. It states the literature and
concepts used in the process, and presents the execution results of the global and local pipelines.
A comparative study of the different algorithms used in the pipelines is listed here, and the
combination of algorithms to be used is concluded.
Preface
The following project report is based on 3D object recognition. It explains why 3D object
recognition is preferable to 2D recognition and what steps are involved in the process of object
recognition. It also gives an insight into the mathematics used by various algorithms for
keypoint description. Some of the commonly used algorithms for surface description are
explained in the report. The training and testing parts are the body of object recognition. The
training is done using 400 point clouds (3D images) for each class (object). Combinations of
various algorithms are tried and the fastest and most accurate combination is selected. A
Microsoft Kinect Xbox 360 is used to gather 3D data for testing purposes.
Acknowledgement
I would like to express my special thanks and gratitude to my station mentor Mr. Imran Syed for
giving me the opportunity to work on the 3D object recognition project and for continuously guiding
me through the obstacles I faced. I would also like to thank the organisation, Centre for Artificial
Intelligence and Robotics (CAIR), for allowing me to use their equipment and expertise for my
project. Secondly, I would like to thank my BITS PS faculty, Dr. K. Pradheep Kumar and Dr. Siju C.R.,
for easing the process of entering CAIR and for guiding me through the rules and
regulations of the Practice School Division. Lastly, I want to thank the Birla Institute of
Technology and Science (BITS), Pilani for providing me the opportunity to work in a reputed
research organisation, CAIR.
Table of Contents
Abstract Sheet
Preface
Acknowledgement
Table of Contents
1 Introduction
2 Terminology and Concepts
3 The process classification and flow
3.1 The Global Pipeline
3.2 The Local Pipeline
4 Testing and Training
5 Experiment Results
5.1 Experiment 1 – RGB-D object dataset from the internet
5.2 Experiment 2 – Creating and testing on own dataset
6 Conclusion
References
1 Introduction
The objective of 3D object recognition is to identify objects correctly in a point cloud and to
determine their poses (i.e., location and orientation). For many years, the most common
sensors for computer vision were 2D cameras that retrieved an RGB image of the scene (like all
the digital cameras that are so common nowadays in our laptops or smartphones). Algorithms
exist that are able to find an object in a picture even if it is rotated or scaled. Then came 3D
sensors, which give us the depth of each point in the scene; because of this new measurement
we can now detect an object and its pose even from a camera view different from the one it was
trained with. But the addition of a new dimension makes the calculations expensive. Working with
the data these sensors retrieve is a lot different from working with a 2D image, and texture
information is rarely used.
3D sensors can be categorized into three classes:
- Stereo Cameras: They are the only passive measurement devices in the list. They are
essentially two identical cameras assembled together (some centimetres apart) that
capture slightly different scenes. By computing the differences between the two
scenes, it is possible to infer depth information about the points in each image. A
stereo pair is cheap, but perhaps the least accurate sensor. Ideally, it would require
perfect calibration of both cameras, which is unfeasible in practice, and bad lighting
conditions render it useless.
- Time-of-flight (ToF): These sensors work by measuring the time it has taken a ray
or pulse of light to travel a certain distance. Because the speed of light is a known
constant, a simple formula can be used to obtain the range to the object. These
sensors are not affected by lighting conditions and have the potential to be very precise.
A LIDAR (light detection and ranging) sensor is essentially a laser range finder mounted
on a platform that rotates very fast, scanning the scene point by point.
- Structured Light: These sensors (like the Kinect and Xtion) work by projecting a pattern of
infrared light (for example, a grid of lines or a "constellation" of points) onto
the scene's objects. The pattern appears distorted when viewed from a perspective different
from the projector's. By analysing this distortion, information about the depth
can be retrieved and the surface(s) reconstructed.
In this project I will be using a Microsoft Kinect for capturing data, so it is important to go into the
details of 3D image formation in the Kinect sensor. The basis of the Microsoft Kinect is
PrimeSense technology: the Kinect has a PS1080 system-on-chip (SoC) which handles its 3D image
formation. The Kinect has an RGB camera and an IR projector and sensor, as can be seen in the
picture given below.
The Kinect uses the IR projector and sensor pair to measure the depth of a point in the scene.
The theory of operation is simple, but its execution can be complex; this is handled by PrimeSense's
PS1080 SoC. The IR projector projects a pattern of IR dots, which are detected using a conventional
CMOS image sensor with an IR filter. The pattern changes based on the objects that reflect
the light: the dots change size and position depending on how far the objects are from the
source.
The PS1080 SoC has both the projected pattern and the sensed pattern. First it maps the points
in the projected pattern to the ones in the sensed pattern. Then it measures the distance by which
each point has moved (the disparity). This disparity is then used to calculate the depth of that point
as follows.
In the diagram beside (O and O' denote the positions of the IR projector and the IR sensor,
separated by the baseline OO'; f is the focal length; X is a scene point at depth Z), by
similarity of triangles on the projector side,

x / OP = f / Z        ... (1)

and by similarity of triangles on the sensor side,

x' / O'P = f / Z      ... (2)

Adding (1) and (2),

(x + x') / OO' = f / Z

We know the distance OO' and f (the focal length). x and x' are found by the PrimeSense PS1080
SoC from the IR projector and the IR sensor respectively. Therefore we can find Z, the depth of
the point X.
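To make the relation concrete, the short function below computes Z from the disparity. It is only a
sketch: the function name and the numeric values (focal length, baseline, disparity) are illustrative
assumptions, not the Kinect's actual calibration constants.

```cpp
#include <iostream>

// Depth of a point from its total disparity d = x + x' (in pixels),
// given the focal length f (in pixels) and the baseline b = OO' (in metres).
double depthFromDisparity(double focal_px, double baseline_m, double disparity_px) {
    // From (x + x') / OO' = f / Z  =>  Z = f * OO' / (x + x')
    return focal_px * baseline_m / disparity_px;
}

int main() {
    // Illustrative values only, not real calibration data.
    double f = 580.0;   // focal length in pixels
    double b = 0.075;   // baseline in metres
    double d = 20.0;    // measured disparity in pixels
    std::cout << "Z = " << depthFromDisparity(f, b, d) << " m" << std::endl;
    return 0;
}
```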
The Kinect returns the 3D data in the form of a point cloud. A point cloud is a set of points in
three-dimensional space, each with its own XYZ coordinates. Every point corresponds to
exactly one pixel of the captured image. Optionally, the point cloud can also store RGB
data if the sensor has an RGB camera. The file format in which a point cloud is stored is called
point cloud data (.pcd).
In order to do processing such as object detection, object recognition and 3D modelling on these
point clouds, and to handle the complex calculations involving depth measurement, the Point Cloud
Library (PCL) was started in early 2010 by Willow Garage. The first version was
fully introduced in 2011, and it has been actively maintained ever since. PCL aims to be a
one-for-all solution for point cloud and 3D processing. It is an open-source, multiplatform
library divided into many submodules for different tasks, like visualization, filtering,
segmentation, registration, searching, and feature estimation.
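As a small illustration of the library, the sketch below reads a .pcd file and reports how many
points it contains; the file name scene.pcd is a placeholder, not a file from this project.

```cpp
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <iostream>

int main() {
    // A point cloud whose points carry XYZ coordinates plus an RGB colour.
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZRGB>);

    // loadPCDFile returns a negative value if the file cannot be read.
    if (pcl::io::loadPCDFile<pcl::PointXYZRGB>("scene.pcd", *cloud) < 0) {
        std::cerr << "Could not read scene.pcd" << std::endl;
        return -1;
    }
    std::cout << "Loaded " << cloud->size() << " points" << std::endl;
    return 0;
}
```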
(Figure: IR projection (top left), RGB image (bottom left), depth map (bottom right), RGB-D point cloud (top right))
2 Terminology and Concepts
Keypoints:
According to the original publication, a keypoint is a point on the object which:
1. takes information about borders and the surface structure into account
2. can be reliably detected even if the object is observed from another perspective
3. provides stable areas for normal estimation or the descriptor calculation in general
As can be seen in the figure beside, the good keypoints
are not exactly on the edge but just around it, so that it
is easy to calculate the normals in the neighbourhood.
The red (bad) keypoints, on the other hand, do not have a
characteristic surface change beneath them. The main
reason to find keypoints is to reduce the load on the
further processing. A point cloud of an object can have as many as 100,000 (1 lakh) points, which
makes processing lengthy. But if we calculate the keypoints, the number of points is reduced to a
few hundred.
There are different algorithms for detecting keypoints. A small set of detectors has been specifically
proposed for 3D point clouds and range maps, e.g. Intrinsic Shape Signatures (ISS), NARF,
etc. Several other keypoint detectors are derived from 2D interest point detectors; among these are
Harris3D, SIFT3D and SUSAN3D.
The following image shows the ISS3D keypoints calculated on a keyboard point cloud: the coloured
points are the keypoints. As you can see, not all of the keypoints are good ones according to the
above description of keypoints. This is mainly due to aberrations in the collected point cloud data.
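The detection step is available in PCL as pcl::ISSKeypoint3D; the sketch below shows a typical
invocation, with radii and thresholds that are illustrative assumptions rather than the values tuned
for this project.

```cpp
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/keypoints/iss_3d.h>
#include <pcl/search/kdtree.h>
#include <iostream>

int main() {
    pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
    if (pcl::io::loadPCDFile<pcl::PointXYZ>("object.pcd", *cloud) < 0)  // placeholder file name
        return -1;

    pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
    pcl::PointCloud<pcl::PointXYZ>::Ptr keypoints(new pcl::PointCloud<pcl::PointXYZ>);

    pcl::ISSKeypoint3D<pcl::PointXYZ, pcl::PointXYZ> iss;
    iss.setSearchMethod(tree);
    iss.setSalientRadius(0.02);   // neighbourhood used to build the scatter matrix (assumed)
    iss.setNonMaxRadius(0.01);    // non-maxima suppression radius (assumed)
    iss.setThreshold21(0.975);    // eigenvalue ratio lambda2/lambda1
    iss.setThreshold32(0.975);    // eigenvalue ratio lambda3/lambda2
    iss.setMinNeighbors(5);
    iss.setInputCloud(cloud);
    iss.compute(*keypoints);

    std::cout << cloud->size() << " points reduced to "
              << keypoints->size() << " keypoints" << std::endl;
    return 0;
}
```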
Descriptors:
A descriptor is an n-dimensional vector calculated for each point's local neighbourhood, or sometimes
computed for the whole cloud. The dimension of the vector depends on the algorithm used
for calculating it. Descriptors are divided into two categories: global and local descriptors.
Each local descriptor describes the surface beneath the neighbourhood of one keypoint,
whereas a single global descriptor describes the whole viewed object surface. In order to calculate
the descriptors we first have to calculate the normals at each point in the specified neighbourhood.
Then the differences in the angles between the normals are binned into a histogram. For example,
the Fast Point Feature Histogram (FPFH) has 33 bins: 11 bins for each of three angular parameters
(e.g. one parameter is the angle difference between the normal at the point of interest and the
normal at one of its neighbouring points). The number of instances falling into each value interval
of each parameter is counted and added to the histogram at the appropriate bin. Apart from FPFH
there are various other algorithms for descriptor calculation, such as, in the local category,
Signature of Histograms of Orientations (SHOT) and Point Feature Histogram (PFH), and in the global
category, Viewpoint Feature Histogram (VFH) and Global Fast Point Feature Histogram (GFPFH). In this
report we have used the VFH descriptor for the global pipeline and the SHOT descriptor for the local
pipeline.
Here are the FPFH and VFH descriptors calculated for the keyboard point cloud shown above
(figures: VFH descriptor, FPFH descriptors).
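In PCL, histogram descriptors are computed after normal estimation. The sketch below computes the
single 308-bin VFH descriptor of a segmented object cloud; the search radius is an assumed value and
object.pcd is a placeholder name.

```cpp
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/features/normal_3d.h>
#include <pcl/features/vfh.h>
#include <pcl/search/kdtree.h>

int main() {
    pcl::PointCloud<pcl::PointXYZ>::Ptr object(new pcl::PointCloud<pcl::PointXYZ>);
    if (pcl::io::loadPCDFile<pcl::PointXYZ>("object.pcd", *object) < 0)
        return -1;

    pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);

    // Surface normals are required before any histogram-based descriptor.
    pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
    pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
    ne.setInputCloud(object);
    ne.setSearchMethod(tree);
    ne.setRadiusSearch(0.03);   // 3 cm neighbourhood (assumed)
    ne.compute(*normals);

    // VFH produces one 308-dimensional descriptor for the whole cloud.
    pcl::PointCloud<pcl::VFHSignature308>::Ptr vfh(new pcl::PointCloud<pcl::VFHSignature308>);
    pcl::VFHEstimation<pcl::PointXYZ, pcl::Normal, pcl::VFHSignature308> est;
    est.setInputCloud(object);
    est.setInputNormals(normals);
    est.setSearchMethod(tree);
    est.compute(*vfh);   // vfh->points[0].histogram holds the 308 bins
    return 0;
}
```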
3 The process classification and flow
The basis of 3D object recognition is to find a set of correspondences between two different
point clouds, one of them containing the object we are looking for. The process is classified
into two separate pipelines, viz. the local pipeline and the global pipeline. The global pipeline is
usually used for object detection in a scene, while the local pipeline is used to find the object's
position and orientation with respect to the camera. Each pipeline has different stages, as shown in
the figure below.
The main difference between the two pipelines is at the description stage: the pipelines
use different algorithms for finding descriptors. The local pipeline describes the surface
curvature of an object while the global pipeline considers the object as a whole.
For complete object recognition we need to use the global pipeline first and then the
local one. Accordingly, there are different possible combinations of description algorithms
in the two pipelines. The accuracy and speed of the process depend strongly on the combination
used, so a comparative study needs to be done in order to get the optimum throughput.
Apart from the algorithms used for the descriptors, the two pipelines have different stages. As
seen in the figure above, keypoint extraction is present only in the local pipeline, because in the
global pipeline the descriptor is found for the whole object, whereas in the local pipeline, in order
to reduce the processing time, keypoints are found first and then their descriptors are calculated.
Similarly, the segmentation step is present only in the global pipeline, because there we need to
separate the object from the scene before finding the object's global descriptor as a whole, whereas
in the local pipeline we already have the object itself.
Usually, when these two pipelines are integrated, we first carry out the global pipeline, i.e. we
find the object in the given scene, and then run the local pipeline to calculate the orientation of
the found object.
3.1 The Global Pipeline
As stated earlier, the global pipeline is used for finding the class of a given object (i.e., what
object it is). The number of classes that the system can recognise depends on the training of the
system.
Training Part:
A segmented point cloud database is used for training, so we do not need to perform
segmentation while training. The training part is done by taking 130 images of each object
from different views and then grouping them into few groups based on a threshold distance of
the calculated global descriptors. The global descriptor used in this project is VFH (View-Point
Feature Histogram) which is view dependent that is why it needs images from different view-
points. The descriptors are stored in the database in a KD –tree format. Along with the KD-tree
format (hdf5) file there are 2 more files generated namely the descriptor name file and the
training data path file. The data path file generated is shown below:
Testing Part:
First the data captured from the Kinect is segmented to keep only the object in the point
cloud. This segmentation is done on the basis of surface models such as planar, cylindrical,
spherical, etc. Here, different objects are kept on a table, so we perform planar
segmentation to detect the table surface and cut it out. The remaining disjoint point clouds are
the objects kept on the table.
(photo of segmentation)
Sometimes the segmentation gives back 2-3 point clouds even if only one object is kept on
the table in front of the sensor. In such situations the first point cloud produced by the algorithm
is the intended object; the others may be due to faulty registration of the point cloud by the
Kinect.
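A rough sketch of this table-removal and clustering step, using PCL's RANSAC plane segmentation and
Euclidean cluster extraction, is given below; the thresholds and cluster sizes are assumptions, not
the parameters used in the project.

```cpp
#include <pcl/point_types.h>
#include <pcl/ModelCoefficients.h>
#include <pcl/segmentation/sac_segmentation.h>
#include <pcl/segmentation/extract_clusters.h>
#include <pcl/filters/extract_indices.h>
#include <pcl/search/kdtree.h>
#include <vector>

std::vector<pcl::PointIndices> segmentObjects(pcl::PointCloud<pcl::PointXYZ>::Ptr scene) {
    // 1. Fit the dominant plane (the table surface) with RANSAC.
    pcl::ModelCoefficients::Ptr coeffs(new pcl::ModelCoefficients);
    pcl::PointIndices::Ptr plane(new pcl::PointIndices);
    pcl::SACSegmentation<pcl::PointXYZ> seg;
    seg.setOptimizeCoefficients(true);
    seg.setModelType(pcl::SACMODEL_PLANE);
    seg.setMethodType(pcl::SAC_RANSAC);
    seg.setDistanceThreshold(0.01);          // 1 cm tolerance around the plane (assumed)
    seg.setInputCloud(scene);
    seg.segment(*plane, *coeffs);

    // 2. Remove the plane, keeping everything standing on it.
    pcl::PointCloud<pcl::PointXYZ>::Ptr objects(new pcl::PointCloud<pcl::PointXYZ>);
    pcl::ExtractIndices<pcl::PointXYZ> extract;
    extract.setInputCloud(scene);
    extract.setIndices(plane);
    extract.setNegative(true);               // keep the points NOT on the plane
    extract.filter(*objects);

    // 3. Group the remaining points into disjoint clusters, one per object.
    pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
    tree->setInputCloud(objects);
    std::vector<pcl::PointIndices> clusters;
    pcl::EuclideanClusterExtraction<pcl::PointXYZ> ec;
    ec.setClusterTolerance(0.02);            // 2 cm point-to-point gap (assumed)
    ec.setMinClusterSize(100);
    ec.setSearchMethod(tree);
    ec.setInputCloud(objects);
    ec.extract(clusters);
    return clusters;
}
```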
After the segmentation, the VFH descriptor is calculated for the acquired point cloud. The
descriptor is then compared with all the VFH descriptors stored in the KD-tree during training,
based on a distance threshold in 308 dimensions (VFH being a 308-dimensional descriptor), and the
closest 5 matching point clouds are output by the program, as shown in the figure below.
The part highlighted in grey is the object model provided for matching and the one highlighted
in pink is the matching result. As you can see, the keyboard is matched correctly to a keyboard.
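In the project the training descriptors are held in a FLANN KD-tree; the sketch below performs the
equivalent comparison by brute force in the 308-dimensional space and returns the indices of the five
closest training clouds, which is enough to illustrate the matching step.

```cpp
#include <pcl/point_types.h>
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Return the indices of the k training descriptors closest to the query,
// using the Euclidean distance over all 308 histogram bins.
std::vector<int> closestMatches(const pcl::VFHSignature308& query,
                                const std::vector<pcl::VFHSignature308>& training,
                                std::size_t k = 5) {
    std::vector<std::pair<float, int>> dists;
    for (std::size_t i = 0; i < training.size(); ++i) {
        float d = 0.0f;
        for (int b = 0; b < 308; ++b) {
            float diff = query.histogram[b] - training[i].histogram[b];
            d += diff * diff;
        }
        dists.emplace_back(std::sqrt(d), static_cast<int>(i));
    }
    std::sort(dists.begin(), dists.end());   // smallest distance first
    std::vector<int> best;
    for (std::size_t i = 0; i < std::min(k, dists.size()); ++i)
        best.push_back(dists[i].second);
    return best;
}
```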
3.2 The Local Pipeline
The local pipeline is used to measure the orientation and position of the object in the scene.
The orientation we obtain is relative to a point cloud that we provide for comparison (whose
orientation and position we know).
Training Part:
After matching and determining the class of the object, we retrieve from our database the local
descriptor file of the point cloud with which it was matched. This database is created for all the
files that are used for global pipeline training. For local pipeline training we first find the
keypoints of a point cloud; in this project we use the ISS3D keypoint detector for finding the
keypoints. Having found the keypoints, we then compute the descriptors. We have used SHOT
(Signature of Histograms of Orientations) descriptors. The database of local descriptors for all
point clouds is stored in individual files.
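A minimal sketch of this description step is shown below: normals are estimated on the full object
cloud and SHOT descriptors are computed only at the ISS3D keypoints. The function name and the radii
are assumptions, not project values.

```cpp
#include <pcl/point_types.h>
#include <pcl/features/normal_3d.h>
#include <pcl/features/shot.h>
#include <pcl/search/kdtree.h>

pcl::PointCloud<pcl::SHOT352>::Ptr describeKeypoints(
        pcl::PointCloud<pcl::PointXYZ>::Ptr cloud,      // full object cloud
        pcl::PointCloud<pcl::PointXYZ>::Ptr keypoints)  // its ISS3D keypoints
{
    pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);

    pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
    pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
    ne.setInputCloud(cloud);
    ne.setSearchMethod(tree);
    ne.setRadiusSearch(0.03);            // assumed normal-estimation radius
    ne.compute(*normals);

    pcl::PointCloud<pcl::SHOT352>::Ptr descriptors(new pcl::PointCloud<pcl::SHOT352>);
    pcl::SHOTEstimation<pcl::PointXYZ, pcl::Normal, pcl::SHOT352> shot;
    shot.setInputCloud(keypoints);       // describe only the keypoints...
    shot.setSearchSurface(cloud);        // ...using the full cloud as support
    shot.setInputNormals(normals);
    shot.setRadiusSearch(0.05);          // assumed descriptor support radius
    shot.compute(*descriptors);
    return descriptors;
}
```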
Testing Part:
In the testing part, having identified the class of the given point cloud, we find its
ISS3D keypoints and then their SHOT descriptors. These descriptors are then compared with the
local descriptors of the globally matched point cloud. The comparison is based on the n-dimensional
Euclidean distance between the descriptors of the two point clouds. The output of this pipeline
is a rotation matrix and a translation vector which give the orientation of the given point
cloud with respect to the trained point cloud. The figure below shows the result of the local
pipeline.
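The descriptor comparison can be sketched as follows, modelled on PCL's correspondence-grouping
tutorial: each test (scene) descriptor is matched to its nearest training (model) descriptor and kept
only if the squared descriptor distance is below a threshold (0.25 here, an assumed value).

```cpp
#include <pcl/point_types.h>
#include <pcl/correspondence.h>
#include <pcl/kdtree/kdtree_flann.h>
#include <cmath>
#include <vector>

pcl::CorrespondencesPtr matchDescriptors(
        pcl::PointCloud<pcl::SHOT352>::Ptr model_descriptors,
        pcl::PointCloud<pcl::SHOT352>::Ptr scene_descriptors)
{
    pcl::CorrespondencesPtr correspondences(new pcl::Correspondences);
    pcl::KdTreeFLANN<pcl::SHOT352> match_search;
    match_search.setInputCloud(model_descriptors);

    for (std::size_t i = 0; i < scene_descriptors->size(); ++i) {
        std::vector<int> index(1);
        std::vector<float> sqr_dist(1);
        if (!std::isfinite(scene_descriptors->at(i).descriptor[0]))
            continue;                                     // skip NaN descriptors
        if (match_search.nearestKSearch(scene_descriptors->at(i), 1, index, sqr_dist) == 1
                && sqr_dist[0] < 0.25f) {                 // assumed distance threshold
            correspondences->push_back(
                pcl::Correspondence(index[0], static_cast<int>(i), sqr_dist[0]));
        }
    }
    return correspondences;
}
```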
RANSAC (Random Sample Consensus) part:
The correspondences from the local matching, as seen in the figure above, are not all
correct. There is always a considerable number of wrong matches, which lead to an incorrect
rotation matrix and translation vector. In order to remove the incorrect matches, RANSAC is
used. It is an iterative process which estimates the parameters of a mathematical model from a set
of pre-recorded data containing outliers; its aim here is to remove those outliers. In this case it
finds the dominant orientation of the correspondence lines and removes the lines that are not
parallel to it. The following figures show the results when RANSAC was used on matching two
keyboards.
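PCL provides this outlier rejection as pcl::registration::CorrespondenceRejectorSampleConsensus; the
sketch below keeps only the RANSAC inliers and returns the estimated rigid transform (rotation and
translation in one 4x4 matrix). The inlier threshold and iteration count are assumed values.

```cpp
#include <pcl/point_types.h>
#include <pcl/correspondence.h>
#include <pcl/registration/correspondence_rejection_sample_consensus.h>
#include <Eigen/Core>

Eigen::Matrix4f rejectOutliers(
        pcl::PointCloud<pcl::PointXYZ>::Ptr model_keypoints,
        pcl::PointCloud<pcl::PointXYZ>::Ptr scene_keypoints,
        pcl::CorrespondencesPtr all_correspondences,
        pcl::Correspondences& inliers)
{
    pcl::registration::CorrespondenceRejectorSampleConsensus<pcl::PointXYZ> rejector;
    rejector.setInputSource(model_keypoints);
    rejector.setInputTarget(scene_keypoints);
    rejector.setInlierThreshold(0.02);        // 2 cm tolerance (assumed)
    rejector.setMaximumIterations(1000);
    rejector.setInputCorrespondences(all_correspondences);
    rejector.getCorrespondences(inliers);     // correspondences surviving RANSAC

    // 4x4 rigid transform (rotation + translation) estimated from the inliers.
    return rejector.getBestTransformation();
}
```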
4 Testing and Training
Training and testing are 2 separate processes. As mentioned above each of the global and local
pipeline has training part and testing part. But while execution the training is done together for
both the pipelines and testing for both is done together. The training is done to build a database
for object recognition. It has to be done for different views of the object. In testing part, there
are additional steps for matching. Training is an offline process i.e. it is done once and then the
system is ready recognition while testing is an online process. The efforts are always made to
reduce the time required for testing. Training always takes more time since it has to process
more number of point clouds.
5 Experiment Results
Two experiments are conducted. The first uses an object dataset from the internet to train
the system; in the second, a chair dataset is constructed using an office chair. The matching
accuracy results are stated in each case.
5.1 Experiment 1 – RGB-D object dataset from the internet
The standard RGB-D object dataset is used in this project for experimentation. The dataset has
different folders for each class (for example, inside the Keyboard folder there are keyboard_1,
keyboard_2, etc. folders). Each folder has object views from a particular elevation. We used
only 10 classes (objects) from this dataset for final testing. The following are the classes used:
We tried global recognition with two different algorithms.
VFH:
1. Training done with folder 1 (e.g. keyboard_1) and testing done on different images from
folder 1.
Object Matched/Tested Mostly confused with
Apple 213/219 Orange, Cap
Banana 219/257 Soda_can, Shampoo
Cap 222/227 Apple, Soda_can
Coffee_mug 197/200 Apple
Keyboard 252/252
Kleenex 232/271 Apple, Cap
Orange 251/252 Apple
Plate 249/253 Coffee_mug
Shampoo 208/273 Soda_can, Kleenex
Soda_can 207/227 Apple, Shampoo
2. Training done with folder 1 (e.g. keyboard_1) and testing done on different images from
folder 2 (e.g. keyboard_2).
Object Matched/Tested Mostly confused with
Apple 208/225 Orange, Soda_can
Banana 167/253 Soda_can, Shampoo
Cap 191/238 Coffee_mug, Soda_can
Coffee_mug 171/201 Shampoo, Soda_can
Keyboard 160/211 Shampoo, Kleenex
Kleenex 213/273 Cap, Soda_can
Orange 190/253 Apple
Plate 182/211
Shampoo 199/273 Soda_can, Kleenex
Soda_can 182/211 Apple, Cap
OUR_CVFH:
1. Training done with folder 1 (e.g. keyboard_1) and testing done on different images from
folder 1.
Object Matched/Tested Mostly confused with
Apple 321/330 Orange, Soda_can
Banana 288/300 Kleenex, Shampoo
Cap 396/411 Kleenex, Coffee_mug
Coffee_mug 638/640 Cap
Keyboard 418/424 Banana, Plate
Kleenex 560/603 Cap, Coffee_mug
Orange 406/409 Apple
Plate 547/553 Cap, Kleenex
Shampoo 367/382 Soda_can, Kleenex
Soda_can 309/324 Cap, Apple
2. Training done with folder 1 (e.g. keyboard_1) and testing done on different images from
folder 2 (e.g. keyboard_2).
Object Matched/Tested Mostly confused with
Apple 326/345 Orange, Soda_can
Banana 251/348 Shampoo
Cap 366/446 Coffee_mug, Kleenex
Coffee_mug 669/700 Kleenex, Cap
Keyboard 286/511 Shampoo, Cap
Kleenex 413/483 Cap, Soda_can
Orange 400/415 Apple
Plate 633/636 Cap
Shampoo 242/411 Soda_can, Banana
Soda_can 270/312 Cap, Coffee_mug
Based on the accuracy of the two methods, VFH global descriptors are used for matching
objects against the dataset. The results are demonstrated by capturing a 3D image of a coffee mug
and of a keyboard and matching each with the available dataset.
For Keyboard:
So the system identified the given object correctly as a keyboard from among the 9 objects used
for training. The white keyboard is from the dataset while the black one is the one captured
with the Kinect. There is a problem with the sizes of the keyboards: the two are of different sizes,
so they cannot be aligned exactly. The following is the rotation matrix and the translation vector
for the matching:
For Coffee Mug:
Again, for the coffee mug, the system identified it correctly. The light purple mug is the coffee mug
from the dataset while the red one is captured using the Kinect. As one can see, the alignment
is not proper; to improve the results one can increase the number of views used for
training or increase the number of points used for computing a single local descriptor. The
alignment issue arises due to the inaccurate performance of the local descriptors.
5.2 Experiment 2 – Creating and testing on own dataset
In this experiment a Chair dataset is created by taking 3D images of an office chair from a
particular elevation. The chair was rotated at the same place and different 3D views were
collected. All these images are first passed through a Pass Through filter to remove the
background objects and the ground and then only the chair remains in the image. VFH global
descriptors are found for these images and then the descriptors are stored in a KDtree along
with the other 10 object dataset.
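The pass-through filtering mentioned above can be sketched in PCL as follows; the filter field and
depth limits are placeholders, not the values used when the chair dataset was captured.

```cpp
#include <pcl/point_types.h>
#include <pcl/filters/passthrough.h>

pcl::PointCloud<pcl::PointXYZ>::Ptr cropToChair(pcl::PointCloud<pcl::PointXYZ>::Ptr scene) {
    pcl::PointCloud<pcl::PointXYZ>::Ptr chair(new pcl::PointCloud<pcl::PointXYZ>);
    pcl::PassThrough<pcl::PointXYZ> pass;
    pass.setInputCloud(scene);
    pass.setFilterFieldName("z");          // keep only a band of depths
    pass.setFilterLimits(0.5, 1.5);        // 0.5 m to 1.5 m in front of the Kinect (assumed)
    pass.filter(*chair);                   // points outside the limits are dropped
    return chair;
}
```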
An image of a different chair is taken; similarly, it is passed through the filter and its descriptor
is calculated to be used for matching. The following are the results of the chair matching:
In the above figures, the brown chair is the one used in the dataset and the pink chair is the one
used for testing. All three images, showing different view angles of the pink chair, are given for
testing. In every case, the system correctly identified the given object as a chair. Here again, the
accuracy of alignment is different for different views of the test chair. The following is the
rotation matrix and translation vector for the first alignment figure:
6 Conclusion
A comparative study of algorithms for 3D object recognition was performed. The algorithms
considered in this project are, in the global category, VFH and OUR_CVFH and, in the local category,
SHOT, FPFH and PFH. After executing the algorithms individually, the results showed that the VFH and
SHOT descriptors perform better, along with ISS3D keypoints. When VFH and SHOT were executed together
in a pipeline, the combination also showed good results, as is evident in this report.
References
- Khaled Alhamzi, Mohammed Elmogy, Sherif Barakat. 3D Object Recognition Based on Local and Global
  Features Using Point Cloud Library.
- Tutorial documentation from the PCL website:
  http://pointclouds.org/documentation/tutorials/
- Kevin Lai, Liefeng Bo, Xiaofeng Ren, and Dieter Fox. A Large-Scale Hierarchical Multi-View RGB-D
  Object Dataset. In IEEE International Conference on Robotics and Automation (ICRA), May 2011.
- Wikipedia:
  https://en.wikipedia.org/wiki/Structured-light_3D_scanner
  https://en.wikipedia.org/wiki/Time-of-flight_camera
- PCL/OpenNI tutorials:
  http://robotica.unileon.es/mediawiki/index.php/PhD-3D-Object-Tracking
Contenu connexe

Tendances

BachelorThesis 5.3
BachelorThesis 5.3BachelorThesis 5.3
BachelorThesis 5.3
Nguyen Huy
 
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
c.choi
 
3 track kinect@Bicocca - sdk e camere
3   track kinect@Bicocca - sdk e camere3   track kinect@Bicocca - sdk e camere
3 track kinect@Bicocca - sdk e camere
Matteo Valoriani
 
Integration of a Structure from Motion into Virtual and Augmented Reality for...
Integration of a Structure from Motion into Virtual and Augmented Reality for...Integration of a Structure from Motion into Virtual and Augmented Reality for...
Integration of a Structure from Motion into Virtual and Augmented Reality for...
Tomohiro Fukuda
 

Tendances (18)

Image recognition
Image recognitionImage recognition
Image recognition
 
BachelorThesis 5.3
BachelorThesis 5.3BachelorThesis 5.3
BachelorThesis 5.3
 
Availability of Mobile Augmented Reality System for Urban Landscape Simulation
Availability of Mobile Augmented Reality System for Urban Landscape SimulationAvailability of Mobile Augmented Reality System for Urban Landscape Simulation
Availability of Mobile Augmented Reality System for Urban Landscape Simulation
 
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
 
高精地图数据协议标准探究
高精地图数据协议标准探究高精地图数据协议标准探究
高精地图数据协议标准探究
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
3 track kinect@Bicocca - sdk e camere
3   track kinect@Bicocca - sdk e camere3   track kinect@Bicocca - sdk e camere
3 track kinect@Bicocca - sdk e camere
 
Kinect2 hands on
Kinect2 hands onKinect2 hands on
Kinect2 hands on
 
Image recognition
Image recognitionImage recognition
Image recognition
 
Secure System based on Dynamic Features of IRIS Recognition
Secure System based on Dynamic Features of IRIS RecognitionSecure System based on Dynamic Features of IRIS Recognition
Secure System based on Dynamic Features of IRIS Recognition
 
Integrating UAV Development Technology with Augmented Reality Toward Landscap...
Integrating UAV Development Technology with Augmented Reality Toward Landscap...Integrating UAV Development Technology with Augmented Reality Toward Landscap...
Integrating UAV Development Technology with Augmented Reality Toward Landscap...
 
DISTRIBUTED AND SYNCHRONISED VR MEETING USING CLOUD COMPUTING: Availability a...
DISTRIBUTED AND SYNCHRONISED VR MEETING USING CLOUD COMPUTING: Availability a...DISTRIBUTED AND SYNCHRONISED VR MEETING USING CLOUD COMPUTING: Availability a...
DISTRIBUTED AND SYNCHRONISED VR MEETING USING CLOUD COMPUTING: Availability a...
 
GOAR: GIS Oriented Mobile Augmented Reality for Urban Landscape Assessment
GOAR: GIS Oriented Mobile Augmented Reality for Urban Landscape AssessmentGOAR: GIS Oriented Mobile Augmented Reality for Urban Landscape Assessment
GOAR: GIS Oriented Mobile Augmented Reality for Urban Landscape Assessment
 
Integration of a Structure from Motion into Virtual and Augmented Reality for...
Integration of a Structure from Motion into Virtual and Augmented Reality for...Integration of a Structure from Motion into Virtual and Augmented Reality for...
Integration of a Structure from Motion into Virtual and Augmented Reality for...
 
Bb26347353
Bb26347353Bb26347353
Bb26347353
 
Traffic Light Detection and Recognition for Self Driving Cars using Deep Lear...
Traffic Light Detection and Recognition for Self Driving Cars using Deep Lear...Traffic Light Detection and Recognition for Self Driving Cars using Deep Lear...
Traffic Light Detection and Recognition for Self Driving Cars using Deep Lear...
 
Programming with kinect v2
Programming with kinect v2Programming with kinect v2
Programming with kinect v2
 
SOAR: SENSOR ORIENTED MOBILE AUGMENTED REALITY FOR URBAN LANDSCAPE ASSESSMENT
SOAR: SENSOR ORIENTED MOBILE AUGMENTED REALITY FOR URBAN LANDSCAPE ASSESSMENTSOAR: SENSOR ORIENTED MOBILE AUGMENTED REALITY FOR URBAN LANDSCAPE ASSESSMENT
SOAR: SENSOR ORIENTED MOBILE AUGMENTED REALITY FOR URBAN LANDSCAPE ASSESSMENT
 

En vedette (8)

Real time gesture recognition
Real time gesture recognitionReal time gesture recognition
Real time gesture recognition
 
Gesture recognition technology
Gesture recognition technologyGesture recognition technology
Gesture recognition technology
 
Hand gesture recognition
Hand gesture recognitionHand gesture recognition
Hand gesture recognition
 
Hand gesture recognition system(FYP REPORT)
Hand gesture recognition system(FYP REPORT)Hand gesture recognition system(FYP REPORT)
Hand gesture recognition system(FYP REPORT)
 
Hand Gesture Recognition
Hand Gesture RecognitionHand Gesture Recognition
Hand Gesture Recognition
 
Gesture Recognition
Gesture RecognitionGesture Recognition
Gesture Recognition
 
Gesture recognition
Gesture recognitionGesture recognition
Gesture recognition
 
Gesture recognition technology
Gesture recognition technology Gesture recognition technology
Gesture recognition technology
 

Similaire à Final_draft_Practice_School_II_report

Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate...
 Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate... Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate...
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate...
AIRCC Publishing Corporation
 
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...
AIRCC Publishing Corporation
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
mokamojah
 
Interactive Full-Body Motion Capture Using Infrared Sensor Network
Interactive Full-Body Motion Capture Using Infrared Sensor Network  Interactive Full-Body Motion Capture Using Infrared Sensor Network
Interactive Full-Body Motion Capture Using Infrared Sensor Network
ijcga
 
Interactive full body motion capture using infrared sensor network
Interactive full body motion capture using infrared sensor networkInteractive full body motion capture using infrared sensor network
Interactive full body motion capture using infrared sensor network
ijcga
 
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devices
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing DevicesFrom Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devices
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devices
toukaigi
 
PS1_2014_2012B5A7521P_2012B5A7848P_2012B4A7958H
PS1_2014_2012B5A7521P_2012B5A7848P_2012B4A7958HPS1_2014_2012B5A7521P_2012B5A7848P_2012B4A7958H
PS1_2014_2012B5A7521P_2012B5A7848P_2012B4A7958H
Saurabh Kumar
 

Similaire à Final_draft_Practice_School_II_report (20)

A Wireless Network Infrastructure Architecture for Rural Communities
A Wireless Network Infrastructure Architecture for Rural CommunitiesA Wireless Network Infrastructure Architecture for Rural Communities
A Wireless Network Infrastructure Architecture for Rural Communities
 
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate...
 Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate... Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate...
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate...
 
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
 
IRJET - Direct Me-Nevigation for Blind People
IRJET -  	  Direct Me-Nevigation for Blind PeopleIRJET -  	  Direct Me-Nevigation for Blind People
IRJET - Direct Me-Nevigation for Blind People
 
Interactive Full-Body Motion Capture Using Infrared Sensor Network
Interactive Full-Body Motion Capture Using Infrared Sensor Network  Interactive Full-Body Motion Capture Using Infrared Sensor Network
Interactive Full-Body Motion Capture Using Infrared Sensor Network
 
Interactive full body motion capture using infrared sensor network
Interactive full body motion capture using infrared sensor networkInteractive full body motion capture using infrared sensor network
Interactive full body motion capture using infrared sensor network
 
IRJET- 3D Object Recognition of Car Image Detection
IRJET-  	  3D Object Recognition of Car Image DetectionIRJET-  	  3D Object Recognition of Car Image Detection
IRJET- 3D Object Recognition of Car Image Detection
 
Virtual Yoga System Using Kinect Sensor
Virtual Yoga System Using Kinect SensorVirtual Yoga System Using Kinect Sensor
Virtual Yoga System Using Kinect Sensor
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problems
 
REGISTRATION TECHNOLOGIES and THEIR CLASSIFICATION IN AUGMENTED REALITY THE K...
REGISTRATION TECHNOLOGIES and THEIR CLASSIFICATION IN AUGMENTED REALITY THE K...REGISTRATION TECHNOLOGIES and THEIR CLASSIFICATION IN AUGMENTED REALITY THE K...
REGISTRATION TECHNOLOGIES and THEIR CLASSIFICATION IN AUGMENTED REALITY THE K...
 
REGISTRATION TECHNOLOGIES and THEIR CLASSIFICATION IN AUGMENTED REALITY THE K...
REGISTRATION TECHNOLOGIES and THEIR CLASSIFICATION IN AUGMENTED REALITY THE K...REGISTRATION TECHNOLOGIES and THEIR CLASSIFICATION IN AUGMENTED REALITY THE K...
REGISTRATION TECHNOLOGIES and THEIR CLASSIFICATION IN AUGMENTED REALITY THE K...
 
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devices
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing DevicesFrom Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devices
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devices
 
Computer Vision.pdf
Computer Vision.pdfComputer Vision.pdf
Computer Vision.pdf
 
A Literature Survey: Neural Networks for object detection
A Literature Survey: Neural Networks for object detectionA Literature Survey: Neural Networks for object detection
A Literature Survey: Neural Networks for object detection
 
Object Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNetObject Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNet
 
IRJET- Comparative Study of Different Techniques for Text as Well as Object D...
IRJET- Comparative Study of Different Techniques for Text as Well as Object D...IRJET- Comparative Study of Different Techniques for Text as Well as Object D...
IRJET- Comparative Study of Different Techniques for Text as Well as Object D...
 
Seminar report on image sensor
Seminar report on image sensorSeminar report on image sensor
Seminar report on image sensor
 
final_report
final_reportfinal_report
final_report
 
PS1_2014_2012B5A7521P_2012B5A7848P_2012B4A7958H
PS1_2014_2012B5A7521P_2012B5A7848P_2012B4A7958HPS1_2014_2012B5A7521P_2012B5A7848P_2012B4A7958H
PS1_2014_2012B5A7521P_2012B5A7848P_2012B4A7958H
 

Plus de Rishikesh Bagwe

Stb of Condensate system
Stb of Condensate systemStb of Condensate system
Stb of Condensate system
Rishikesh Bagwe
 
PLC_ProjectReport_BITS_Pilani
PLC_ProjectReport_BITS_PilaniPLC_ProjectReport_BITS_Pilani
PLC_ProjectReport_BITS_Pilani
Rishikesh Bagwe
 

Plus de Rishikesh Bagwe (7)

Sterilization Unit
Sterilization UnitSterilization Unit
Sterilization Unit
 
DC Motor Drive System (Cascade Control Strategy)
DC Motor Drive System (Cascade Control Strategy)DC Motor Drive System (Cascade Control Strategy)
DC Motor Drive System (Cascade Control Strategy)
 
Gesture controlled robotic arm embedded systems project
Gesture controlled robotic arm embedded systems projectGesture controlled robotic arm embedded systems project
Gesture controlled robotic arm embedded systems project
 
QNET Heating Ventilation and Air Conditioning in LABVIEW & Strain Guages
QNET Heating Ventilation and Air Conditioning in LABVIEW & Strain GuagesQNET Heating Ventilation and Air Conditioning in LABVIEW & Strain Guages
QNET Heating Ventilation and Air Conditioning in LABVIEW & Strain Guages
 
Dynamic Matrix Control (DMC) on jacket tank heater - Rishikesh Bagwe
Dynamic Matrix Control (DMC) on jacket tank heater - Rishikesh BagweDynamic Matrix Control (DMC) on jacket tank heater - Rishikesh Bagwe
Dynamic Matrix Control (DMC) on jacket tank heater - Rishikesh Bagwe
 
Stb of Condensate system
Stb of Condensate systemStb of Condensate system
Stb of Condensate system
 
PLC_ProjectReport_BITS_Pilani
PLC_ProjectReport_BITS_PilaniPLC_ProjectReport_BITS_Pilani
PLC_ProjectReport_BITS_Pilani
 

Final_draft_Practice_School_II_report

  • 1. A Report On 3D OBJECT RECOGNITION USING POINT CLOUD LIBRARY (PCL) prepared by: Rishikesh Bagwe (2012A8PS401G) Mentor: Imran Syed (Sc ‘C’) Centre for Artificial Intelligence and Robotics, Bangalore A Practice School – II station of BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI (May, 2016)
  • 2. i BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI (RAJASTHAN) Practice School Division Station: Centre for Artificial Intelligence and Robotics, Bangalore Duration: 5 months and 7 days Date of Start: 12th Jan 2016 Date of Submission: 31st May 2016 Title of the Project: 3D Object Recognition using point cloud library (pcl) Students: Name ID numbers Discipline Rishikesh Bagwe 2012A8PS401G Electronics and Instrumentation Station Experts: Name Designation Imran Syed Scientist ‘C’ PS Faculty: K. Pradheep Kumar Siju C. R. Keywords: Project Area: Artificial Intelligence, Object Recognition Abstract: This report gives detailed steps required for 3D object recognition. It states the literature and concepts used in the process. The report also has the global and local pipeline execution results. A comparative study of different algorithms used in the pipelines is listed here and the combination of algorithms to be used is concluded.
  • 3. ii Preface The following project report is based on 3D object recognition. It gives information on why 3D object recognition is better than 2D one, what are the steps involved in the process of object recognition. It also gives an insight into the mathematics used for various algorithms in keypoint description. Some of the commonly used algorithms for surface description are explained in the report. The training and testing parts are the body of object recognition. The training is done by using 400 points clouds (3D images) for each class (object). The combination of various algorithms is tried and the fastest and accurate combination is selected. Microsoft Kinect Xbox360 is used to gather 3D data for testing purpose.
  • 4. iii Acknowledgement I would like to express my special thanks of gratitude to my station mentor Mr. Imran Syed for giving me an opportunity to work on 3D object recognition project and continuously guiding me through the obstacles I faced. I would also like to thank the organisation Centre for Artificial Intelligence and Robotics (CAIR) for allowing me to use their equipment and expertise for my project. Secondly I like thank my BITS PS faculty Dr. K. Pradheep Kumar and Dr. Siju C.R. for easing the process of entering into the CAIR and for guiding me through the rules and regulation of the Practice School Division. Lastly I want to thank the Birla Institute of Technology and Science (BITS), Pilani for providing me the opportunity to work in a reputed research organisation, CAIR.
  • 5. iv Table of Contents Abstract Sheet .............................................................................................................................i Preface........................................................................................................................................ii Acknowledgement ................................................................................................................... iii Table of Contents......................................................................................................................iv 1 Introduction ........................................................................................................................1 2 Terminology and Concepts.................................................................................................4 3 The process classification and flow....................................................................................6 3.1 The Global Pipeline.....................................................................................................7 3.2 The Local Pipeline ......................................................................................................9 4 Testing and Training.........................................................................................................11 5 Experiment Results...........................................................................................................12 5.1 Experiment 1 – RGB-D object dataset from the internet..........................................12 5.2 Experiment 2 – Creating and testing on own dataset................................................16 6 Conclusion........................................................................................................................17 References................................................................................................................................18
  • 6. 1 1 Introduction The objective of the 3D object recognition is to identify objects correctly in the point cloud and determines their poses (i.e., location and orientation). For many years, the most common sensors for computer vision were 2D cameras that retrieved a RGB image of the scene (like all the digital cameras that are so common nowadays, in our laptops or smartphones). Algorithms exist that are able to find an object in a picture, even if it is rotated or scaled. Then came 3D sensor which gave us the depth of each point in the scene and because of this new measurement we can now detect the object and its pose even from a different camera view than it was trained with. But the addition of a new dimension makes calculations expensive. Working with the data they retrieve is a lot different that working with a 2D image, and texture information is rarely used. There are different 3D sensors categorized into 3 classes viz. - Stereo Cameras: They are the only passive measurement device of the list. They are essentially two identical cameras assembled together (some centimeters apart), that capture slightly different scenes. By computing the differences between both scenes, it is possible to infer depth information about the points in each image. A stereo pair is cheap, but perhaps the least accurate sensor. Ideally, it would require perfect calibration of both cameras, which is unfeasible in practice. Bad light conditions will render it useless. - Time-of-flight (Tof): These sensors work by measuring the time it has taken a ray or pulse of light to travel a certain distance. Because the speed of light is a known constant, a simple formula can be used to obtain the range to the object. These sensors are not affected by light conditions and have the potential to be very precise. A LIDAR (light+radar) is just a common laser range finder mounted on a platform that is able to rotate very fast, scanning the scene point by point. - Structured Light: sensors (like Kinect and Xtion) work by projecting a pattern of infrared light (for example, a grid of lines, or a "constellation" of points) on top of the scene's objects. This pattern is seen distorted when looked from a different from the projector's perspective. By analysing this distortion, information about the depth can be retrieved, and the surface(s) reconstructed.
  • 7. 2 In this project I will be using Microsoft Kinect for taking data. So it is important to go into the details of 3D image formation from Kinect sensor. The basis of Microsoft Kinect is the PrimeSense Technology. Kinect has PS1080 system-on-chip which handles its 3D image formation. The Kinect has a RGB Camera and an IR projector and sensor. It can be seen in the picture given below. It uses the IR sensor & projector pair to measure the depth of a point in the scene. The theory of operation is simple, but its execution can be complex which is done by PrimeSense’s PS1080 SoC. The IR projector projects a pattern of IR dots and detects them using a conventional CMOS image sensor with an IR filter. The pattern will change based upon objects that reflect the light. The dots will change size and position based on how far the objects are from the source. For example: The PS1080 SoC has both the projected pattern and the sensed pattern. First it maps the points in the projected pattern to the ones in sensed pattern. Then it measures the distance by which the point has moved (disparity). This disparity is then used to calculate the depth of that point as follows.
  • 8. 3 In the diagram beside, By similarity of ΔO & ΔXOP 𝑥 𝑂𝑃 = 𝑓 𝑍 ……..1 And by similarity of ΔO’ & ΔXO’P 𝑥′ 𝑂′𝑃 = 𝑓 𝑍 …….2 From 1 and 2, 𝑥+𝑥′ 𝑂𝑂′ = 𝑓 𝑍 . We know OO’ distance and f (focal length). x and x’ are found by the Primesense SoC PS1080 from IR projector and IR sensor respectively. Therefore we can find Z, the depth of the point X. The Kinect return the 3D data in the form of a point cloud. A point cloud is a set of points in three-dimensional space, each with its own XYZ coordinates. Every point corresponds to exactly one pixel of the captured image. Optionally, the point cloud data can also store RGB data if the sensor has a RGB camera. The data format in which a point cloud is stored is called point cloud data (.pcd). In order to do the processing like object detection, object recognition, 3D modelling on these point clouds and to handle the complex calculation involving depth measurement a point cloud library (pcl) was started in early 2010 by Willow Garage and OpenCV. The first version was fully introduced in 2011, and it has been actively maintained ever since. PCL aims to be an one-for-all solution for point cloud and 3D processing. It is an open source, multiplatform library divided in many submodules for different tasks, like visualization, filtering, segmentation, registration, searching, and feature estimation. left up – IR projection left down – RGB image2D right down – depth map right up – point cloud (RGBD)
  • 9. 4 2 Terminology and Concepts Keypoints: According to the original publication a keypoint is a point on the object which 1. takes information about borders and the surface structure into account 2. can be reliably detected even if the object is observed from another perspective 3. provides stable areas for normal estimation or the descriptor calculation in general As you can see in the beside figure the good keypoints are not exactly on the edge but just around it so that it is easy to calculate the normals in the neighbourhood. Also the red bad keypoints does not have characteristic surface change beneath them. The main reason to find the keypoints is to reduce the stress on further process. A point cloud of an object can have as much as 1 lakh points which makes processing lengthy. But if we calculate the keypoints, it reduces the point number to some hundreds. There are different algorithms for detecting keypoints. A small set of detectors specifically proposed for 3D point clouds and range maps viz. Intrinsic Shape Signatures (ISS), NARF, etc. Several keypoint detectors are derived from 2D interest point detectors, they are Harris3D, SIFT3D, SUSAN3D. The following image shows the ISS3D keypoints calculated on a keyboard point cloud data: The colored points are the keypoints. As you can see not all the keypoints are good ones according to the above description of keypoints. This is mainly due to the aberrations in the point cloud data collected.
  • 10. 5 Descriptors: It is a ‘n’ dimensional vector calculated for each points local neighbourhood or sometimes it is computed for the whole cloud. The dimension of the vector depends on the algorithm used for calculating it. These descriptors are divided into 2 categories global and local descriptors. Each local descriptors describe the surface beneath the neighbourhood of each keypoint whereas one global descriptor describes the whole viewed object surface. In order to calculate the descriptors we first have to calculate normals at each point in the specified neighbourhood. Then the difference in the angels between the normals is binned into a histogram. For example Fast Point Feature Histogram (FPFH) has 33 bins. These 33 bins are subdivided into 11 bins based on the value intervals for each parameter (e.g. parameter will the angle difference between the normal at desired point and one of its neighbouring point in x plane). So each interval will have 3 bins. The number of instances for each interval and for each parameter is calculated and then added to the histogram at the appropriate bin. Apart from FPFH there are various algorithm for descriptor calculation like in Local category - Signature of Histograms of Orientations (SHOT), Point Feature Histogram (PFH) and in Global category – Viewpoint Feature Histogram (VFH), Global Fast Point Feature Histogram (GFPFH). In this report we have used the VFH descriptor for global pipeline and SHOT descriptor for local pipeline. Here are the FPFH and VFH descriptors calculated for the keyboard point cloud data shown above: VFH descriptor FPFH descriptors
  • 11. 6 3 The process classification and flow The basis of 3D object recognition is to find a set of correspondences between two different point clouds, one of them containing the object we are looking for. The process is classified into 2 separate pipelines viz. Local pipeline and Global pipeline. The global pipeline is usually used for object detection in a scene and the local pipeline is to find the object position and orientation with respect to the camera. Each pipeline has different stages as shown in the figure below. The main difference between the 2 pipeline is at the stage of description. Both the pipelines use different algorithms for finding descriptors. The local pipeline describes the surface curvatures of an object while the global pipeline considers the object as a whole. So for the complete object recognition we need to use the global pipeline first and then the local one. So accordingly there will be different combinations of algorithm used for description in both the pipeline. The accuracy and speed of the process highly depends on the combination used. A comparative study needs to be done in order to get the optimum throughput. Apart from the algorithms used for descriptors, both the pipelines have different stages. As seen from the figure above, keypoint extraction is only in local pipeline because in global pipeline, descriptor is found for whole object while in local pipeline in order to reduce the processing time keyoints are found first and then their descriptors are calculated. Similarly segmentation step is there in only global pipeline because we want the object from the separately for finding the objects global descriptor as a whole whereas in local pipeline we have the object itself. Usually when these 2 pipelines are integrated, first we carry out the global pipeline i.e. first we find the object in the given scene and then run the local pipeline to calculate the orientation of the found object.
  • 12. 7 3.1 The Global Pipeline As stated earlier the global pipeline is used for finding the class (what object it is?) of the given object. The number of classes that the system can recognise depends on the training of the system. Training Part: A segmented point cloud database is used for training, so we do not need to perform segmentation while training. The training part is done by taking 130 images of each object from different views and then grouping them into few groups based on a threshold distance of the calculated global descriptors. The global descriptor used in this project is VFH (View-Point Feature Histogram) which is view dependent that is why it needs images from different view- points. The descriptors are stored in the database in a KD –tree format. Along with the KD-tree format (hdf5) file there are 2 more files generated namely the descriptor name file and the training data path file. The data path file generated is shown below:
  • 13. 8 Testing Part: First the data captured from the Kinect is segmented to keep only the object in the point cloud. This segmentation is done on the basis of surface planes like cylindrical, planar, spherical etc. For there are different objects kept on the table, so we perform planar segmentation to detect the table surface and cut it out. The remaining disjoint point clouds will the objects kept on the table. (photo of segmentation) Sometimes the segmentation will give back 2-3 points cloud even if only one object is kept on the table in front of it. In such situations the first point cloud produced by the algorithm is the intended object the others are maybe due to the faulty registration of the point cloud in the Kinect. After the segmentation, VFH descriptors are calculated for the acquired point cloud. The descriptor are then compared to all the VFH descriptors stored in the KD-tree while training based on a distance threshold in 308 dimensions (VFH being 308 dimensions). And the closest 5 matching points clouds are output by the program as shown in the figure below The highlighted part in grey is the object model provided for matching and the one highlighted in pink is the matching result. As you can see the keyboard is matched correctly to keyboard.
  • 14. 9 3.2 The Local Pipeline The local pipeline is used to measure the orientation and position of the object in the scene. The orientation which we get is relative to a point cloud which we give for testing (whose orientation and position we know). Training Part: After matching and determining the class of the object, we find the local descriptors file from our database for the one with which it is matched. This database is created for all the files which are used for global pipeline training. For local pipeline training we first find the keypoints of a point cloud. In this project we use ISS3D keypoint detector for finding the keypoints. Having found the keypoints we then find the descriptors. We have used SHOT (Signature of Histogram of Orientations) descriptors. The database of local descriptors for all point clouds is stored in individual files. Testing Part: In this testing part, having identified the class of the given point cloud. We then find ISS3D keypoints and then the SHOT descriptors. These descriptors are then compared with the globally matched point cloud’s local descriptors. The comparison is based on the n dimensional euclidean distance between the descriptors of the two point clouds. The output of this pipeline is the rotational matrix and a translation vector which gives the orientation of the given point cloud with respect to the trained point cloud. The figure below shows the result of the local pipeline.
RANSAC (Random Sample Consensus) part: The correspondences obtained from local matching, as seen in the figure above, are not all correct. There is always a considerable number of wrong matches, which leads to an incorrect rotation matrix and translation vector. RANSAC is used to remove these incorrect matches. It is an iterative process that estimates the parameters of a mathematical model from a set of observations containing outliers, with the aim of rejecting those outliers. Here it finds the dominant orientation of the correspondence lines and removes the lines that are not consistent with it. The following figures show the results when RANSAC was used while matching two keyboards.
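In PCL this outlier rejection can be expressed with the sample-consensus correspondence rejector, as in the minimal sketch below; the inlier threshold and the input names are assumptions, not the project's exact settings.

```cpp
// Minimal sketch of the RANSAC rejection step using PCL's sample-consensus
// correspondence rejector. Threshold and input names are assumed.
#include <pcl/point_types.h>
#include <pcl/correspondence.h>
#include <pcl/registration/correspondence_rejection_sample_consensus.h>

Eigen::Matrix4f
rejectBadMatches (const pcl::PointCloud<pcl::PointXYZ>::Ptr &model_keypoints,
                  const pcl::PointCloud<pcl::PointXYZ>::Ptr &scene_keypoints,
                  const pcl::CorrespondencesPtr &all_matches,
                  pcl::Correspondences &inlier_matches)
{
  pcl::registration::CorrespondenceRejectorSampleConsensus<pcl::PointXYZ> rejector;
  rejector.setInputSource (model_keypoints);
  rejector.setInputTarget (scene_keypoints);
  rejector.setInputCorrespondences (all_matches);
  rejector.setInlierThreshold (0.01);             // assumed 1 cm inlier threshold
  rejector.getCorrespondences (inlier_matches);   // only geometrically consistent matches remain

  // 4x4 transform: rotation in the top-left 3x3 block, translation in the last column.
  return rejector.getBestTransformation ();
}
```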
4 Testing and Training

Training and testing are two separate processes. As mentioned above, both the global and the local pipeline have a training part and a testing part, but during execution the training for both pipelines is done together, and likewise the testing for both is done together. Training is done to build the database used for object recognition and has to be carried out for different views of each object. The testing part involves additional steps for matching. Training is an offline process, i.e. it is done once and the system is then ready for recognition, whereas testing is an online process, so the effort is always to reduce the time required for testing. Training always takes more time since it has to process a larger number of point clouds.
5 Experiment Results

Two experiments are conducted. In the first, an object dataset from the internet is used to train the system; in the second, a chair dataset is constructed using an office chair. The matching accuracy results are stated for each case.

5.1 Experiment 1 – RGB-D object dataset from the internet

The standard RGB-D object dataset is used in this project for experimentation. The dataset has a separate folder for each class (for example, inside the Keyboard folder there are keyboard_1, keyboard_2, etc. folders), and each of these folders contains views of the object from a particular elevation. Only 10 classes (objects) from this dataset are used for the final testing: apple, banana, cap, coffee_mug, keyboard, kleenex, orange, plate, shampoo and soda_can.
We tried global recognition with two different algorithms.

VFH:
1. Training done with folder 1 (e.g. keyboard_1) and testing done on different images from folder 1.

Object        Matched/Tested   Mostly confused with
Apple         213/219          Orange, Cap
Banana        219/257          Soda_can, Shampoo
Cap           222/227          Apple, Soda_can
Coffee_mug    197/200          Apple
Keyboard      252/252          -
Kleenex       232/271          Apple, Cap
Orange        251/252          Apple
Plate         249/253          Coffee_mug
Shampoo       208/273          Soda_can, Kleenex
Soda_can      207/227          Apple, Shampoo

2. Training done with folder 1 (e.g. keyboard_1) and testing done on different images from folder 2 (e.g. keyboard_2).

Object        Matched/Tested   Mostly confused with
Apple         208/225          Orange, Soda_can
Banana        167/253          Soda_can, Shampoo
Cap           191/238          Coffee_mug, Soda_can
Coffee_mug    171/201          Shampoo, Soda_can
Keyboard      160/211          Shampoo, Kleenex
Kleenex       213/273          Cap, Soda_can
Orange        190/253          Apple
Plate         182/211          -
Shampoo       199/273          Soda_can, Kleenex
Soda_can      182/211          Apple, Cap
OUR_CVFH:
1. Training done with folder 1 (e.g. keyboard_1) and testing done on different images from folder 1.

Object        Matched/Tested   Mostly confused with
Apple         321/330          Orange, Soda_can
Banana        288/300          Kleenex, Shampoo
Cap           396/411          Kleenex, Coffee_mug
Coffee_mug    638/640          Cap
Keyboard      418/424          Banana, Plate
Kleenex       560/603          Cap, Coffee_mug
Orange        406/409          Apple
Plate         547/553          Cap, Kleenex
Shampoo       367/382          Soda_can, Kleenex
Soda_can      309/324          Cap, Apple

2. Training done with folder 1 (e.g. keyboard_1) and testing done on different images from folder 2 (e.g. keyboard_2).

Object        Matched/Tested   Mostly confused with
Apple         326/345          Orange, Soda_can
Banana        251/348          Shampoo
Cap           366/446          Coffee_mug, Kleenex
Coffee_mug    669/700          Kleenex, Cap
Keyboard      286/511          Shampoo, Cap
Kleenex       413/483          Cap, Soda_can
Orange        400/415          Apple
Plate         633/636          Cap
Shampoo       242/411          Soda_can, Banana
Soda_can      270/312          Cap, Coffee_mug
Based on the accuracy of the two methods, VFH global descriptors are used for matching objects against the dataset. The results are demonstrated by capturing a 3D image of a coffee mug and of a keyboard and matching each with the available dataset.

For Keyboard: The system identified the given object correctly as a keyboard from among the object classes used for training. The white keyboard is from the dataset while the black one was captured with the Kinect. The two keyboards are of different sizes, so they cannot be aligned exactly. The rotation matrix and translation vector for this match are shown in the figure below.

For Coffee Mug: Again, the system identified the coffee mug correctly. The light purple mug is from the dataset while the red one was captured with the Kinect. As one can see, the alignment is not exact; the results can be improved by increasing the number of views used for training or by increasing the number of points used to compute each local descriptor. The alignment issue arises from the limited accuracy of the local descriptors.
5.2 Experiment 2 – Creating and testing on own dataset

In this experiment a chair dataset is created by taking 3D images of an office chair from a particular elevation. The chair was rotated in place and different 3D views were collected. All these images are first passed through a pass-through filter to remove the background objects and the ground, so that only the chair remains in each image (a sketch of this filtering step is given at the end of this subsection). VFH global descriptors are computed for these images and stored in a KD-tree along with those of the other 10-object dataset. A 3D image of a different chair is then taken, passed through the same filter, and its descriptor is calculated and used for matching. The following figures show the results of the chair matching: the brown chair is the one used in the dataset and the pink chair is the one used for testing. All three test images show different view angles of the pink chair, and in every case the system correctly identified the given object as a chair. Here again, the alignment accuracy differs between views of the test chair. The rotation matrix and translation vector for the first alignment figure are shown in the figure below.
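The pass-through filtering mentioned above could be sketched as follows; the filter axes and limits are assumptions chosen only to illustrate the idea of cropping out the background and the ground, not the values used for the chair dataset.

```cpp
// Sketch of the pass-through filtering used to isolate the chair; axes and
// limits are assumed for illustration only.
#include <pcl/point_types.h>
#include <pcl/filters/passthrough.h>

pcl::PointCloud<pcl::PointXYZ>::Ptr
cropChair (const pcl::PointCloud<pcl::PointXYZ>::Ptr &scene)   // hypothetical helper
{
  pcl::PassThrough<pcl::PointXYZ> pass;

  // Keep only points within an assumed depth window in front of the Kinect.
  pcl::PointCloud<pcl::PointXYZ>::Ptr depth_cropped (new pcl::PointCloud<pcl::PointXYZ>);
  pass.setInputCloud (scene);
  pass.setFilterFieldName ("z");
  pass.setFilterLimits (0.5, 2.0);     // assumed limits in metres
  pass.filter (*depth_cropped);

  // Second pass along the (assumed) height axis to cut away the ground region.
  pcl::PointCloud<pcl::PointXYZ>::Ptr chair_only (new pcl::PointCloud<pcl::PointXYZ>);
  pass.setInputCloud (depth_cropped);
  pass.setFilterFieldName ("y");
  pass.setFilterLimits (-1.0, 0.5);    // assumed limits; tune for the actual scene
  pass.filter (*chair_only);
  return chair_only;
}
```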
6 Conclusion

A comparative study of algorithms for 3D object recognition has been performed. The algorithms considered in this project are VFH and OUR_CVFH for the global pipeline, and SHOT, FPFH and PFH for the local pipeline. Executing the algorithms individually showed that the VFH and SHOT descriptors, together with ISS3D keypoints, perform best. When VFH and SHOT were run together in a single pipeline they also gave good results, as is evident in this report.
References

- Khaled Alhamzi, Mohammed Elmogy, Sherif Barakat. 3D Object Recognition Based on Local and Global Features Using Point Cloud Library.
- Point Cloud Library tutorial documentation: http://pointclouds.org/documentation/tutorials/
- Kevin Lai, Liefeng Bo, Xiaofeng Ren, and Dieter Fox. A Large-Scale Hierarchical Multi-View RGB-D Object Dataset. In IEEE International Conference on Robotics and Automation (ICRA), May 2011.
- Wikipedia:
  https://en.wikipedia.org/wiki/Structured-light_3D_scanner
  https://en.wikipedia.org/wiki/Time-of-flight_camera
- PCL/OpenNI tutorials: http://robotica.unileon.es/mediawiki/index.php/PhD-3D-Object-Tracking