SlideShare a Scribd company logo
1 of 8
Download to read offline
Reconstructing Street-Scenes in Real-Time From a Driving Car
Vladyslav Usenko, Jakob Engel, J¨org St¨uckler, and Daniel Cremers
Technische Universit¨at M¨unchen, {usenko, engelj, stueckle, cremers}@in.tum.de
Abstract
Most current approaches to street-scene 3D reconstruc-
tion from a driving car to date rely on 3D laser scan-
ning or tedious offline computation from visual images.
In this paper, we compare a real-time capable 3D recon-
struction method using a stereo extension of large-scale di-
rect SLAM (LSD-SLAM) with laser-based maps and tradi-
tional stereo reconstructions based on processing individ-
ual stereo frames. In our reconstructions, small-baseline
comparison over several subsequent frames are fused with
fixed-baseline disparity from the stereo camera setup. These
results demonstrate that our direct SLAM technique pro-
vides an excellent compromise between speed and accuracy,
generating visually pleasing and globally consistent semi-
dense reconstructions of the environment in real-time on a
single CPU.
1. Introduction
The 3D reconstruction of environments has been a re-
search topic in computer vision for many years with many
applications in robotics or surveying. In particular this
problem is essential for creating self-driving vehicles, since
3D maps are required for localization and obstacle detec-
tion.
In the beginning, laser-based distance sensors were pri-
marily used for 3D reconstruction [4]. However, these so-
lutions have several drawbacks. Laser scanners usually are
quiet expensive and produce a sparse set of measurements
which are subject to rolling shutter effects under motion.
Recently, the development of powerful camera and com-
puting hardware as well as tremendous research progress
in the field of computer vision have fueled novel research
in the area of real-time 3D reconstruction with RGB-D,
monocular, and stereo cameras. Whereas traditional vi-
sual SLAM techniques are based on keypoints, and camera
tracking and dense stereo reconstruction are done individu-
ally, in this paper, we demonstrate that accurate and consis-
tent results can be obtained with a direct real-time capable
SLAM technique for stereo cameras. To this end, we si-
multaneously track the 6D pose of the stereo camera and
Figure 1. Reconstruction of a street-scene obtained with Large-
scale Direct SLAM using a stereo camera. The method uses static
and temporal stereo to compute semi-dense maps, which are then
used to track camera motion. The figure shows a sample 3D recon-
struction on one of the sequences from the popular Kitti dataset.
reconstruct semi-dense depth maps using spatial and tem-
poral stereo cues. A loop closure thread running in the
background ensures globally consistent reconstructions of
the environment.
We evaluate the reconstruction quality of our method on
the popular Kitti dataset [8]. It contains images captured
from a synchronized stereo pair which is mounted on the
roof of a car driving through a city. The dataset also pro-
vides point clouds from a Velodyne scanner, which we use
as ground-truth to evaluate the accuracy of our reconstruc-
tion method. We also compare depth estimation accuracy of
our approach with other state-of-the-art stereo reconstruc-
tion methods.
1
Figure 2. Top-down view on the reconstructions on the Kitti sequences 00-09 obtained with our method.
1.1. Related Work
Traditionally, laser scanners have been used for obtain-
ing 3D reconstructions of environments. Several research
communities tackle this problem using laser scanners, in-
cluding geomatics (e.g. [22]), computer graphics (e.g. [21]),
and robotics (e.g. [17]). The popularity of laser scanners
stems from the fact that they directly measure 3D point
clouds with high accuracy. Still, their measurement fre-
quency is limited since typically the laser beam is redirected
in all directions using mechanically moving parts. In effect,
either scanning time is slow, scanners become huge with
many individual beams, or sparse scans are produced.
For the alignment of the 3D point clouds produced
by static laser scanners, typically variants of the Iterative
Closest Point (ICP) algorithm [2] are used. An exten-
sive overview and comparisons of different variants can be
found in [18], but most of them are not capable of real-time
reconstruction of large-scale environments, such as cities.
If the laser scanner itself moves during acquisition, e.g.
on a self-driving car, spatial referencing of the individual 3D
measurements becomes more difficult. An accurate odom-
etry and mapping system that utilizes laser scans during
motion has been presented in [23]. Real-time capability is
achieved by running 6D motion tracking of the laser scan-
ner at high frequency on a small time-slice of points and
registering all available data at lower frequency. Recently,
this algorithm has been extended to use stereo images to
improve the odometry accuracy [24].
Camera based 3D reconstruction methods can be divided
into sparse and dense methods. Sparse methods typically
use bundle adjustment techniques to simultaneously track a
motion of the camera and estimate positions of the observed
keypoints. Parallel Tracking and Mapping (PTAM [13]) is
a prominent example which requires just a single monoc-
ular camera to perform 3D reconstruction, however in the
monocular case, the scale of the map is not observable. This
problem is tackled in FrameSLAM [14] and RSLAM [16]
which also apply bundle adjustment. Through the use of
stereo cameras with known baseline, these methods are ca-
pable to estimate metric scale. Some stereo SLAM meth-
ods estimate camera motion based on sparse interest points
[12], but use a dense stereo-processing algorithms in a sec-
ond stage to obtain dense 3D maps [15, 10].
Over the last decades, a large amount of the algorithms
for stereo reconstructions have been developed. Methods
using fixed-baseline rectified images became very popular
because of their low computational costs and accurate re-
sults. These algorithms can be divided into local and global.
Local methods such as block-matching (e.g. [1]) use only
the local neighborhood of the pixel when searching for the
corresponding pixel in a different image. This makes them
fast, but usually resulting reconstructions are not spatially
consistent. Global methods (e.g. [11, 19]) on the contrary
use regularization to improve spatial consistency, which re-
sults in better reconstructions but also requires much higher
computational costs.
2. Reference Methods
In the following, we discuss some of the most prominent
real-time capable stereo reconstruction algorithms which
we will use as reference for comparative evaluation with
our method.
2.1. Block Matching
The Block Matching (BM) algorithm is one of the most
simple local algorithms for stereo reconstruction. First im-
plementations date back to the eighties [1] and comprehen-
sive descriptions can be found in [3] and [20]. It estimates a
disparity by searching for the most similar rectangular patch
in the second stereo image using some patch comparison
metric. Ideally a metric should be tolerant to intensity dif-
ferences, such as change of camera gain or bias, and also
tolerant to deformations when dealing with slanted surfaces.
One typical example of such a metric is normalized cross-
correlation.
The outcome of the algorithm is sensitive to the selected
local window size. If the window size is small, low-textured
areas will not be reconstructed since no distinct features will
fit into a window. On the other hand a large window size
will result in over-smoothed and low-detail reconstructions
in highly textured areas.
2.2. Semi-Global Block Matching
Semi-Global Block Matching (SGBM) proposed in [11]
is an algorithm that estimates optimal disparities along 1D
lines using a polynomial time algorithm. This allows the
algorithm to produce more accurate results than local meth-
ods, but since disparities are optimized only along one di-
rection the results are typically worse than global methods
that regularize in the 2D image domain.
2.3. Efficient Large-Scale Stereo Matching
The Efficient Large-Scale Stereo (elas) Matching
method proposed in [9] uses a set of support points that can
be reliably matched between the images for speeding up dis-
parity computations on high-resolution images. Based on
the matched support points, maximum a-posteriory estima-
tion yields a dense disparity map.
3. Method
Our approach to real-time 3D reconstruction is based
on a stereo-extension to the LSD-SLAM [6] method. In
LSD-SLAM, key-frames along the camera trajectory are
extracted. The motion of the camera is tracked towards a
reference key-frame using direct image alignment. Concur-
rently, semi-dense depth at high-gradient pixels in the refer-
ence key-frame is estimated from stereo towards the current
frame. For SLAM on the global scale, the relative poses be-
tween the key-frames is estimated using scale-aware direct
image alignment, and the scale and view poses of the refer-
ence key-frames are estimated using graph optimization.
In our stereo-extension to LSD-SLAM, scale is directly
observable through the fixed baseline of the stereo camera.
Also for depth estimation, now also static stereo through
the fixed baseline complements the temporal stereo of the
tracked frames towards the reference key-frame. In tempo-
ral stereo, disparity can be estimated along epipolar lines
whose direction depends on the motion direction of the
camera. This extends the fixed direction for disparity es-
timation of the static stereo camera. On the other hand, the
baseline of the stereo camera can be precisely calibrated in
advance and allows for determing reconstruction scale un-
ambiguously.
In this paper we use a stereo version of the algorithm,
since the monocular version [7] does not perform well on
the Kitti sequences and fails frequently. This is presumably
due to the large inter-frame motion along the line-of-sight
of the camera.
In the following we will briefly describe the main steps
of our stereo LSD-SLAM method. Special focus is given to
semi-dense depth estimation and 3D reconstruction.
3.1. Notation
We briefly introduce basic notation that we will use
throughout the paper. Matrices will be denoted by bold
capital letters (R), while bold lower case letters are used
for vectors (ξ). The operator [·]n selects the n-th row of a
matrix. The symbol d denotes the inverse d = z−1
of depth
z.
The global map is maintained as a set of keyframes
Ki = Il
i, Ir
i , Di, Vi . They include a left and right stereo
image I
l/r
i : Ω → R, an inverse depth map Di : ΩDi
→ R+
and a variance map Vi : ΩDi → R+
. Depth and variance
Figure 3. Top down view on the reconstruction obtained with out method on Kitti sequence 05 with two close views.
are only maintained for one of the images in the stereo
pair. We assume the image domain Ω ⊂ R2
to be given
in stereo-rectified image coordinates, i.e., the intrinsic and
extrinsic camera parameters are known a-priori. The do-
main ΩDi ⊂ Ω is the semi-dense restriction to the pixels
which are selected for depth estimation.
We denote pixel coordinates by u = (ux uy 1)
T
. A 3D
position p = (px py pz 1)
T
is projected into the image plane
through the mapping
u = π(p) := K ((px/pz) (py/pz) 1)
T
, (1)
where K is the camera matrix. The mapping p =
π−1
(u, d) := d−1
K−1
u
T
1
T
inverts the projection
with the inverse depth d. We parametrize poses ξ ∈ se(3)
as elements of the Lie algebra of SE(3) with exponential
map Tξ = exp(ξ) ∈ SE(3). We will use Rξ and tξ for the
respective rotation matrix and translation vector.
3.2. Tracking through Direct Image Alignment
The relative pose between images Icurr and Iref is esti-
mated by minimizing pixel-wise residuals
E(ξ) =
u∈Ω
ρ ru(ξ)T
Σ−1
r,uru(ξ) (2)
which we solve as a non-linear least-squares problem. We
implement a robust norm ρ on the square residuals such
as the Huber-norm using the iteratively re-weighted least
squares method.
One common choice of residuals are the photometric
residuals
rI
u(ξ) := Iref (u) − Icurr π Tξπ−1
(u, Dref(u)) . (3)
The uncertainty σI
r,u in this residual can be determined
from a constant intensity measurement variance σ2
I and the
propagated inverse depth uncertainty [5]. Pure photometric
alignment uses ru(ξ) = rI
u(ξ).
If depth maps are available in both images, we can also
measure the geometric residuals
rG
u (ξ) := [p ]z − Dcurr (π (p )) (4)
with p := Tξπ−1
(u, Dref(u)). The variance of the geo-
metric residual is determined from the variance in the in-
verse depth estimates in both frames [5]. Note that we
can easily use both types of residuals simultaneously by
combining them in a stacked residual. We minimize the
direct image alignment objectives using the iteratively re-
weighted Levenberg-Marquardt algorithm.
We furthermore compensate for lighting changes with an
affine lighting model with scale and bias parameters. These
parameters are optimized with the pose ξ in an alternating
way.
3.3. Depth Estimation
Scene geometry is estimated for pixels of the key frame
with high image gradient, since they provide stable dispar-
ity estimates. We initialize the depth map with the propa-
gated depth from the previous keyframe. The depth map is
subsequently updated with new observations in a pixel-wise
depth-filtering framework. We also regularize the depth
maps spatially and remove outliers [5].
We estimate disparity between the current frame and
the reference keyframe using the pose estimate obtained
through tracking. These estimates are fused in the
keyframe. Only pixels are updated with temporal stereo,
whose expected inverse depth error is sufficiently small.
This also constrains depth estimates to pixels with high im-
age gradient along the epipolar line, producing a semi-dense
depth map.
For direct visual odometry with stereo cameras we esti-
mate depth both from static stereo (i.e., using images from
different physical cameras, but taken at the same point in
time) as well as from temporal stereo (i.e., using images
from the same physical camera, taken at different points in
time).
We determine the static stereo disparity at a pixel by a
correspondence search along its epipolar line in the other
stereo image. In our case of stereo-rectified images, this
search can be performed very efficiently along horizontal
lines. As correspondence measure we use the SSD photo-
metric error over five pixels along the scanline.
Static stereo is integrated in two ways. If a new stereo
Figure 4. Close-by views on the point clouds generated by our method from the Kitti sequences 08, 01, 08, 06, 02 and 04.
keyframe is created, the static stereo in this keyframe stereo
pair is used to initialize the depth map. During tracking,
static stereo in the current frame is propagated to the refer-
ence frame and fused with its depth map.
There are several benefits of complementing static with
temporal stereo in a tracking and mapping framework.
Static stereo makes reconstruction scale observable. It is
also independent of camera movement, but is constrained to
a constant baseline, which limits static stereo to an effective
operating range. Temporal stereo requires non-degenerate
camera movement, but is not bound to a specific range
as demonstrated in [5]. The method can reconstruct very
small and very large environments at the same time. Finally,
through the combination of static with temporal stereo, mul-
tiple baseline directions are available: while static stereo
typically has a horizontal baseline – which does not allow
for estimating depth along horizontal edges, temporal stereo
allows for completing the depth map by providing other mo-
tion directions.
3.4. Key-Frame-Based SLAM
With the tracking and mapping approach presented in the
previous section, it is possible to extract keyframes along
the camera trajectory and to estimate their poses consis-
tently in a global reference frame. The method operates by
tracking the camera motion towards reference keyframes.
Once the motion of the camera is sufficiently large, and the
image overlap is too small, the current camera image is se-
lected as a new reference keyframe. The old reference key
frame is added to a keyframe pose-graph. We estimate rel-
ative pose constraints between the keyframes through pho-
tometric and geometric direct image alignment. We test as
set of possible loop-closure candidates that are either found
as nearby view poses or through appearance-based image
retrieval.
In monocular LSD-SLAM, we need to use Sim(3) pose
representation due to the scale ambiguity. Conversely, for
stereo LSD-SLAM, we can directly use SE(3) parametriza-
tion.
For each candidate constaint Kjk
between keyframe k
and j we independently compute ξjki and ξijk
through
direct image alignment using photometric and geometric
residuals. Only if the two estimates are statistically simi-
lar, i.e., if
e(ξjki, ξijk
) := (ξjki ◦ ξijk
)T
Σ−1
(ξjki ◦ ξijk
) (5)
with Σ := Σjki + AdjjkiΣijk
AdjT
jki (6)
is sufficiently small, they are added as constraints to the
pose-graph. To speed up the removal of incorrect loop-
closure candidates, we apply this consistency check after
each pyramid level in the coarse-to-fine scheme. Only if
it passes, direct image alignment is continued on the next
higher resolution. This allows for discarding most incor-
rect candidates with only very little wasted computational
resources.
4. Evaluation
We evaluate our method on the popular and publicly
available Kitti dataset [8]. It provides rectified stereo im-
ages together with Velodyne laser scans captured from a
car driving in various street-scene environments. Specifi-
cally, we used the odometry dataset sequences 00-09 which
provide camera pose ground-truth that can be used to align
several Velodyne scans and to generate denser ground-truth
point clouds in this way.
We present qualitative results of the point clouds gener-
ated from the semi-dense depth maps over the whole tra-
jectory. We also evaluate the accuracy of the reconstructed
point clouds for different processed image resolutions by
comparing them with the Velodyne ground-truth. Finally,
we compare our reconstructions with other state-of-the-art
stereo reconstruction methods in terms of accuracy and run-
time. We use a PC with Intel i7-2600K CPU running at 3.4
GHz for run-time evaluation.
4.1. Qualitative Results
In Fig. 2, we show reconstructions over the full length
trajectories with our method which estimates the camera
motion and semi-dense depth maps in real-time. Loop clo-
sures and pose graph optimization running in a separate
thread allows for keeping the reconstructions globally con-
sistent.
Figs. 1, 3 and 4 show a closer view on the reconstructed
pointclouds. As can be seen, most parts of the high gradi-
ent areas of the image are reconstructed. Using both fixed-
baseline stereo and temporal stereo allows us to estimate a
proper metric scale of the scene.
The main advantage of the proposed method is that
the generated semi-dense maps are directly used for cam-
era tracking and loop closures, while other methods, such
as [10], rely on external keypoint-based odometry. Our ap-
proach results in consistent and visually pleasing 3D recon-
structions.
4.2. Reconstruction Accuracy Depending on Image
Resolution
In order to evaluate 3D reconstruction accuracy, we gen-
erate a ground-truth pointcloud by aligning several Velo-
dyne scans. For every point in the reference point cloud
we estimate a local surface normal by fitting a plane to the
points in a 20 cm neighborhood. For each keyframe, we
reproject the estimated semi-dense point cloud and remove
the points that are higher than the maximum height of Velo-
dyne measurements. For all other points, we determine the
point-to-plane distance to the closest point in the reference
point cloud.
Figure 5 shows the median, 5th and 95th percentiles of
the average point-to-plane distance across all keyframes in
different sequences. The results show that there is no direct
dependency between the resolution of the image and accu-
racy of the resulting pointcloud. On a smaller resolution
images depth-estimation is less sensitive to small repetitive
structures, which results in smaller number of outliers. Also
with all cases we use a sub-pixel depth estimation to deter-
mine depth, which can also lower the difference between
different resolutions.
4.3. Comparison to Other Stereo Reconstruction
Methods
Fig. 6 gives a comparison of reconstructions generated
with our method (LSD) to the point clouds generated by
Efficient Large-Scale Stereo Matching (elas, [9]), Semi-
Global Block Matching (SGBM, [11]) and regular Block
Matching (BM). For accuracy evaluation, we use the same
method as in Sec. 4.2. Fig. 7 shows an example of a ground-
truth point cloud overlayed with reconstructions from the
stereo methods.
In most of the sequences our method has less or equal
average point-to-plane distance to ground-truth point cloud
than SGBM and BM, but elas often performs slightly bet-
ter than our method. However, elas is a pure stereo recon-
struction algorithm that only uses static stereo cues from
the fixed-baseline camera arrangement. Elas also relies on
an extra, e.g. keypoint-based, visual tracking method, while
our simultaneous tracking and mapping approach uses the
00 01 02 03 04 05 06 07 08 09
Sequence
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
Avaragepointtoplanedistance[m]
1240x386
620x184
310x92
Figure 5. Point-to-plane distance between ground-truth point clouds and reconstructions with our method generated at different image
resolutions. We give median, 5th, and 95th percentile on the various Kitti sequences.
00 01 02 03 04 05 06 07 08 09
Sequence
0.0
0.1
0.2
0.3
0.4
0.5
Avaragepointtoplanedistance[m]
lsd
elas
sgbm
bm
Figure 6. Point-to-plane distance between ground-truth point clouds and various stereo-reconstruction methods (including ours). We give
median, 5th, and 95th percentile on the various Kitti sequences.
310 x 92 620 x 184 1240 x 368
Mapping time LSD 5.8ms 23ms 103ms
Table 1. Depth estimation run-time of LSD for different image
resolutions.
LSD SGBM BM elas
Mapping time 103ms 145ms 24ms 142ms
Table 2. Depth estimation run-time for different stereo methods.
3D reconstruction directly for tracking and vice versa.
4.4. Run-Time
Table 1 shows the run-time required for a stereo recon-
struction by our method for a single frame depending on the
image resolution. It can be seen that run-time scales almost
linearly with the number of pixels in the image. In Table 2,
we compare the run-times of the competing methods. The
block matching is the fastest method, but it is also slightly
less accurate than the other methods. Our method shows a
second result, and both SGBM and elas exhibit similar run-
times which are approximately 1.4 times higher than for our
method.
5. Conclusions
In this paper, we compare real-time 3D reconstruction of
street scenes using a semi-dense large-scale direct SLAM
method for stereo cameras with traditional stereo and laser-
based approaches. We demonstrate qualitative reconstruc-
tion results on the Kitti dataset and compare the accuracy
of our method to the state-of-the-art stereo reconstruction
methods. These approaches often rely on a separate visual
odometry or SLAM method, e.g. based on sparse interest
points, in order to recover the camera trajectory. Dense
depth is then obtained in the individual stereo frames. To
our knowledge, our approach is the first method that can re-
cover metrically accurate and globally consistent 3D recon-
structions of large-scale environments from a stereo-camera
system in real-time on a single CPU.
Acknowledgements
This work has been partially supported by grant CR
250/9-2 (Mapping on Demand) of German Research Foun-
dation (DFG) and grant 16SV6394 (AuRoRoll) of BMBF.
References
[1] S. T. Barnard and M. A. Fischler. Computational stereo.
ACM Comput. Surv., 14(4):553–572, Dec. 1982. 3
[2] P. Besl and N. D. McKay. A method for registration of 3-
d shapes. Pattern Analysis and Machine Intelligence, IEEE
Transactions on, 14(2):239–256, 1992. 2
[3] M. Brown, D. Burschka, and G. Hager. Advances in compu-
tational stereo. Pattern Analysis and Machine Intelligence,
IEEE Transactions on, 25(8):993–1008, Aug 2003. 3
[4] M. Buehler, K. Iagnemma, and S. Singh, editors. The
DARPA Urban Challenge: Autonomous Vehicles in City
Figure 7. Single keyframe 3D reconstructions using LSD (purple), elas (blue), SGBM (green) and BM (red) are compared to the point
cloud produced by a Velodyne laser scanner. Laser point clouds from several frames are aligned and combined into a denser point cloud.
The point clouds generated by the stereo reconstruction methods are cut at the maximum measurement height of the Velodyne scanner.
Traffic, George Air Force Base, Victorville, California, USA,
volume 56 of Springer STAR, 2009. 1
[5] J. Engel, T. Sch¨ops, and D. Cremers. LSD-SLAM: Large-
scale direct monocular SLAM. In European Conference on
Computer Vision (ECCV), September 2014. 4, 5
[6] J. Engel, J. St¨uckler, and D. Cremers. Large-scale direct
SLAM with stereo cameras. In Int. Conf. on Intelligent Robot
Systems (IROS), 2015. 3
[7] J. Engel, J. Sturm, and D. Cremers. Semi-dense visual odom-
etry for a monocular camera. In Int. Conf. on Computer Vi-
sion (ICCV), 2013. 3
[8] A. Geiger. Are we ready for autonomous driving? the KITTI
vision benchmark suite. In Int. Conf. on Computer Vision
and Pattern Recognition (CVPR), 2012. 1, 6
[9] A. Geiger, M. Roser, and R. Urtasun. Efficient large-scale
stereo matching. In Asian Conference on Computer Vision
(ACCV), 2010. 3, 6
[10] A. Geiger, J. Ziegler, and C. Stiller. Stereoscan: Dense 3D
reconstruction in real-time. In Proceedings of the IEEE In-
telligent Vehicles Symposium, pages 963–968, 2011. 3, 6
[11] H. Hirschmuller. Stereo processing by semiglobal matching
and mutual information. IEEE Trans. on PAMI, 30(2):328–
341, 2008. 3, 6
[12] B. Kitt, A. Geiger, and H. Lategahn. Visual odometry based
on stereo image sequences with RANSAC-based outlier re-
jection scheme. In Intelligent Vehicles Symp. (IV), 2010. 3
[13] G. Klein and D. Murray. Parallel tracking and mapping for
small AR workspaces. In Int. Symp. on Mixed and Aug-
mented Reality (ISMAR), 2007. 2
[14] K. Konolige and M. Agrawal. Frameslam: From bundle
adjustment to real-time visual mapping. Transaction on
Robotics, 24(5):1066–1077, Oct. 2008. 2
[15] M. Pollefeys et al. Detailed real-time urban 3D reconstruc-
tion from video. Int. J. Comput. Vision (IJCV), 78(2-3):143–
167, 2008. 3
[16] C. Mei, G. Sibley, M. Cummins, P. Newman, and I. Reid.
RSLAM: A system for large-scale mapping in constant-time
using stereo. Int. J. Comput. Vision (IJCV), 2010. 2
[17] A. N¨uchter. 3D robotic mapping : the simultaneous local-
ization and mapping problem with six degrees of freedom.
STAR. Springer, Berlin, Heidelberg, 2009. 2
[18] F. Pomerleau, F. Colas, R. Siegwart, and S. Magnenat. Com-
paring ICP variants on real-world data sets. Auton. Robots,
34(3):133–148, Apr. 2013. 2
[19] R. Ranftl, S. Gehrig, T. Pock, and H. Bischof. Pushing the
limits of stereo using variational stereo estimation. In Intel-
ligent Vehicles Symposium, pages 401–407, 2012. 3
[20] D. Scharstein and R. Szeliski. A taxonomy and evaluation of
dense two-frame stereo correspondence algorithms. Interna-
tional Journal of Computer Vision, 47:7–42, 2002. 3
[21] G. K. L. Tam, Z.-Q. Cheng, Y.-K. Lai, F. C. Langbein, Y. Liu,
D. Marshall, R. R. Martin, X.-F. Sun, and P. L. Rosin. Regis-
tration of 3D point clouds and meshes: A survey from rigid
to nonrigid. IEEE Transactions on Visualization and Com-
puter Graphics, 19(7):1199–1217, 2013. 2
[22] P. W. Theiler and K. Schindler. Automatic registration of ter-
restrial laser scanner point clouds using natural planar sur-
faces. ISPRS Annals of Photogrammetry, Remote Sensing
and Spatial Information Sciences, I-3:173–178, 2012. 2
[23] J. Zhang and S. Singh. Loam: Lidar odometry and mapping
in real-time. In Robotics: Science and Systems, 2014. 2
[24] J. Zhang and S. Singh. Visual-lidar odometry and mapping:
Low-drift, robust, and fast. IEEE Intl. Conf. on Robotics and
Automation (ICRA), May 2015. 2

More Related Content

What's hot

Computer vision - two view geometry
Computer vision -  two view geometryComputer vision -  two view geometry
Computer vision - two view geometryWael Badawy
 
Mathematical operations in image processing
Mathematical operations in image processingMathematical operations in image processing
Mathematical operations in image processingAsad Ali
 
auto-assistance system for visually impaired person
auto-assistance system for visually impaired personauto-assistance system for visually impaired person
auto-assistance system for visually impaired personshahsamkit73
 
object recognition for robots
object recognition for robotsobject recognition for robots
object recognition for robotss1240148
 
Lec14 multiview stereo
Lec14 multiview stereoLec14 multiview stereo
Lec14 multiview stereoBaliThorat1
 
Image Restoration
Image RestorationImage Restoration
Image RestorationPoonam Seth
 
Image processing fundamentals
Image processing fundamentalsImage processing fundamentals
Image processing fundamentalsA B Shinde
 
AI Computer vision
AI Computer visionAI Computer vision
AI Computer visionKashafnaz2
 
3 d transformation
3 d transformation3 d transformation
3 d transformationPooja Dixit
 
Holographic projection technology
Holographic projection technologyHolographic projection technology
Holographic projection technologyJanardhan Raju
 
Introduction to Image Compression
Introduction to Image CompressionIntroduction to Image Compression
Introduction to Image CompressionKalyan Acharjya
 
Comparative study on image segmentation techniques
Comparative study on image segmentation techniquesComparative study on image segmentation techniques
Comparative study on image segmentation techniquesgmidhubala
 
Content Based Image Retrieval
Content Based Image Retrieval Content Based Image Retrieval
Content Based Image Retrieval Swati Chauhan
 
Classical relations and fuzzy relations
Classical relations and fuzzy relationsClassical relations and fuzzy relations
Classical relations and fuzzy relationsBaran Kaynak
 

What's hot (20)

Computer vision - two view geometry
Computer vision -  two view geometryComputer vision -  two view geometry
Computer vision - two view geometry
 
Mathematical operations in image processing
Mathematical operations in image processingMathematical operations in image processing
Mathematical operations in image processing
 
auto-assistance system for visually impaired person
auto-assistance system for visually impaired personauto-assistance system for visually impaired person
auto-assistance system for visually impaired person
 
Segmentation
SegmentationSegmentation
Segmentation
 
object recognition for robots
object recognition for robotsobject recognition for robots
object recognition for robots
 
Lec14 multiview stereo
Lec14 multiview stereoLec14 multiview stereo
Lec14 multiview stereo
 
Hit and-miss transform
Hit and-miss transformHit and-miss transform
Hit and-miss transform
 
Image Restoration
Image RestorationImage Restoration
Image Restoration
 
Image processing fundamentals
Image processing fundamentalsImage processing fundamentals
Image processing fundamentals
 
AI Computer vision
AI Computer visionAI Computer vision
AI Computer vision
 
3 d transformation
3 d transformation3 d transformation
3 d transformation
 
3D Hand Tracking
3D Hand Tracking3D Hand Tracking
3D Hand Tracking
 
Image Compression
Image CompressionImage Compression
Image Compression
 
Stereo vision
Stereo visionStereo vision
Stereo vision
 
Holographic projection technology
Holographic projection technologyHolographic projection technology
Holographic projection technology
 
Introduction to Image Compression
Introduction to Image CompressionIntroduction to Image Compression
Introduction to Image Compression
 
Comparative study on image segmentation techniques
Comparative study on image segmentation techniquesComparative study on image segmentation techniques
Comparative study on image segmentation techniques
 
Content Based Image Retrieval
Content Based Image Retrieval Content Based Image Retrieval
Content Based Image Retrieval
 
Classical relations and fuzzy relations
Classical relations and fuzzy relationsClassical relations and fuzzy relations
Classical relations and fuzzy relations
 
Image segmentation
Image segmentationImage segmentation
Image segmentation
 

Similar to 3D reconstruction

EVALUATION OF THE VISUAL ODOMETRY METHODS FOR SEMI-DENSE REAL-TIME
EVALUATION OF THE VISUAL ODOMETRY METHODS FOR SEMI-DENSE REAL-TIMEEVALUATION OF THE VISUAL ODOMETRY METHODS FOR SEMI-DENSE REAL-TIME
EVALUATION OF THE VISUAL ODOMETRY METHODS FOR SEMI-DENSE REAL-TIMEacijjournal
 
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHM
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHMA ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHM
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHMcsandit
 
Review_of_phase_measuring_deflectometry.pdf
Review_of_phase_measuring_deflectometry.pdfReview_of_phase_measuring_deflectometry.pdf
Review_of_phase_measuring_deflectometry.pdfSevgulDemir
 
Robot Pose Estimation: A Vertical Stereo Pair Versus a Horizontal One
Robot Pose Estimation: A Vertical Stereo Pair Versus a Horizontal OneRobot Pose Estimation: A Vertical Stereo Pair Versus a Horizontal One
Robot Pose Estimation: A Vertical Stereo Pair Versus a Horizontal Oneijcsit
 
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...c.choi
 
Vision based non-invasive tool for facial swelling assessment
Vision based non-invasive tool for facial swelling assessment Vision based non-invasive tool for facial swelling assessment
Vision based non-invasive tool for facial swelling assessment University of Moratuwa
 
isvc_draft6_final_1_harvey_mudd (1)
isvc_draft6_final_1_harvey_mudd (1)isvc_draft6_final_1_harvey_mudd (1)
isvc_draft6_final_1_harvey_mudd (1)David Tenorio
 
Matching algorithm performance analysis for autocalibration method of stereo ...
Matching algorithm performance analysis for autocalibration method of stereo ...Matching algorithm performance analysis for autocalibration method of stereo ...
Matching algorithm performance analysis for autocalibration method of stereo ...TELKOMNIKA JOURNAL
 
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithm
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithmPaper 58 disparity-of_stereo_images_by_self_adaptive_algorithm
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithmMDABDULMANNANMONDAL
 
16 channels velodyne versus planar lidars based perception system for large s...
16 channels velodyne versus planar lidars based perception system for large s...16 channels velodyne versus planar lidars based perception system for large s...
16 channels velodyne versus planar lidars based perception system for large s...Brett Johnson
 
16 channels Velodyne versus planar LiDARs based perception system for Large S...
16 channels Velodyne versus planar LiDARs based perception system for Large S...16 channels Velodyne versus planar LiDARs based perception system for Large S...
16 channels Velodyne versus planar LiDARs based perception system for Large S...Brett Johnson
 
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...CSCJournals
 
Efficient 3D stereo vision stabilization for multi-camera viewpoints
Efficient 3D stereo vision stabilization for multi-camera viewpointsEfficient 3D stereo vision stabilization for multi-camera viewpoints
Efficient 3D stereo vision stabilization for multi-camera viewpointsjournalBEEI
 
Performance analysis on color image mosaicing techniques on FPGA
Performance analysis on color image mosaicing techniques on FPGAPerformance analysis on color image mosaicing techniques on FPGA
Performance analysis on color image mosaicing techniques on FPGAIJECEIAES
 
Surface generation from point cloud.pdf
Surface generation from point cloud.pdfSurface generation from point cloud.pdf
Surface generation from point cloud.pdfssuserd3982b1
 

Similar to 3D reconstruction (20)

EVALUATION OF THE VISUAL ODOMETRY METHODS FOR SEMI-DENSE REAL-TIME
EVALUATION OF THE VISUAL ODOMETRY METHODS FOR SEMI-DENSE REAL-TIMEEVALUATION OF THE VISUAL ODOMETRY METHODS FOR SEMI-DENSE REAL-TIME
EVALUATION OF THE VISUAL ODOMETRY METHODS FOR SEMI-DENSE REAL-TIME
 
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHM
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHMA ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHM
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHM
 
Graphics
GraphicsGraphics
Graphics
 
Review_of_phase_measuring_deflectometry.pdf
Review_of_phase_measuring_deflectometry.pdfReview_of_phase_measuring_deflectometry.pdf
Review_of_phase_measuring_deflectometry.pdf
 
AR/SLAM for end-users
AR/SLAM for end-usersAR/SLAM for end-users
AR/SLAM for end-users
 
Robot Pose Estimation: A Vertical Stereo Pair Versus a Horizontal One
Robot Pose Estimation: A Vertical Stereo Pair Versus a Horizontal OneRobot Pose Estimation: A Vertical Stereo Pair Versus a Horizontal One
Robot Pose Estimation: A Vertical Stereo Pair Versus a Horizontal One
 
06466595
0646659506466595
06466595
 
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
 
Vision based non-invasive tool for facial swelling assessment
Vision based non-invasive tool for facial swelling assessment Vision based non-invasive tool for facial swelling assessment
Vision based non-invasive tool for facial swelling assessment
 
isvc_draft6_final_1_harvey_mudd (1)
isvc_draft6_final_1_harvey_mudd (1)isvc_draft6_final_1_harvey_mudd (1)
isvc_draft6_final_1_harvey_mudd (1)
 
Matching algorithm performance analysis for autocalibration method of stereo ...
Matching algorithm performance analysis for autocalibration method of stereo ...Matching algorithm performance analysis for autocalibration method of stereo ...
Matching algorithm performance analysis for autocalibration method of stereo ...
 
reviewpaper
reviewpaperreviewpaper
reviewpaper
 
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithm
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithmPaper 58 disparity-of_stereo_images_by_self_adaptive_algorithm
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithm
 
16 channels velodyne versus planar lidars based perception system for large s...
16 channels velodyne versus planar lidars based perception system for large s...16 channels velodyne versus planar lidars based perception system for large s...
16 channels velodyne versus planar lidars based perception system for large s...
 
16 channels Velodyne versus planar LiDARs based perception system for Large S...
16 channels Velodyne versus planar LiDARs based perception system for Large S...16 channels Velodyne versus planar LiDARs based perception system for Large S...
16 channels Velodyne versus planar LiDARs based perception system for Large S...
 
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
 
Efficient 3D stereo vision stabilization for multi-camera viewpoints
Efficient 3D stereo vision stabilization for multi-camera viewpointsEfficient 3D stereo vision stabilization for multi-camera viewpoints
Efficient 3D stereo vision stabilization for multi-camera viewpoints
 
Performance analysis on color image mosaicing techniques on FPGA
Performance analysis on color image mosaicing techniques on FPGAPerformance analysis on color image mosaicing techniques on FPGA
Performance analysis on color image mosaicing techniques on FPGA
 
Surface generation from point cloud.pdf
Surface generation from point cloud.pdfSurface generation from point cloud.pdf
Surface generation from point cloud.pdf
 
paper
paperpaper
paper
 

Recently uploaded

Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptMadan Karki
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communicationpanditadesh123
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadaditya806802
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Autonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.pptAutonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.pptbibisarnayak0
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfRajuKanojiya4
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdfHafizMudaserAhmad
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 

Recently uploaded (20)

Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communication
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasad
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Autonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.pptAutonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.ppt
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 

3D reconstruction

  • 1. Reconstructing Street-Scenes in Real-Time From a Driving Car Vladyslav Usenko, Jakob Engel, J¨org St¨uckler, and Daniel Cremers Technische Universit¨at M¨unchen, {usenko, engelj, stueckle, cremers}@in.tum.de Abstract Most current approaches to street-scene 3D reconstruc- tion from a driving car to date rely on 3D laser scan- ning or tedious offline computation from visual images. In this paper, we compare a real-time capable 3D recon- struction method using a stereo extension of large-scale di- rect SLAM (LSD-SLAM) with laser-based maps and tradi- tional stereo reconstructions based on processing individ- ual stereo frames. In our reconstructions, small-baseline comparison over several subsequent frames are fused with fixed-baseline disparity from the stereo camera setup. These results demonstrate that our direct SLAM technique pro- vides an excellent compromise between speed and accuracy, generating visually pleasing and globally consistent semi- dense reconstructions of the environment in real-time on a single CPU. 1. Introduction The 3D reconstruction of environments has been a re- search topic in computer vision for many years with many applications in robotics or surveying. In particular this problem is essential for creating self-driving vehicles, since 3D maps are required for localization and obstacle detec- tion. In the beginning, laser-based distance sensors were pri- marily used for 3D reconstruction [4]. However, these so- lutions have several drawbacks. Laser scanners usually are quiet expensive and produce a sparse set of measurements which are subject to rolling shutter effects under motion. Recently, the development of powerful camera and com- puting hardware as well as tremendous research progress in the field of computer vision have fueled novel research in the area of real-time 3D reconstruction with RGB-D, monocular, and stereo cameras. Whereas traditional vi- sual SLAM techniques are based on keypoints, and camera tracking and dense stereo reconstruction are done individu- ally, in this paper, we demonstrate that accurate and consis- tent results can be obtained with a direct real-time capable SLAM technique for stereo cameras. To this end, we si- multaneously track the 6D pose of the stereo camera and Figure 1. Reconstruction of a street-scene obtained with Large- scale Direct SLAM using a stereo camera. The method uses static and temporal stereo to compute semi-dense maps, which are then used to track camera motion. The figure shows a sample 3D recon- struction on one of the sequences from the popular Kitti dataset. reconstruct semi-dense depth maps using spatial and tem- poral stereo cues. A loop closure thread running in the background ensures globally consistent reconstructions of the environment. We evaluate the reconstruction quality of our method on the popular Kitti dataset [8]. It contains images captured from a synchronized stereo pair which is mounted on the roof of a car driving through a city. The dataset also pro- vides point clouds from a Velodyne scanner, which we use as ground-truth to evaluate the accuracy of our reconstruc- tion method. We also compare depth estimation accuracy of our approach with other state-of-the-art stereo reconstruc- tion methods. 1
  • 2. Figure 2. Top-down view on the reconstructions on the Kitti sequences 00-09 obtained with our method. 1.1. Related Work Traditionally, laser scanners have been used for obtain- ing 3D reconstructions of environments. Several research communities tackle this problem using laser scanners, in- cluding geomatics (e.g. [22]), computer graphics (e.g. [21]), and robotics (e.g. [17]). The popularity of laser scanners stems from the fact that they directly measure 3D point clouds with high accuracy. Still, their measurement fre- quency is limited since typically the laser beam is redirected in all directions using mechanically moving parts. In effect, either scanning time is slow, scanners become huge with many individual beams, or sparse scans are produced. For the alignment of the 3D point clouds produced by static laser scanners, typically variants of the Iterative Closest Point (ICP) algorithm [2] are used. An exten- sive overview and comparisons of different variants can be found in [18], but most of them are not capable of real-time reconstruction of large-scale environments, such as cities. If the laser scanner itself moves during acquisition, e.g. on a self-driving car, spatial referencing of the individual 3D measurements becomes more difficult. An accurate odom- etry and mapping system that utilizes laser scans during motion has been presented in [23]. Real-time capability is achieved by running 6D motion tracking of the laser scan- ner at high frequency on a small time-slice of points and registering all available data at lower frequency. Recently, this algorithm has been extended to use stereo images to improve the odometry accuracy [24]. Camera based 3D reconstruction methods can be divided into sparse and dense methods. Sparse methods typically use bundle adjustment techniques to simultaneously track a motion of the camera and estimate positions of the observed keypoints. Parallel Tracking and Mapping (PTAM [13]) is a prominent example which requires just a single monoc- ular camera to perform 3D reconstruction, however in the monocular case, the scale of the map is not observable. This problem is tackled in FrameSLAM [14] and RSLAM [16] which also apply bundle adjustment. Through the use of stereo cameras with known baseline, these methods are ca- pable to estimate metric scale. Some stereo SLAM meth-
  • 3. ods estimate camera motion based on sparse interest points [12], but use a dense stereo-processing algorithms in a sec- ond stage to obtain dense 3D maps [15, 10]. Over the last decades, a large amount of the algorithms for stereo reconstructions have been developed. Methods using fixed-baseline rectified images became very popular because of their low computational costs and accurate re- sults. These algorithms can be divided into local and global. Local methods such as block-matching (e.g. [1]) use only the local neighborhood of the pixel when searching for the corresponding pixel in a different image. This makes them fast, but usually resulting reconstructions are not spatially consistent. Global methods (e.g. [11, 19]) on the contrary use regularization to improve spatial consistency, which re- sults in better reconstructions but also requires much higher computational costs. 2. Reference Methods In the following, we discuss some of the most prominent real-time capable stereo reconstruction algorithms which we will use as reference for comparative evaluation with our method. 2.1. Block Matching The Block Matching (BM) algorithm is one of the most simple local algorithms for stereo reconstruction. First im- plementations date back to the eighties [1] and comprehen- sive descriptions can be found in [3] and [20]. It estimates a disparity by searching for the most similar rectangular patch in the second stereo image using some patch comparison metric. Ideally a metric should be tolerant to intensity dif- ferences, such as change of camera gain or bias, and also tolerant to deformations when dealing with slanted surfaces. One typical example of such a metric is normalized cross- correlation. The outcome of the algorithm is sensitive to the selected local window size. If the window size is small, low-textured areas will not be reconstructed since no distinct features will fit into a window. On the other hand a large window size will result in over-smoothed and low-detail reconstructions in highly textured areas. 2.2. Semi-Global Block Matching Semi-Global Block Matching (SGBM) proposed in [11] is an algorithm that estimates optimal disparities along 1D lines using a polynomial time algorithm. This allows the algorithm to produce more accurate results than local meth- ods, but since disparities are optimized only along one di- rection the results are typically worse than global methods that regularize in the 2D image domain. 2.3. Efficient Large-Scale Stereo Matching The Efficient Large-Scale Stereo (elas) Matching method proposed in [9] uses a set of support points that can be reliably matched between the images for speeding up dis- parity computations on high-resolution images. Based on the matched support points, maximum a-posteriory estima- tion yields a dense disparity map. 3. Method Our approach to real-time 3D reconstruction is based on a stereo-extension to the LSD-SLAM [6] method. In LSD-SLAM, key-frames along the camera trajectory are extracted. The motion of the camera is tracked towards a reference key-frame using direct image alignment. Concur- rently, semi-dense depth at high-gradient pixels in the refer- ence key-frame is estimated from stereo towards the current frame. For SLAM on the global scale, the relative poses be- tween the key-frames is estimated using scale-aware direct image alignment, and the scale and view poses of the refer- ence key-frames are estimated using graph optimization. In our stereo-extension to LSD-SLAM, scale is directly observable through the fixed baseline of the stereo camera. Also for depth estimation, now also static stereo through the fixed baseline complements the temporal stereo of the tracked frames towards the reference key-frame. In tempo- ral stereo, disparity can be estimated along epipolar lines whose direction depends on the motion direction of the camera. This extends the fixed direction for disparity es- timation of the static stereo camera. On the other hand, the baseline of the stereo camera can be precisely calibrated in advance and allows for determing reconstruction scale un- ambiguously. In this paper we use a stereo version of the algorithm, since the monocular version [7] does not perform well on the Kitti sequences and fails frequently. This is presumably due to the large inter-frame motion along the line-of-sight of the camera. In the following we will briefly describe the main steps of our stereo LSD-SLAM method. Special focus is given to semi-dense depth estimation and 3D reconstruction. 3.1. Notation We briefly introduce basic notation that we will use throughout the paper. Matrices will be denoted by bold capital letters (R), while bold lower case letters are used for vectors (ξ). The operator [·]n selects the n-th row of a matrix. The symbol d denotes the inverse d = z−1 of depth z. The global map is maintained as a set of keyframes Ki = Il i, Ir i , Di, Vi . They include a left and right stereo image I l/r i : Ω → R, an inverse depth map Di : ΩDi → R+ and a variance map Vi : ΩDi → R+ . Depth and variance
  • 4. Figure 3. Top down view on the reconstruction obtained with out method on Kitti sequence 05 with two close views. are only maintained for one of the images in the stereo pair. We assume the image domain Ω ⊂ R2 to be given in stereo-rectified image coordinates, i.e., the intrinsic and extrinsic camera parameters are known a-priori. The do- main ΩDi ⊂ Ω is the semi-dense restriction to the pixels which are selected for depth estimation. We denote pixel coordinates by u = (ux uy 1) T . A 3D position p = (px py pz 1) T is projected into the image plane through the mapping u = π(p) := K ((px/pz) (py/pz) 1) T , (1) where K is the camera matrix. The mapping p = π−1 (u, d) := d−1 K−1 u T 1 T inverts the projection with the inverse depth d. We parametrize poses ξ ∈ se(3) as elements of the Lie algebra of SE(3) with exponential map Tξ = exp(ξ) ∈ SE(3). We will use Rξ and tξ for the respective rotation matrix and translation vector. 3.2. Tracking through Direct Image Alignment The relative pose between images Icurr and Iref is esti- mated by minimizing pixel-wise residuals E(ξ) = u∈Ω ρ ru(ξ)T Σ−1 r,uru(ξ) (2) which we solve as a non-linear least-squares problem. We implement a robust norm ρ on the square residuals such as the Huber-norm using the iteratively re-weighted least squares method. One common choice of residuals are the photometric residuals rI u(ξ) := Iref (u) − Icurr π Tξπ−1 (u, Dref(u)) . (3) The uncertainty σI r,u in this residual can be determined from a constant intensity measurement variance σ2 I and the propagated inverse depth uncertainty [5]. Pure photometric alignment uses ru(ξ) = rI u(ξ). If depth maps are available in both images, we can also measure the geometric residuals rG u (ξ) := [p ]z − Dcurr (π (p )) (4) with p := Tξπ−1 (u, Dref(u)). The variance of the geo- metric residual is determined from the variance in the in- verse depth estimates in both frames [5]. Note that we can easily use both types of residuals simultaneously by combining them in a stacked residual. We minimize the direct image alignment objectives using the iteratively re- weighted Levenberg-Marquardt algorithm. We furthermore compensate for lighting changes with an affine lighting model with scale and bias parameters. These parameters are optimized with the pose ξ in an alternating way. 3.3. Depth Estimation Scene geometry is estimated for pixels of the key frame with high image gradient, since they provide stable dispar- ity estimates. We initialize the depth map with the propa- gated depth from the previous keyframe. The depth map is subsequently updated with new observations in a pixel-wise depth-filtering framework. We also regularize the depth maps spatially and remove outliers [5]. We estimate disparity between the current frame and the reference keyframe using the pose estimate obtained through tracking. These estimates are fused in the keyframe. Only pixels are updated with temporal stereo, whose expected inverse depth error is sufficiently small. This also constrains depth estimates to pixels with high im- age gradient along the epipolar line, producing a semi-dense depth map. For direct visual odometry with stereo cameras we esti- mate depth both from static stereo (i.e., using images from different physical cameras, but taken at the same point in time) as well as from temporal stereo (i.e., using images from the same physical camera, taken at different points in time). We determine the static stereo disparity at a pixel by a correspondence search along its epipolar line in the other stereo image. In our case of stereo-rectified images, this search can be performed very efficiently along horizontal lines. As correspondence measure we use the SSD photo- metric error over five pixels along the scanline. Static stereo is integrated in two ways. If a new stereo
  • 5. Figure 4. Close-by views on the point clouds generated by our method from the Kitti sequences 08, 01, 08, 06, 02 and 04. keyframe is created, the static stereo in this keyframe stereo pair is used to initialize the depth map. During tracking, static stereo in the current frame is propagated to the refer- ence frame and fused with its depth map. There are several benefits of complementing static with temporal stereo in a tracking and mapping framework. Static stereo makes reconstruction scale observable. It is also independent of camera movement, but is constrained to a constant baseline, which limits static stereo to an effective operating range. Temporal stereo requires non-degenerate camera movement, but is not bound to a specific range as demonstrated in [5]. The method can reconstruct very small and very large environments at the same time. Finally, through the combination of static with temporal stereo, mul- tiple baseline directions are available: while static stereo typically has a horizontal baseline – which does not allow for estimating depth along horizontal edges, temporal stereo allows for completing the depth map by providing other mo- tion directions. 3.4. Key-Frame-Based SLAM With the tracking and mapping approach presented in the previous section, it is possible to extract keyframes along the camera trajectory and to estimate their poses consis- tently in a global reference frame. The method operates by tracking the camera motion towards reference keyframes. Once the motion of the camera is sufficiently large, and the image overlap is too small, the current camera image is se- lected as a new reference keyframe. The old reference key frame is added to a keyframe pose-graph. We estimate rel-
  • 6. ative pose constraints between the keyframes through pho- tometric and geometric direct image alignment. We test as set of possible loop-closure candidates that are either found as nearby view poses or through appearance-based image retrieval. In monocular LSD-SLAM, we need to use Sim(3) pose representation due to the scale ambiguity. Conversely, for stereo LSD-SLAM, we can directly use SE(3) parametriza- tion. For each candidate constaint Kjk between keyframe k and j we independently compute ξjki and ξijk through direct image alignment using photometric and geometric residuals. Only if the two estimates are statistically simi- lar, i.e., if e(ξjki, ξijk ) := (ξjki ◦ ξijk )T Σ−1 (ξjki ◦ ξijk ) (5) with Σ := Σjki + AdjjkiΣijk AdjT jki (6) is sufficiently small, they are added as constraints to the pose-graph. To speed up the removal of incorrect loop- closure candidates, we apply this consistency check after each pyramid level in the coarse-to-fine scheme. Only if it passes, direct image alignment is continued on the next higher resolution. This allows for discarding most incor- rect candidates with only very little wasted computational resources. 4. Evaluation We evaluate our method on the popular and publicly available Kitti dataset [8]. It provides rectified stereo im- ages together with Velodyne laser scans captured from a car driving in various street-scene environments. Specifi- cally, we used the odometry dataset sequences 00-09 which provide camera pose ground-truth that can be used to align several Velodyne scans and to generate denser ground-truth point clouds in this way. We present qualitative results of the point clouds gener- ated from the semi-dense depth maps over the whole tra- jectory. We also evaluate the accuracy of the reconstructed point clouds for different processed image resolutions by comparing them with the Velodyne ground-truth. Finally, we compare our reconstructions with other state-of-the-art stereo reconstruction methods in terms of accuracy and run- time. We use a PC with Intel i7-2600K CPU running at 3.4 GHz for run-time evaluation. 4.1. Qualitative Results In Fig. 2, we show reconstructions over the full length trajectories with our method which estimates the camera motion and semi-dense depth maps in real-time. Loop clo- sures and pose graph optimization running in a separate thread allows for keeping the reconstructions globally con- sistent. Figs. 1, 3 and 4 show a closer view on the reconstructed pointclouds. As can be seen, most parts of the high gradi- ent areas of the image are reconstructed. Using both fixed- baseline stereo and temporal stereo allows us to estimate a proper metric scale of the scene. The main advantage of the proposed method is that the generated semi-dense maps are directly used for cam- era tracking and loop closures, while other methods, such as [10], rely on external keypoint-based odometry. Our ap- proach results in consistent and visually pleasing 3D recon- structions. 4.2. Reconstruction Accuracy Depending on Image Resolution In order to evaluate 3D reconstruction accuracy, we gen- erate a ground-truth pointcloud by aligning several Velo- dyne scans. For every point in the reference point cloud we estimate a local surface normal by fitting a plane to the points in a 20 cm neighborhood. For each keyframe, we reproject the estimated semi-dense point cloud and remove the points that are higher than the maximum height of Velo- dyne measurements. For all other points, we determine the point-to-plane distance to the closest point in the reference point cloud. Figure 5 shows the median, 5th and 95th percentiles of the average point-to-plane distance across all keyframes in different sequences. The results show that there is no direct dependency between the resolution of the image and accu- racy of the resulting pointcloud. On a smaller resolution images depth-estimation is less sensitive to small repetitive structures, which results in smaller number of outliers. Also with all cases we use a sub-pixel depth estimation to deter- mine depth, which can also lower the difference between different resolutions. 4.3. Comparison to Other Stereo Reconstruction Methods Fig. 6 gives a comparison of reconstructions generated with our method (LSD) to the point clouds generated by Efficient Large-Scale Stereo Matching (elas, [9]), Semi- Global Block Matching (SGBM, [11]) and regular Block Matching (BM). For accuracy evaluation, we use the same method as in Sec. 4.2. Fig. 7 shows an example of a ground- truth point cloud overlayed with reconstructions from the stereo methods. In most of the sequences our method has less or equal average point-to-plane distance to ground-truth point cloud than SGBM and BM, but elas often performs slightly bet- ter than our method. However, elas is a pure stereo recon- struction algorithm that only uses static stereo cues from the fixed-baseline camera arrangement. Elas also relies on an extra, e.g. keypoint-based, visual tracking method, while our simultaneous tracking and mapping approach uses the
  • 7. 00 01 02 03 04 05 06 07 08 09 Sequence 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 Avaragepointtoplanedistance[m] 1240x386 620x184 310x92 Figure 5. Point-to-plane distance between ground-truth point clouds and reconstructions with our method generated at different image resolutions. We give median, 5th, and 95th percentile on the various Kitti sequences. 00 01 02 03 04 05 06 07 08 09 Sequence 0.0 0.1 0.2 0.3 0.4 0.5 Avaragepointtoplanedistance[m] lsd elas sgbm bm Figure 6. Point-to-plane distance between ground-truth point clouds and various stereo-reconstruction methods (including ours). We give median, 5th, and 95th percentile on the various Kitti sequences. 310 x 92 620 x 184 1240 x 368 Mapping time LSD 5.8ms 23ms 103ms Table 1. Depth estimation run-time of LSD for different image resolutions. LSD SGBM BM elas Mapping time 103ms 145ms 24ms 142ms Table 2. Depth estimation run-time for different stereo methods. 3D reconstruction directly for tracking and vice versa. 4.4. Run-Time Table 1 shows the run-time required for a stereo recon- struction by our method for a single frame depending on the image resolution. It can be seen that run-time scales almost linearly with the number of pixels in the image. In Table 2, we compare the run-times of the competing methods. The block matching is the fastest method, but it is also slightly less accurate than the other methods. Our method shows a second result, and both SGBM and elas exhibit similar run- times which are approximately 1.4 times higher than for our method. 5. Conclusions In this paper, we compare real-time 3D reconstruction of street scenes using a semi-dense large-scale direct SLAM method for stereo cameras with traditional stereo and laser- based approaches. We demonstrate qualitative reconstruc- tion results on the Kitti dataset and compare the accuracy of our method to the state-of-the-art stereo reconstruction methods. These approaches often rely on a separate visual odometry or SLAM method, e.g. based on sparse interest points, in order to recover the camera trajectory. Dense depth is then obtained in the individual stereo frames. To our knowledge, our approach is the first method that can re- cover metrically accurate and globally consistent 3D recon- structions of large-scale environments from a stereo-camera system in real-time on a single CPU. Acknowledgements This work has been partially supported by grant CR 250/9-2 (Mapping on Demand) of German Research Foun- dation (DFG) and grant 16SV6394 (AuRoRoll) of BMBF. References [1] S. T. Barnard and M. A. Fischler. Computational stereo. ACM Comput. Surv., 14(4):553–572, Dec. 1982. 3 [2] P. Besl and N. D. McKay. A method for registration of 3- d shapes. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 14(2):239–256, 1992. 2 [3] M. Brown, D. Burschka, and G. Hager. Advances in compu- tational stereo. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 25(8):993–1008, Aug 2003. 3 [4] M. Buehler, K. Iagnemma, and S. Singh, editors. The DARPA Urban Challenge: Autonomous Vehicles in City
  • 8. Figure 7. Single keyframe 3D reconstructions using LSD (purple), elas (blue), SGBM (green) and BM (red) are compared to the point cloud produced by a Velodyne laser scanner. Laser point clouds from several frames are aligned and combined into a denser point cloud. The point clouds generated by the stereo reconstruction methods are cut at the maximum measurement height of the Velodyne scanner. Traffic, George Air Force Base, Victorville, California, USA, volume 56 of Springer STAR, 2009. 1 [5] J. Engel, T. Sch¨ops, and D. Cremers. LSD-SLAM: Large- scale direct monocular SLAM. In European Conference on Computer Vision (ECCV), September 2014. 4, 5 [6] J. Engel, J. St¨uckler, and D. Cremers. Large-scale direct SLAM with stereo cameras. In Int. Conf. on Intelligent Robot Systems (IROS), 2015. 3 [7] J. Engel, J. Sturm, and D. Cremers. Semi-dense visual odom- etry for a monocular camera. In Int. Conf. on Computer Vi- sion (ICCV), 2013. 3 [8] A. Geiger. Are we ready for autonomous driving? the KITTI vision benchmark suite. In Int. Conf. on Computer Vision and Pattern Recognition (CVPR), 2012. 1, 6 [9] A. Geiger, M. Roser, and R. Urtasun. Efficient large-scale stereo matching. In Asian Conference on Computer Vision (ACCV), 2010. 3, 6 [10] A. Geiger, J. Ziegler, and C. Stiller. Stereoscan: Dense 3D reconstruction in real-time. In Proceedings of the IEEE In- telligent Vehicles Symposium, pages 963–968, 2011. 3, 6 [11] H. Hirschmuller. Stereo processing by semiglobal matching and mutual information. IEEE Trans. on PAMI, 30(2):328– 341, 2008. 3, 6 [12] B. Kitt, A. Geiger, and H. Lategahn. Visual odometry based on stereo image sequences with RANSAC-based outlier re- jection scheme. In Intelligent Vehicles Symp. (IV), 2010. 3 [13] G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. In Int. Symp. on Mixed and Aug- mented Reality (ISMAR), 2007. 2 [14] K. Konolige and M. Agrawal. Frameslam: From bundle adjustment to real-time visual mapping. Transaction on Robotics, 24(5):1066–1077, Oct. 2008. 2 [15] M. Pollefeys et al. Detailed real-time urban 3D reconstruc- tion from video. Int. J. Comput. Vision (IJCV), 78(2-3):143– 167, 2008. 3 [16] C. Mei, G. Sibley, M. Cummins, P. Newman, and I. Reid. RSLAM: A system for large-scale mapping in constant-time using stereo. Int. J. Comput. Vision (IJCV), 2010. 2 [17] A. N¨uchter. 3D robotic mapping : the simultaneous local- ization and mapping problem with six degrees of freedom. STAR. Springer, Berlin, Heidelberg, 2009. 2 [18] F. Pomerleau, F. Colas, R. Siegwart, and S. Magnenat. Com- paring ICP variants on real-world data sets. Auton. Robots, 34(3):133–148, Apr. 2013. 2 [19] R. Ranftl, S. Gehrig, T. Pock, and H. Bischof. Pushing the limits of stereo using variational stereo estimation. In Intel- ligent Vehicles Symposium, pages 401–407, 2012. 3 [20] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Interna- tional Journal of Computer Vision, 47:7–42, 2002. 3 [21] G. K. L. Tam, Z.-Q. Cheng, Y.-K. Lai, F. C. Langbein, Y. Liu, D. Marshall, R. R. Martin, X.-F. Sun, and P. L. Rosin. Regis- tration of 3D point clouds and meshes: A survey from rigid to nonrigid. IEEE Transactions on Visualization and Com- puter Graphics, 19(7):1199–1217, 2013. 2 [22] P. W. Theiler and K. Schindler. Automatic registration of ter- restrial laser scanner point clouds using natural planar sur- faces. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, I-3:173–178, 2012. 2 [23] J. Zhang and S. Singh. Loam: Lidar odometry and mapping in real-time. In Robotics: Science and Systems, 2014. 2 [24] J. Zhang and S. Singh. Visual-lidar odometry and mapping: Low-drift, robust, and fast. IEEE Intl. Conf. on Robotics and Automation (ICRA), May 2015. 2