MULTI-HYPOTHESIS PROJECTION-BASED SHIFT ESTIMATION FOR SWEEPING PANORAMA RECONSTRUCTION

Tuan Q. Pham† Philip Cox
Canon Information Systems Research Australia (CiSRA)
1 Thomas Holt Drive, North Ryde, NSW 2113, Australia.
† tuan.pham@cisra.canon.com.au

ABSTRACT

Global alignment is an important step in many imaging applications for hand-held cameras. We propose a fast algorithm that can handle large global translations in either x- or y-direction from a pan-tilt camera. The algorithm estimates the translations in x- and y-direction separately using 1D correlation of the absolute gradient projections along the x- and y-axis. Synthetic experiments show that the proposed multiple shift hypotheses approach is robust to translations up to 90% of the image width, whereas other projection-based alignment methods can handle up to 25% only. The proposed approach can also handle larger rotations than other methods. The robustness of the alignment to non-purely translational image motion and moving objects in the scene is demonstrated by a sweeping panorama application on live images from a Canon camera with minimal user interaction.

Index Terms: shift estimation, image projection, sweep panorama

1. INTRODUCTION

Global alignment is an important task for many imaging applications such as image quality measurement, video stabilization, and moving object detection. For applications on embedded devices, the alignment needs to be both accurate and fast. Robustness against difficult imaging conditions such as low light, camera motion blur or motion in the scene is also desirable. In this paper, we describe a low-cost global shift estimation algorithm that addresses these needs. The algorithm's robustness against difficult imaging conditions and its real-time performance are demonstrated on a sweeping panorama application using live images from a hand-held Canon camera.

In particular, our global alignment algorithm performs separable shift estimation using one-dimensional (1D) projections of the absolute gradient images along the sampling axes. For each image dimension, multiple shift hypotheses are maintained to avoid misdetection due to non-purely translational motion, independent moving objects, or distractions from the non-overlapping areas. The final shift estimate is the one that produces the highest two-dimensional (2D) Normalized Cross-Correlation (NCC) score.

Various enhancements are added to the basic alignment algorithm above to improve its performance. The input images are subsampled prior to analysis to improve speed and noise robustness. Shift estimation is performed over multiple scales to rule out incorrect shifts due to strong correlation of texture at certain frequencies. When appropriate, the images are automatically cropped to improve overlap before gradient projection.

Given the alignment between consecutive frames from a panning camera, a panoramic image can be constructed during image capturing. Overlapping images are stitched along an irregular seam that avoids cutting through moving objects. This seam also minimizes the intensity mismatch between the two images on either side of the seam. Image blending is finally used to eliminate any remaining intensity mismatch after stitching.

1.1. Literature review

Numerous solutions are available for translational image alignment. Amongst them, correlation-based methods are popular for their robustness. However, 2D correlation is costly for large images. 2D phase correlation, for example, requires $O(N^2 \log N^2)$ computations for an N×N image using the Fast Fourier Transform (FFT). The computational complexity can be reduced to $O(N \log N)$ if the correlation is performed on 1D image projections only [1]. This projection-based alignment algorithm is suitable for images with strong gradient structures along the projection axes. This assumption holds for most indoor and natural landscape scenes.

Adams et al. reported a real-time projection-based alignment of a 320×240 viewfinder video stream at 30 frames per second on standard smartphone hardware [2]. Their algorithm uses projections of the image's gradient energy along four directions. The use of image gradient rather than intensity improves alignment robustness against local lighting changes.

Despite their speed advantage, previous projection-based alignment algorithms have a number of limitations. First, the images must have a substantial overlap (e.g., more than 90% of the frame area according to [2]) for the alignment to work. This is because image data from non-overlapping areas corrupt the image projections, eventually breaking their correlation. Second, any deviation from a pure translation is likely to break the alignment. The viewfinder algorithm [2], for example, claims to handle a maximum of 1° rotation only. Third, previous gradient projection algorithms are not robust to low-light conditions. The weak gradient energy of dark current noise at every pixel often overpowers the stronger but sparse gradient of the scene structures when integrated over a whole image row or column. For a similar reason, gradient projection algorithms are also not robust against highly textured scenes such as carpet or foliage.

Fig. 1. Simultaneous shift estimation over multiple scales.

Fig. 2. Multiple shift hypotheses from gradient projection correlation.
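
The separable scheme outlined in the introduction (gradient projection, 1D correlation, several shift hypotheses per axis) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes images are lists of rows of numbers, uses a direct correlation loop instead of an FFT, and replaces proper non-maximum suppression with a simple greedy rule. The function names and the `radius` parameter are hypothetical.

```python
import math

def gradient_projection_x(img):
    """Sum |I[r][c+1] - I[r][c]| over all rows r: the x-gradient projection."""
    height, width = len(img), len(img[0])
    return [sum(abs(img[r][c + 1] - img[r][c]) for r in range(height))
            for c in range(width - 1)]

def cross_correlate(p1, p2):
    """Zero-mean, zero-padded, normalized 1D cross-correlation.

    Returns {shift: score}; a high score at `shift` means that
    p1[i] lines up with p2[i + shift].
    """
    mean1, mean2 = sum(p1) / len(p1), sum(p2) / len(p2)
    a = [v - mean1 for v in p1]
    b = [v - mean2 for v in p2]
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    scores = {}
    for shift in range(-(len(a) - 1), len(b)):
        # Samples outside the overlap are treated as zeros (zero padding).
        lo, hi = max(0, -shift), min(len(a), len(b) - shift)
        dot = sum(a[i] * b[i + shift] for i in range(lo, hi))
        scores[shift] = dot / norm if norm > 0 else 0.0
    return scores

def top_k_shifts(scores, k=5, radius=2):
    """Keep the k strongest shifts, greedily suppressing close neighbours.

    Assumption: this greedy rule is only a stand-in for a proper
    non-maximum suppression of the correlation curve.
    """
    picked = []
    for shift in sorted(scores, key=scores.get, reverse=True):
        if all(abs(shift - other) > radius for other, _ in picked):
            picked.append((shift, scores[shift]))
            if len(picked) == k:
                break
    return picked
```

Applying the same functions to the transposed images would give the y-direction hypotheses. The direct loop costs a number of operations quadratic in the projection length, so a practical version would compute the correlation with an FFT.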
1.2. Structure of this paper

In this paper, we present a fast global alignment algorithm with application in sweeping panorama reconstruction. Section 2 presents the new multiple hypotheses global alignment algorithm using gradient projections. Section 3 describes a software prototype for sweeping panorama stitching using the new alignment algorithm. Section 4 evaluates the alignment and panorama stitching algorithms. Section 5 concludes the paper.

2. PROJECTION-BASED IMAGE ALIGNMENT

We propose a projection-based shift estimation algorithm that is robust to large translation, small rotation and perspective change, noise and texture. The global shift is computed over multiple scales as shown in Fig. 1. The input images are first subsampled to a manageable size to reduce noise and computation. A dyadic image pyramid is then constructed for each image [3]. At each pyramid level, a shift estimate is obtained independently using the new projection-based image alignment algorithm described in Section 2.1. The shift candidate with the highest 2D NCC score is the final shift estimate.

Aligning two images at multiple subsampled resolutions and taking the best solution is more robust than alignment at a single original resolution for a number of reasons. First, noise is substantially reduced by subsampling while the gradient information of the scene is largely preserved. Second, subsampling reduces texture variation and its contribution to the gradient projections.

Too much subsampling, however, eliminates useful alignment details. To achieve an optimal gain in signal-to-noise ratio, we align the images over three successively halved pyramid levels starting from an image size around $256^2$ pixels at the base of the pyramid. Block summing is used to subsample the images for efficiency. Because block summing produces slightly more aliased images compared to Gaussian subsampling, some subpixel alignment error is expected. However, the alignment error can be corrected by subpixel peak interpolation of the NCC score at the base pyramid level.

2.1. Multi-hypothesis gradient projection correlation

At each pyramid level, the translation between two input images $I_1$, $I_2$ is estimated by a multi-hypothesis projection-based shift estimation algorithm described in Fig. 2. Image gradients $|\partial I_1/\partial x|$ and $|\partial I_1/\partial y|$ are estimated using finite differences. The magnitude of the x-gradient image is then integrated along image columns to obtain the x-gradient projection: $p_1^x = \int |\partial I_1/\partial x|\,dy$. The y-gradient projection is similarly obtained from the y-gradient image: $p_1^y = \int |\partial I_1/\partial y|\,dx$. The corresponding gradient projections from the two images are correlated to find multiple possible translations in either dimension. Cross-correlation of zero-padded zero-mean signals is used instead of the brute-force search for a correlation peak in [2] to handle a larger range of possible motion. Multiple 2D shift hypotheses are derived from all combinations of the 1D shift hypotheses in both dimensions. A 2D NCC score is obtained for each of these 2D shift hypotheses from the overlapping area of the input images dictated by the shift. The shift hypothesis with the highest 2D NCC score is then refined to subpixel accuracy by an elliptical paraboloid fit over a 3×3 neighborhood around the 2D NCC peak.

Fig. 3 shows a block diagram with all steps and possible execution paths of our multi-hypothesis projection-based shift estimation algorithm. The efficiency of the new algorithm comes from two improvements over [2] in steps 1 and 2:

1. The input images are subsampled to a manageable size
(e.g., 256×256 pixels) before alignment;

2. The 2D translation is estimated separately in x- and y-dimension (rather than in four orientations as in [2]) using projections of the images' directional gradient magnitude (rather than the gradient energy as in [2]) onto the corresponding axis.

The algorithm is robust to large translations thanks to a new multiple shift hypotheses algorithm in steps 3 to 6:

3. For each pair of 1D projections, k shift hypotheses are selected from the k strongest 1D NCC peaks (e.g., k=5) using non-maximal suppression [4];

4. Any shift candidate with a dominant 1D NCC score, which is higher than 1.5 times the second highest score along the same dimension, is the final shift for that dimension;

5. If only one dimension has a dominant NCC score, the two images are cropped to an overlapping area along this dimension before returning to step 2;

6. If there is no shift hypothesis with a dominant 1D NCC score, $k^2$ 2D shift hypotheses are constructed from the 1D shift hypotheses (see Fig. 2). The shift candidate with the highest 2D NCC score is the final 2D shift.

Note that our algorithm terminates at step 4 if two images have substantial overlap. Step 5 is executed if there is a large shift in only one dimension. Step 6 is the most expensive part because it requires the computation of $k^2$ 2D NCC scores. Fortunately, for a sweeping panorama application, the motion is mainly one-dimensional. As a result, most of the examples in this paper branch to step 5, which requires significantly fewer 2D NCC score computations to find the best translation.

Fig. 3. Flow chart describing the proposed projection-based shift estimation algorithm.

3. SEAMLESS PANORAMA STITCHING

Using the alignment algorithm described in the previous section, a panning image sequence can be combined to form a panoramic image. If the alignment is accurate to a subpixel level, frame averaging can be used for image composition [5]. However, subpixel alignment is difficult for images captured by a moving camera with moving objects in the scene. A more robust compositing method is to segment the mosaic and use a single image per segment [6]. For sweeping panorama, the images undergo a translation mainly in one direction. Two consecutive images can therefore be joined together along a seam that minimizes the intensity mismatch between adjacent segments [7]. Laplacian pyramid fusion [3] can then be used to smooth out any remaining seam artefacts.

To demonstrate our alignment technology on realistic scenes, we built a standalone application that stitches live images from a panning camera. The images are automatically transferred from a Canon 40D camera to a PC. A screenshot of our demo application is given in Fig. 4, where the panorama was reconstructed from six panning images in real-time.

Fig. 4. Sweeping panorama (1119 × 353) in the presence of moving objects and perspective image motion (seams shown in yellow). Top row: frames 1, 11, 21, 31 and 41; bottom: panorama from 48 panning images (6 actually used).

For efficiency, we do not use all captured images for panorama stitching. The images whose fields of view are covered by neighbouring frames can be skipped to reduce the seam computations. All incoming frames still need to be aligned to determine their overlapping areas. The first frame is always used in the panorama. A frame is skipped if it overlaps more than 75% with the last used frame and if the next frame also overlaps more than 25% with the last used frame. The second condition ensures no coverage gap is created by removing a frame. These overlapping parameters can be increased to encourage more frames to be used during stitching. Fig. 4 illustrates an example with this default overlapping parameter where only four out of six captured frames are needed to construct a panoramic image.

Our software prototype automatically determines the sweep direction from the alignment information. There is no need for the user to select the direction, as required in some consumer cameras. Fig. 5b shows an example of a vertical panorama constructed by our system from ten images in Fig. 5a. The output image is a good reproduction of the scene despite few horizontal or vertical structures in the scene, lighting change due to camera auto-gain, and texture of carpet on the floor. Another example of automatic sweep direction detection can be seen in Fig. 9, where the camera was panned
from right to left instead of the traditional left to right motion as in Fig. 4.

Fig. 5. Vertical sweeping panorama produced by our system: (a) 10 input frames (512×340); (b) panorama (543×1330).

4. EVALUATION

We first present an evaluation of our projection-based shift estimation, followed by results on seamless panorama stitching.

4.1. Shift estimation

We compare our multi-hypothesis projection-based shift estimation algorithm against an FFT-based 2D correlation and the viewfinder alignment algorithm [2]. All three algorithms were implemented in Matlab version R2010b. For the viewfinder alignment algorithm, the images were subsampled to approximately 320×240 pixels to match the viewfinder resolution in [2]. Harris corner detection followed by nearest neighbour corner matching was used to correct for small rotation and scale change as described in [2].

We applied the three shift estimators to panning image pairs of different sizes and recorded the execution time in Matlab. For each available image size, an average runtime and its standard deviation are plotted as error bars in Fig. 6. Runtime varies even for the same image size due to different content overlap. A line is fit to the data points to predict the runtime of each algorithm for an arbitrary image size. All algorithms show a linear run-time performance with respect to the number of input pixels. 2D correlation is the slowest algorithm. Its floating-point FFT operation also triggers an out-of-memory error for images larger than ten megapixels (MP). Our algorithm runs slightly faster than that of Adams et al. because ours does not have the corner detection and matching steps. The red line fit in Fig. 6 shows that our algorithm takes less than 0.05 second in Matlab to align a 1 MP image pair and roughly 0.1 second to align an 8 MP image pair. As the image size gets larger, the major part of the run-time is spent on image subsampling, which can be implemented more efficiently in hardware using CCD binning.

Fig. 6. Shift estimation run time set out against image size for three algorithms.

Fig. 7. Estimated shifts for image pairs undergoing a synthetic horizontal shift.

To measure the robustness of our projection-based alignment algorithm against large translation, we performed a synthetic shift experiment. Two 512×340 images were cropped from the panoramic image in Fig. 4 such that they are related by a purely horizontal translation, which ranges from 1 to 500 pixels. The estimated shifts $[t_x\ t_y]$ are plotted in Fig. 7 for three algorithms: 2D correlation, viewfinder alignment, and this paper's. Both 2D correlation and viewfinder alignment fail to estimate shifts larger than 128 pixels (i.e. $t_x$ > 25% of image width). Our multi-hypothesis algorithm, on the other hand, estimates both shift components correctly for a synthetic translation up to 456 pixels (i.e. 90% of image width). As suggested by the 2D correlation subplot on the top row of Fig. 7, the strongest correlation peak does not always correspond to the true shift. Large non-overlapping areas can alter the correlation surface, leading to a sudden switch of the global peak to a different location. This sudden change in the global correlation peak corresponds to the sudden jumps of the $t_x$ and $t_y$ curves in the 2D correlation subplot.
The average accuracy of the estimated shifts in Fig. 7 is tabulated in Table 1. We measured the Root Mean Squared Errors (RMSE) of the estimated shifts within two ground-truth translation intervals. The first interval ($1 \le t_x \le 128$) is where all three algorithms achieve subpixel accuracy. Within this interval, the viewfinder alignment algorithm is the most accurate and this paper's is the least accurate. The second interval covers a larger range of shifts ($1 \le t_x \le 456$), over which both other algorithms fail. Within this larger motion range, our algorithm produces an average alignment error of about 2 pixels for horizontal translation up to 90% of the image width.

Table 1. RMSE of estimated shifts under large translation

                      Correlation   Adams et al.   This paper
1 ≤ tx ≤ 128          0.118         0.083          0.420
1 ≤ tx ≤ 456          278.444       279.549        2.281

We also tested the robustness of our shift estimation algorithm against small image rotation. Fig. 8 plots the estimated shifts by the same three alignment algorithms on purely rotated image pairs. The images are generated from frame 1 of the image sequence in Fig. 4 by a rotation, followed by central cropping to 276×448 pixels to remove the missing image boundary. Under zero translation, the viewfinder alignment algorithm is robust up to 3° rotation. Outside this ±3° rotation range, however, the viewfinder alignment algorithm produces unreliably large shift estimation errors. Note that the middle subplot has a vertical axis limit 10 times larger than that of the other two subplots. Our algorithm performs as well as that of Adams et al. for small rotation ($|\theta| < 3°$). For larger rotation, the error of our alignment increases only gradually, reaching 10-pixel misalignment for 10° rotation.

Fig. 8. Estimated shifts for image pairs undergoing a small synthetic rotation.

The performances of the three alignment algorithms under small image rotation are further described by the RMSEs in Table 2. Within a ±1° rotation range, Adams et al. is the most accurate method, closely followed by this paper. Both achieve subpixel accuracy. For any larger rotation range, our algorithm is the most accurate. We consistently produce less than 2-pixel alignment error for rotation up to 5°. Adams et al., on the other hand, fail to align images with more than 3° rotation.

Table 2. RMSE of estimated shifts under small rotation

                      Correlation   Adams et al.   This paper
−1° ≤ θ ≤ 1°          1.070         0.673          0.737
−3° ≤ θ ≤ 3°          3.212         1.684          1.310
−5° ≤ θ ≤ 5°          5.481         141.555        1.679

4.2. Panorama stitching

We demonstrate the accuracy of our multi-hypothesis projection-based shift estimation on a sweeping panorama application. Five images on the top row of Fig. 4 come from a sequence of 48 images captured by a hand-held camera. Due to the panning motion of the camera, the input images undergo mainly a horizontal translation. The translations are calculated between consecutive image pairs using the alignment algorithm presented in Section 2. Six frames (1, 12, 22, 33, 43, 48) with sufficient content overlap are automatically selected for panorama stitching. The selected frames are stitched together along a set of irregular seams (shown as yellow lines in the panorama).

Fig. 4 demonstrates our solution's robustness to moving objects and non-purely translational motion. Because the intensity difference across the seams is minimized, the stitched image appears seamless. The seams do not cut through moving objects such as the cars on the road. However, one of these cars appears multiple times in the panorama as it moves through the scene during image acquisition. Another visible artefact is the bending of the balcony wall close to the camera. This geometric distortion is due to the approximation of a full 3D projective transformation of the images by a simple 2D translation. Despite these artefacts, the produced panorama is a plausible representation of the scene.

Our global alignment algorithm is also robust to motion blur. An example of a panning sequence with severe motion blur is shown on the top row of Fig. 9. Because multiple 1D shift hypotheses are kept, the correct 2D shifts are successfully detected, leading to a good panorama reconstruction on the bottom row of Fig. 9. Note that the output panorama could have been improved further using motion blur deconvolution. However, deconvolution is outside the scope of this paper.

More panoramas reconstructed by our system are given in Fig. 10. Our algorithm works well outdoors (Fig. 10a) because motion of distant scenes can be approximated by a
translation. Projective distortions only appear when there is significant depth difference in the scene. The 360° indoor panorama in Fig. 10b, for example, shows bending of linear structures due to this perspective effect. These distortions are unavoidable for a wide-angle view because the panorama effectively lies on a cylindrical surface, whereas each input image lies on a different imaging plane. Finally, a 180° view of a busy shopping centre is presented in Fig. 10c. The reconstructed panorama captures many people in motion, none of whom are cut by the hidden seams.

Fig. 9. Seamless panorama reconstruction under motion blur (output size is 3456×704). Top row: frames 11, 9, 6, 3 and 0; bottom row: sweeping panorama from 12 panning images (9 actually used).

Fig. 10. Sweeping panoramas constructed by our system: (a) outdoor panorama (8448×1428) from 14 images; (b) 360° panorama (4448×496) from a PTZ camera; (c) 180° panorama (4000×704) of a busy shopping centre.

For comparison purposes, we captured some panoramic images using a consumer camera available on the market. Unlike our technology, which stitches as few frames as possible along some irregular seams, this camera joins as many frames as it captures along straight vertical seams. This strip-based stitching algorithm is prone to motion artefacts such as the motion trail of the car in Fig. 11a. The thin strip approach is also not robust to jittered camera motion. Fig. 11b shows some jitter artefacts of a whiteboard and a nearby window due to an uneven panning motion. The top drawer of the vertical panorama in Fig. 11c also looks distorted. Our solution does not suffer from jitter artefacts because the images are aligned in both directions before fusion.

Fig. 11. Some panoramas produced by a consumer camera: (a) motion trail of a moving car; (b) ripples due to unstable sweeping motion; (c) over-exposed.

5. CONCLUSION

We have presented a new projection-based shift estimation algorithm using multiple shift hypothesis testing. Our shift estimation algorithm is fast and it can handle large image translations in either x- or y-direction. The robustness of the algorithm in real-life situations is demonstrated using a sweeping panorama stitching application. Our alignment algorithm is found to be robust against small perspective change due to camera motion. It is also robust against motion blur and moving objects in the scene. We have presented a demo application for live panorama stitching from a Canon camera. The panorama stitching solution comprises a multi-hypothesis projection-based image alignment step, an irregular seam stitching step and an optional image blending step.

6. ACKNOWLEDGMENT

The authors would like to thank Ankit Mohan from Canon USA R&D and Edouard Francois from Canon Research France for their help in improving this paper's presentation.

7. REFERENCES

[1] S. Alliney and C. Morandi, “Digital image registration using projections,” PAMI, 8(2):222–233, 1986.

[2] A. Adams, N. Gelfand, and K. Pulli, “Viewfinder alignment,” Comput. Graph. Forum, 27(2):597–606, 2008.

[3] E. H. Adelson, C. H. Anderson, J. R. Bergen, P. J. Burt, and J. M. Ogden, “Pyramid method in image processing,” RCA Eng., 29(6):33–41, 1984.

[4] T. Q. Pham, “Non-maximum suppression using fewer than two comparisons per pixel,” in Proc. of ACIVS, 2010, pp. 438–451.

[5] H.-Y. Shum and R. Szeliski, “Construction of panoramic mosaics with global and local alignment,” IJCV, 36(2):101–130, 2000.

[6] J. Davis, “Mosaics of scenes with moving objects,” in Proc. of CVPR, 1998, pp. 354–360.

[7] S. Avidan and A. Shamir, “Seam carving for content-aware image resizing,” in Proc. of SIGGRAPH, 2007.