INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
ISSN 0976 – 6367 (Print)
ISSN 0976 – 6375 (Online)
Volume 4, Issue 1, January-February (2013), pp. 203-210
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2012): 3.9580 (Calculated by GISI)
www.jifactor.com
A REVIEW ON VIDEO INPAINTING TECHNIQUES
Mrs. B. A. Ahire¹, Prof. Neeta A. Deshpande²
¹ M.E. Student, Department of Computer Engineering,
Matoshri College of Engg. and Research Centre, Nasik
² Associate Professor, Department of Computer Engineering,
Matoshri College of Engg. and Research Centre, Nasik
¹ bhawanaahire@yahoo.com; ² deshpande_neeta@yahoo.com
ABSTRACT
Video completion aims to reconstruct the missing pixels in holes created by damage to a
video or by the removal of selected objects. It is critical to many applications, such as
video repair and movie post-production. The key issues in video completion are maintaining
spatio-temporal coherence and inferring the missing pixels faithfully. Many researchers have
worked in the area of video inpainting. Most of the techniques try to ensure either spatial
consistency or temporal continuity between the frames, but none of them ensures both in the
same technique with good quality.
Although less work has been proposed on video completion than on image inpainting, a
number of methods have been proposed in recent years. These methods can be classified as
patch-based methods and object-based methods. Patch-based methods use block-based sampling
together with simultaneous propagation of texture and structure information, which makes
them computationally efficient. Researchers have therefore extended the same concept to
video inpainting.
Texture synthesis-based methods do not capture structural information, while PDE-based
methods lead to blurring artifacts. Patch-based methods produce high-quality results while
maintaining the consistency of local structures. This paper surveys the area of video
inpainting.
Keywords: video inpainting, texture synthesis, patch based inpainting
I. INTRODUCTION
Inpainting is the process of reconstructing lost or deteriorated parts of images and
videos. The idea of image inpainting dates back many years, and since the birth of computer
vision researchers have been looking for a way to carry out this process automatically. By
applying various techniques, they have achieved promising results, even when the images
contain complicated objects. Subsequently, video inpainting has also attracted a large
number of researchers because of its ability to fix or restore damaged videos.
Video inpainting is the process of filling the missing or damaged parts of a video clip
with visually plausible data so that viewers cannot tell whether the clip has been
automatically generated. Compared with image inpainting, video inpainting has far more
pixels to fill and a much larger search space. Moreover, not only spatial consistency but
also temporal continuity between the frames must be ensured [12]. Applying image inpainting
techniques directly to video without taking temporal factors into account will ultimately
fail because it makes the frames inconsistent with each other. These difficulties make
video inpainting a much more challenging problem than image inpainting. Depending on the
way the damaged images are restored, the techniques are classified into three groups:
texture synthesis-based methods, partial differential equation (PDE)-based methods, and
patch-based methods. The texture synthesis process grows a new image outward from an
initial seed, one pixel at a time. In PDE-based approaches, the gradient direction and the
grey-scale values are propagated from the boundary of a missing region towards the center
of the region. Neither method can handle general image inpainting cases.
The rest of the paper is organized as follows. Section II reviews video inpainting
techniques: PDE-based methods are discussed in subsection A, subsection B concentrates on
texture synthesis methods, and subsection C describes patch-based methods. Section III
explains object-based methods. Section IV contains concluding remarks. All the methods are
summarized in Table 1. A block diagram of the video inpainting technique is shown in Fig. 1.
FIGURE 1 Block diagram of the video inpainting technique
II. A REVIEW OF VIDEO INPAINTING TECHNIQUES
The problem of automatic video restoration in general, and automatic object removal
and modification in particular, is beginning to attract the attention of many researchers.
The methods for video inpainting are reviewed in the following subsections according to the
way the images and videos are restored.
A. PDE-BASED METHODS
G. Sapiro et al. proposed a frame-by-frame PDE-based approach in 2000, which extended
image inpainting techniques to video sequences [1]. The first step in this method is
the formulation of a partial differential equation that propagates information (the Laplacian of
the image) in the direction of the isophotes (lines of equal intensity, which run along edges).
The algorithm shows that both the gradient direction (geometry) and the gray-scale values
(photometry) of the image should be propagated inside the region to be filled in, making clear
the need for high-order PDEs in image processing. This algorithm propagates the necessary
information by numerically solving
the PDE
$$\frac{\partial I}{\partial t} = \nabla^{\perp} I \cdot \nabla(\Delta I)$$
for the image intensity $I$ inside the hole, i.e., the region to be inpainted. Here, $\nabla^{\perp}$
denotes the perpendicular gradient $(-\partial_y, \partial_x)$ and $\Delta$ is the Laplace operator.
The goal is to evolve this equation to a steady-state solution, where
$$\nabla^{\perp} I \cdot \nabla(\Delta I) = 0,$$
thereby ensuring that the information is constant in the direction of the
isophotes. Such methods are capable of completing a damaged image in which only thin regions
are missing.
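To make the evolution concrete, the following is a minimal sketch of one explicit time step of the transport equation described above: the Laplacian of the intensity is moved along the perpendicular gradient, and only pixels inside the hole are updated. The function names, the step size `dt`, and the finite-difference choices are assumptions made for illustration. The method in [1] includes further numerical ingredients that are not reproduced here.

```python
import numpy as np

def laplacian(img):
    """5-point Laplacian with replicated borders (an illustrative choice)."""
    p = np.pad(img, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) - 4.0 * img

def inpaint_step(img, mask, dt=0.05):
    """One explicit Euler step of I_t = perp_grad(I) . grad(Laplacian(I)).

    img  : 2-D float array of intensities
    mask : 2-D bool array, True inside the hole
    dt   : hypothetical step size, not taken from the paper
    """
    lap = laplacian(img)
    gy, gx = np.gradient(img)   # derivatives along y (rows) and x (columns)
    ly, lx = np.gradient(lap)   # gradient of the Laplacian
    # Perpendicular gradient is (-I_y, I_x); dot it with (L_x, L_y).
    update = -gy * lx + gx * ly
    out = img.copy()
    out[mask] += dt * update[mask]   # known pixels are left untouched
    return out
```

Repeating `inpaint_step` until the update inside the mask is close to zero approximates the steady state. Whether this simple scheme stays stable depends on `dt`, which is why it should be read as a sketch rather than a faithful implementation.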
B. TEXTURE SYNTHESIS METHODS
Alexei Efros and Thomas K. Leung proposed a texture synthesis technique based on
non-parametric sampling in Sept 1999 [2]. In this method, the window size W needs to be
specified. It preserves as much local structure as possible and produces good results for a
wide variety of synthetic and real-world textures. Its drawbacks are that the window size is
not selected automatically for a given texture and that the method is slow.
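The role of the window size W can be illustrated with a small sketch of the sampling step: the known pixels around a missing pixel are compared against every W x W window of a sample texture, and the centre of a well-matching window supplies the new value. The function name, the tolerance `eps`, and the plain (unweighted) sum of squared differences are assumptions for illustration, not details given in this review.

```python
import numpy as np

def synthesize_pixel(sample, neigh, known, eps=0.1, rng=None):
    """Choose a value for the centre pixel of a W x W neighbourhood.

    sample : 2-D float array, the example texture
    neigh  : W x W float array around the pixel being synthesized
    known  : W x W bool array, True where `neigh` holds valid data
    eps    : hypothetical tolerance for accepting near-best matches
    """
    if rng is None:
        rng = np.random.default_rng(0)
    w = neigh.shape[0]
    half = w // 2
    # Every W x W window of the sample, shape (rows, cols, W, W).
    windows = np.lib.stride_tricks.sliding_window_view(sample, (w, w))
    # Sum of squared differences over the known pixels only.
    ssd = (((windows - neigh) ** 2) * known).sum(axis=(2, 3))
    # Keep all windows that are nearly as good as the best one ...
    candidates = np.argwhere(ssd <= ssd.min() * (1.0 + eps))
    # ... and sample one of them at random.
    i, j = candidates[rng.integers(len(candidates))]
    return sample[i + half, j + half]
```

Because every missing pixel triggers a search over all windows of the sample, the cost grows quickly with image size, which is consistent with the slowness noted above.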
M. Bertalmio et al. proposed an image inpainting technique in Dec 2001 that fills in
part of an image or video using information from the surrounding area [3].
Applications include the restoration of damaged photographs and movies and the removal of
selected objects. The authors introduced a class of automated methods for digital
inpainting. The approach uses ideas from classical fluid dynamics to propagate isophote lines
continuously from the exterior into the region to be inpainted. The main idea is to think of the
image intensity as a ‘stream function’ for a two-dimensional incompressible flow. The
Laplacian of the image intensity plays the role of the vorticity of the fluid; it is transported
into the region to be inpainted by a vector field defined by the stream function. The resulting
algorithm is designed to continue isophotes while matching gradient vectors at the
boundary of the inpainting region. The method is directly based on the Navier-Stokes
equations for fluid dynamics, which has the immediate advantage of well-developed
theoretical and numerical results. This is a new approach for introducing ideas from
computational fluid dynamics into problems in computer vision and image analysis.
As future work, the authors mention a video inpainting technique that automatically
switches between structure and texture inpainting as required.
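The analogy can be written out explicitly. The block below uses standard fluid-dynamics notation (stream function $\Psi$, velocity $v$, vorticity $\omega$, viscosity $\nu$); these symbols are not used elsewhere in this review and are introduced only for illustration.

```latex
% Two-dimensional incompressible flow (standard notation, assumed here):
%   velocity  v = \nabla^{\perp}\Psi,   vorticity  \omega = \Delta\Psi
\frac{\partial \omega}{\partial t} + v \cdot \nabla \omega = \nu \, \Delta \omega

% Correspondence with the image quantities described in the text:
%   \Psi    <->  I                    (image intensity as stream function)
%   v       <->  \nabla^{\perp} I     (isophote direction)
%   \omega  <->  \Delta I             (Laplacian of the intensity)

% In the steady, inviscid limit (\partial_t \omega = 0, \nu \to 0) this gives
\nabla^{\perp} I \cdot \nabla (\Delta I) = 0
```

Read this way, a steady flow corresponds to an image whose Laplacian does not change along the isophotes, which is the same condition the PDE-based approach of subsection A drives towards.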
Y. T. Jia et al. proposed a video inpainting technique in Aug 2005 [5]. It
deals with automatically filling space-time holes in video sequences left by the
removal of unwanted objects in a scene. They solve it using texture synthesis,
filling a hole inwards using three steps iteratively: select the most promising
target pixel at the edge of the hole, find the source fragment most similar to the
known part of the target's neighborhood, and merge the source and target fragments
to complete the target neighborhood, reducing the size of the hole. Earlier methods
were slow, due to searching the whole video data for source fragments or completing
holes pixel by pixel; they also produced blurred results due to sampling and
smoothing. For speed, this method tracks moving objects, which allows a much
smaller search space when seeking source fragments; it also completes holes fragment
by fragment instead of pixel-wise. Fine details are maintained by use of a graph cut
algorithm when merging source and target fragments. Further techniques ensure
temporal consistency of hole filling over successive frames. The authors wish to
extend the work to more complicated and dynamic scenes, involving, for example,
complex camera and object motions in three dimensions.
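The three iterative steps can be sketched on a single grayscale frame as follows. This is a deliberately simplified illustration: the merge step is a plain copy rather than a graph cut, the search is exhaustive rather than restricted by tracking, and the function name `fill_hole` and the radius `r` are hypothetical.

```python
import numpy as np

def fill_hole(img, hole, r=1):
    """Greedy fragment-by-fragment fill of a 2-D image.

    img  : 2-D array; values inside the hole are ignored
    hole : 2-D bool array, True where pixels are missing
    r    : fragment radius (fragments are (2r+1) x (2r+1)), an assumption
    """
    img = img.astype(float).copy()
    hole = hole.copy()
    H, W = img.shape
    size = 2 * r + 1
    # Source fragments: windows that contain no missing pixel at all.
    sources = [(i, j)
               for i in range(H - size + 1)
               for j in range(W - size + 1)
               if not hole[i:i + size, j:j + size].any()]
    while hole.any():
        # Step 1: pick the hole pixel whose window has the most known pixels.
        best_known, best_win = -1, None
        for y, x in np.argwhere(hole):
            y0 = min(max(y - r, 0), H - size)   # clamp window to the image
            x0 = min(max(x - r, 0), W - size)
            n_known = (~hole[y0:y0 + size, x0:x0 + size]).sum()
            if n_known > best_known:
                best_known, best_win = n_known, (y0, x0)
        y0, x0 = best_win
        tgt = img[y0:y0 + size, x0:x0 + size]
        unk = hole[y0:y0 + size, x0:x0 + size].copy()

        # Step 2: find the source fragment closest on the known pixels.
        def cost(s):
            src = img[s[0]:s[0] + size, s[1]:s[1] + size]
            return ((src - tgt) ** 2)[~unk].sum()
        si, sj = min(sources, key=cost)

        # Step 3: merge by copying the source into the unknown pixels.
        tgt[unk] = img[si:si + size, sj:sj + size][unk]
        hole[y0:y0 + size, x0:x0 + size] = False
    return img
```

Each pass fills a whole fragment rather than a single pixel, so the number of searches is much smaller than in pixel-wise synthesis. The sketch assumes at least one fragment lies entirely outside the hole.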
C. PATCH-BASED METHODS
A. Criminisi et al. [4] proposed an exemplar-based texture synthesis method
for the simultaneous propagation of structure and texture information in Sept 2004.
Computational efficiency is achieved by a block-based sampling process. Robustness
with respect to the shape of the manually selected target region is also demonstrated.
This work needs to be extended to handle the accurate propagation of curved
structures in still photographs and the removal of objects from video.
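Exemplar-based filling of this kind is commonly driven by a fill-order priority that favours patches lying on strong structures and surrounded by reliable pixels. One widely used formulation is sketched below; it is not stated in this review, and the symbols ($\Psi_p$ for the patch centred at $p$, $\Omega$ for the hole, $n_p$ for the normal to the hole boundary, $\alpha$ for a normalization constant) are introduced here as assumptions.

```latex
% Priority of a patch \Psi_p centred on a boundary pixel p of the hole \Omega
P(p) = C(p) \, D(p)

% Confidence term: fraction of the patch that is already known or filled
C(p) = \frac{\sum_{q \in \Psi_p \cap (I - \Omega)} C(q)}{|\Psi_p|}

% Data term: strength of the isophote hitting the boundary at p
D(p) = \frac{\left| \nabla^{\perp} I_p \cdot n_p \right|}{\alpha}
```

The data term is what lets structure (edges) be continued before flat texture is filled in, which is how a single block-copying procedure can propagate both kinds of information.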
Y. Shen et al. proposed a novel technique in Aug 2006 to fill in the missing
background and moving foreground of a video captured by a static or moving camera [7].
Unlike previous efforts, which are typically based on processing in the 3D data
volume, they slice the volume along the motion manifold of the moving object and
thereby reduce the search space from 3D to 2D while still preserving spatial and
temporal coherence. In addition to being computationally efficient, the approach,
which is based on geometric video analysis, can handle real videos
under perspective distortion, as well as common camera motions, such as panning,
tilting, and zooming. The technique needs to be extended to more general
camera and foreground motion.
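The reduction from a 3D search to a 2D one can be illustrated with a toy slicing operation. The sketch below assumes purely horizontal object motion and a known per-frame column position; the function name `motion_slice` and the input layout are hypothetical, and the construction of the motion manifold in [7] is more general than this.

```python
import numpy as np

def motion_slice(volume, xs):
    """Cut a 2-D slice out of a video volume along an object's path.

    volume : array of shape (T, H, W), a grayscale video
    xs     : length-T sequence, column occupied by the object in each frame
             (assumed to come from a tracker; hypothetical input)
    Returns an array of shape (T, H): row t is column xs[t] of frame t.
    """
    volume = np.asarray(volume)
    xs = np.asarray(xs)
    t = np.arange(volume.shape[0])
    # Pairing frame t with column xs[t] follows the object through time.
    return volume[t, :, xs]
```

A hole in the video then appears as a hole in this 2-D slice, where ordinary image inpainting can be applied before the result is written back to the frames.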
K. A. Patwardhan et al. [8] proposed a framework in Feb 2007 for inpainting
missing parts of a video sequence recorded with a moving or stationary camera.
The region to be inpainted is general: it may be still or moving,
in the background or in the foreground, and it may occlude one object and be occluded by
some other object. The algorithm consists of a simple preprocessing stage and two
steps of video inpainting. The proposed framework has several advantages over state-
of-the-art algorithms that deal with similar types of data and constraints. It permits
some camera motion, is simple to implement, is fast, does not require statistical models
of the background or foreground, and works well in the presence of rich and cluttered
backgrounds; the results show no visible blurring or motion artifacts.
The algorithm does not address complete occlusion of the moving object, and the
technique needs to be adapted to such scenarios.
Also to be addressed are the automated selection of parameters (such as patch size,
mosaic size, etc.) and dealing with illumination changes along the sequence.
Y. Wexler et al. [9] came up with a new framework in March 2007 for the
completion of missing information based on local structures. It poses the task of
completion as a global optimization problem with a well-defined objective function
and derives a new algorithm to optimize it. With this technique, only low-resolution
videos can be considered, and the multi-scale nature of the solution may lead to
blurred results due to sampling and smoothing.
T. Shih et al. [10] designed a technique in March 2009 that automatically
restores or completes removed areas in an image. When dealing with a similar
problem in video, not only should a robust tracking algorithm be used, but the
temporal continuity among video frames also needs to be taken into account,
especially when the video has camera motions such as zooming and tilting. In this
method, an exemplar-based image inpainting algorithm is extended by incorporating
an improved patch matching strategy for video inpainting. In the proposed algorithm,
different motion segments with different temporal continuity call for different
candidate patches, which are used to inpaint holes after a selected video object is
tracked and removed. The algorithm produces very
few “ghost shadows,” which are produced by most image inpainting algorithms
when applied directly to video. Shadows caused by fixed light sources can be removed by
other techniques; alternatively, the target can be enlarged to some extent so that
the shadow is covered. The main challenge is block matching, which should allow a
block to match another block that is scaled, rotated, or skewed. However, the degree
of scaling and rotation is hard to predict from the speed of zooming and rotation of the
camera. In addition, how to select continuous blocks in a continuous area to inpaint a
target region is another challenging issue.
TABLE 1 Summary of Video Inpainting Techniques
| Sr. No. | Year | Authors | Technique | Features of the current techniques | Future scope mentioned in the paper |
|---|---|---|---|---|---|
| 1. | 2000 | G. Sapiro et al. | PDE-based method | Propagates information from the boundary of the region to the center of that region. | Only capable of completing a damaged image in which thin regions are missing. |
| 2. | Sept 1999 | Alexei Efros & Thomas K. Leung | Texture synthesis by non-parametric sampling | Preserves as much local structure as possible and produces good results for a wide variety of synthetic and real-world textures; the window size W needs to be specified. | 1. Automatic window size selection for textures. 2. Slow. |
| 3. | Dec 2001 | M. Bertalmio et al. | Texture synthesis, image intensity function | No other information needs to be specified; only the region to be inpainted has to be marked by the user. | Working on a video inpainting technique based on this framework that automatically switches between texture and structure inpainting. |
| 4. | Sept 2004 | A. Criminisi et al. | Texture synthesis method | Exemplar-based approach with simultaneous propagation of structure and texture information; computational efficiency through block-based sampling; robust to the shape of the manually selected target region. | Accurate propagation of curved structures in still photographs and removal of objects from video. |
| 5. | Aug 2005 | Y. T. Jia et al. | Texture synthesis method | A novel, efficient, and visually pleasing approach to video inpainting; temporal continuity is preserved; requires the location of the holes from which the object is to be removed. | The work is to be extended to more complicated and dynamic scenes, including complex camera and object motion in three dimensions. |
| 6. | May 2006 | J. Jia et al. | Object-based approach | The system works for a subclass of camera motions, i.e., rotation about a fixed point. The restored video preserves the same structure and illumination; temporal consistency is preserved. | 1. Synthesized objects do not have a real trajectory, and only textures are allowed in the background. 2. Running time needs to be improved. |
| 7. | Aug 2006 | Y. Shen et al. | Patch-based method | A novel technique to fill in the missing background and moving foreground of a video. Spatial and temporal coherence is achieved, and periodic motion patterns are well maintained. | More general camera and foreground motion needs to be considered. |
| 8. | Feb 2007 | K. A. Patwardhan et al. | Patch-based method (parameters: patch size, mosaic size) | Combines motion information. Performance of the system is improved as the search space is reduced. Fast and simple. | 1. Does not address complete occlusion of the moving object. 2. Automatic selection of parameters such as patch size, mosaic size, etc. 3. Lack of temporal continuity leading to flickering artifacts. |
| 9. | Mar 2007 | Y. Wexler et al. | Patch-based method | Space-time completion of large space-time “holes” in video sequences of complex dynamic scenes. | 1. Only low-resolution videos can be considered. 2. The multi-scale nature of the solution may lead to blurred results due to sampling and smoothing. |
| 10. | Mar 2009 | T. Shih et al. | Patch-based inpainting | The system can deal with different camera motions, such as panning and zooming, for several types of video clip. | Shadows caused by fixed light sources need to be removed. The challenge is block matching. |
| 11. | Oct 2006 | S.-C. S. Cheung et al. | Object-based inpainting | An efficient object-based video inpainting technique for videos recorded by a stationary camera. A fixed-size sliding window is defined to include a set of continuous object templates. The authors also propose a similarity function that measures the similarity between two sets of continuous object templates. | 1. If the number of postures in the database is not sufficient, the inpainting result could be unsatisfactory. 2. The method does not provide a systematic way to identify a good filling position for an object template. This may cause visually annoying artifacts if the chosen position is inappropriate. |
| 12. | Apr 2011 | C. H. Ling et al. | Object-based inpainting | A novel framework for video completion that reduces the problem of insufficient postures. The method can maintain spatial consistency as well as temporal continuity of an object simultaneously. | 1. Non-linearity of the occluded object. 2. Variable illumination problem. 3. Synthesis of complex postures. |
III. OBJECT-BASED METHODS
J. Jia et al. [6] proposed a complete system in May 2006 that is capable of synthesizing a
large number of pixels that are missing due to occlusion or damage in an uncalibrated input video.
These missing pixels may correspond to the static background or to cyclic motions of the captured
scene. The system employs user-assisted video layer segmentation, while the main processing in
video repair is fully automatic. The input video is first decomposed into color and illumination
videos. The necessary temporal consistency is maintained by tensor voting in the spatio-temporal
domain. Missing colors and illumination of the background are synthesized by applying image
repairing. Finally, the occluded motions are inferred by spatio-temporal alignment of collected
samples at multiple scales. Since the movel representation does not capture self shadows or moving
shadows, the method cannot repair the shadow of a damaged movel. Another limitation is incorrect
lighting on a repaired movel: currently, the technique does not relight a repaired movel. In the
future, the method needs to be extended so that lighting and shadows on the repaired movels can be
handled better. The running time of the system also needs to be improved.
Cheung et al. [11] proposed an efficient object-based video inpainting technique in Oct 2006
for dealing with videos recorded by a stationary camera. To inpaint the background, they use the
background pixels that are most compatible with the current frame to fill a missing region; and to
inpaint the foreground, they utilize all available object templates. A fixed size sliding window is
defined to include a set of continuous object templates. The authors also propose a similarity function
that measures the similarity between two sets of continuous object templates. For each missing object,
a sliding window that covers the missing object and its neighboring objects’ templates is used to find
the most similar object template. The corresponding object template is then used to replace the
missing object. However, if the number of postures in the database is not sufficient, the inpainting
result could be unsatisfactory. Moreover, the method does not provide a systematic way to identify a
good filling position for an object template. This may cause visually annoying artifacts if the chosen
position is inappropriate.
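The sliding-window retrieval described above can be sketched as follows. The review does not give the similarity function of [11], so this illustration substitutes a summed Euclidean distance over per-template feature vectors. The function name `best_window` and the feature representation are assumptions.

```python
import numpy as np

def best_window(database, query, missing):
    """Find the run of stored templates that best matches a query window.

    database : (N, D) array, feature vectors of N consecutive object templates
    query    : (K, D) array, window around the missing object
    missing  : (K,) bool array, True where the query template is missing
    Returns the start index s of the best-matching window database[s:s+K].
    """
    database = np.asarray(database, dtype=float)
    query = np.asarray(query, dtype=float)
    missing = np.asarray(missing, dtype=bool)
    k = len(query)
    costs = []
    for s in range(len(database) - k + 1):
        # Distance between corresponding templates in the two windows.
        d = np.linalg.norm(database[s:s + k] - query, axis=1)
        # Only positions that are actually observed contribute.
        costs.append(d[~missing].sum())
    return int(np.argmin(costs))
```

A missing template at position `j` of the query would then be replaced by `database[s + j]`. If the database holds too few postures, even the best window can be a poor match, which is the limitation noted above.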
Chih-Hung Ling et al. came up with a novel framework for object completion in a video [12]. To
complete an occluded object, the proposed method first samples a 3-D volume of the video into
directional spatio-temporal slices and performs patch-based image inpainting to complete the partially
damaged object trajectories in the 2-D slices. The completed slices are then combined to obtain a
sequence of virtual contours of the damaged object. Next, a posture sequence retrieval technique is
applied to the virtual contours to retrieve the most similar sequence of object postures among the
available non-occluded postures. Key-posture selection and indexing are used to reduce the complexity
of posture sequence retrieval. The method also includes a synthetic posture generation scheme that
enriches the collection of postures so as to reduce the effect of insufficient postures.
IV. CONCLUSION
Patch-based methods often have difficulty handling spatial consistency and temporal
continuity problems. For example, some previously proposed approaches can maintain only spatial
consistency or only temporal continuity; they cannot solve both problems simultaneously. On the other
hand, some of the proposed approaches can deal with spatial and temporal information
simultaneously, but they suffer from over-smoothing artifacts. In addition, patch-based
approaches often generate inpainting errors in the foreground. As a result, many researchers have
focused on object-based approaches, which usually generate high-quality visual results. Even so,
some difficult issues still need to be addressed, for example, the unrealistic trajectory problem and the
inaccurate representation problem caused by an insufficient number of postures in the database.
REFERENCES
[1] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester, “Image inpainting,” in Proc. ACM SIGGRAPH, 2000, pp. 417–424.
[2] A. Efros and T. Leung, “Texture synthesis by non-parametric sampling,” in Proc. IEEE Conf. Comput. Vis., 1999, vol. 2, pp. 1033–1038.
[3] M. Bertalmio, A. L. Bertozzi, and G. Sapiro, “Navier-Stokes, fluid dynamics, and image and video inpainting,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Kauai, HI, Dec. 2001, pp. 355–362.
[4] A. Criminisi, P. Perez, and K. Toyama, “Region filling and object removal by exemplar-based image inpainting,” IEEE Trans. Image Process., vol. 13, no. 9, pp. 1200–1212, Sep. 2004.
[5] Y. T. Jia, S. M. Hu, and R. R. Martin, “Video completion using tracking and fragment merging,” Visual Comput., vol. 21, no. 8–10, pp. 601–610, Aug. 2005.
[6] J. Jia, Y.-W. Tai, T.-P. Wu, and C.-K. Tang, “Video repairing under variable illumination using cyclic motions,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 5, pp. 832–839, May 2006.
[7] Y. Shen, F. Lu, X. Cao, and H. Foroosh, “Video completion for perspective camera under constrained motion,” in Proc. IEEE Conf. Pattern Recognit., Hong Kong, China, Aug. 2006, pp. 63–66.
[8] K. A. Patwardhan, G. Sapiro, and M. Bertalmío, “Video inpainting under constrained camera motion,” IEEE Trans. Image Process., vol. 16, no. 2, pp. 545–553, Feb. 2007.
[9] Y. Wexler, E. Shechtman, and M. Irani, “Space-time completion of video,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 3, pp. 1–14, Mar. 2007.
[10] T. K. Shih, N. C. Tang, and J.-N. Hwang, “Exemplar-based video inpainting without ghost shadow artifacts by maintaining temporal continuity,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 3, pp. 347–360, Mar. 2009.
[11] S.-C. S. Cheung, J. Zhao, and M. V. Venkatesh, “Efficient object-based video inpainting,” in Proc. IEEE Conf. Image Process., Atlanta, GA, Oct. 2006, pp. 705–708.
[12] Chih-Hung Ling, Chia-Wen Lin, Chih-Wen Su, Yong-Sheng Chen, and Hong-Yuan Mark Liao, “Virtual contour guided video object inpainting using posture mapping and retrieval,” IEEE Trans. Multimedia, vol. 13, no. 2, Apr. 2011.
[13] Abhishek Choubey, Omprakash Firke, and Bahgwan Swaroop Sharma, “Rotation and illumination invariant image retrieval using texture features,” International Journal of Electronics and Communication Engineering & Technology (IJECET), vol. 3, no. 2, 2012, pp. 48–55, published by IAEME.
[14] Shaikh Shabnam Shafi Ahmed, Shah Aqueel Ahmed, and Sayyad Farook Bashir, “Fast algorithm for video quality enhancing using vision-based hand gesture recognition,” International Journal of Computer Engineering & Technology (IJCET), vol. 3, no. 3, 2012, pp. 501–509, published by IAEME.
[15] Reeja S R and N. P Kavya, “Motion detection for video denoising – the state of art and the challenges,” International Journal of Computer Engineering & Technology (IJCET), vol. 3, no. 2, 2012, pp. 518–525, published by IAEME.