Dhinaharan Nagamalai et al. (Eds) : ACITY, WiMoN, CSIA, AIAA, DPPR, NECO, InWeS - 2014
pp. 307–318, 2014. © CS & IT-CSCP 2014 DOI : 10.5121/csit.2014.4531
AN UNSUPERVISED METHOD FOR REAL TIME VIDEO SHOT SEGMENTATION

Hrishikesh Bhaumik1, Siddhartha Bhattacharyya2 and Susanta Chakraborty3

1,2 Department of Information Technology, RCC Institute of Information Technology, Kolkata, India
hbhaumik@gmail.com, dr.siddhartha.bhattacharyya@gmail.com
3 Department of Computer Science and Technology, Bengal Engineering and Science University, Shibpur, Howrah, India
susanta.chak@gmail.com
ABSTRACT
Segmentation of a video into its constituent shots is a fundamental task for indexing and
analysis in content based video retrieval systems. In this paper, a novel approach is presented
for accurately detecting the shot boundaries in real time video streams, without any a priori
knowledge about the content or type of the video. The edges of objects in a video frame are
detected using a spatio-temporal fuzzy hostility index. These edges are treated as features of the
frame. The correlation between the features is computed for successive incoming frames of the
video. The mean and standard deviation of the correlation values obtained are updated as new
video frames are streamed in. This is done to dynamically set the threshold value using the
three-sigma rule for detecting the shot boundary (abrupt transition). A look back mechanism
forms an important part of the proposed algorithm to detect any missed hard cuts, especially
during the start of the video. The proposed method is shown to be applicable for online video
analysis and summarization systems. In an experimental evaluation on a heterogeneous test set,
consisting of videos from sports, movie songs and music albums, the proposed method achieves
99.24% recall and 99.35% precision on average.
KEYWORDS
Real time video segmentation, spatio-temporal fuzzy hostility index, image correlation, three-
sigma rule
1. INTRODUCTION
There has been a spectacular increase in the amount of multimedia content transmitted and shared
over the internet. The number of users of video-on-demand and Internet Protocol Television (IP-TV)
services is growing at an exponential rate. According to 2012 statistics, there were more
than one million IP-TV streams per month on the Zatoo platform. Textual annotation, i.e.
associating a set of keywords with multimedia content for indexing, was performed to facilitate
searching for relevant information in existing video repositories (e.g. YouTube, DailyMotion etc.).
However, manual annotation of the millions of videos available in such repositories is a
cumbersome task. Content based video analysis techniques [1, 3, 4, 5, 6, 7, 8, 9] do away with
manual annotation of the video data and result in savings of time and human effort, with increased
accuracy. Video segmentation is the preliminary step for the analysis of digital video. These
segmentation algorithms can be classified according to the features used, such as pixel-wise
difference [10], histograms [11], standard deviation of pixel intensities [12], edge change ratio
[13] etc. A more daunting task arises for online analysis and indexing of live streaming videos
such as telecast of matches or live performances. Generation of highlights or summarization of
such events is a challenging task as it involves development of algorithms which have good
performance and are adapted to work in real time. Although video edits can be classified into
hard cuts, fades, wipes etc. [9], live streams mainly contain hard cuts, which are the main focus
of the proposed approach. Several methods for video segmentation [8, 10, 11, 13, 14] exist in the
literature, which have been applied to non-real time videos. However, shot boundary detection
for streaming videos is hard to find in the literature. The difficulties and problems related to
setting thresholds and the necessity of automatic threshold selection have been discussed in [1]. Hence, the
motivation behind this work was to develop a technique for video segmentation which is able to
detect, with high accuracy, the hard cuts present in live streaming videos. Video segmentation is
performed on the fly, as new video frames are streamed in. Also, in this work, the problem of
setting an automatic threshold has been addressed. The threshold is set dynamically without any a
priori knowledge about the type, content or length of the video. The algorithm incorporates a look
back mechanism to detect any missed hard cuts particularly during the start of the video when the
statistical measures used to set the threshold are unstable. The novelty of the proposed work lies
in its applicability in video summarization tasks of real time events, such as producing highlights
of sports videos.
The remainder of the paper is organized as follows. In section 2, the basic concepts and definitions
are presented. The proposed method for real time video segmentation is described in section 3.
The experimental results and analysis are presented in section 4. The comparison with other
existing approaches is also given in the same section. Finally, the concluding remarks are
mentioned in section 5.
2. BASIC CONCEPTS AND DEFINITIONS
2.1. Application of fuzzy set theory to image processing
A colour image may be considered as a 3D matrix of values where each pixel colour depends on
the combination of RGB intensities. Conversion of the colour image to a gray scale image
involves mapping of the values in the 3D matrix to a 2D matrix, where each pixel value is in the
range [0, 255]. This 2D matrix of values may be scaled to the range [0, 1] by dividing each
element of the matrix by 255. The scaled value represents the membership value of each pixel to
the fuzzy sets labelled as WHITE and BLACK. The number of elements in each fuzzy set is equal
to the number of elements in the said 2D matrix. If a value 0 represents a completely black pixel
and 1 a completely white one, then the value of each element depicts the degree of membership
µ_W(p_i) of the i-th pixel p_i to the fuzzy set WHITE. The degree of membership µ_B(p_i) of the
pixel p_i to the set BLACK can be represented as µ_B(p_i) = 1 − µ_W(p_i). Incorporating
fuzzy set theory, Bhattacharyya et al. [2] have proposed the fuzzy hostility index for detecting
points of interest in an image, which is useful for high speed target tracking. As an extension of
this concept, a new index is proposed for obtaining the edge map of a video frame, as explained in
the next sub-section.
2.2. Edge Map using Spatio-Temporal Fuzzy Hostility Index (STFHI)
Fuzzy hostility index [2] indicates the amount of variation in the pixel neighbourhood with
respect to itself. The pixel hostility has high value if the surrounding pixels have greater
difference of values as compared to the candidate pixel, i.e. the heterogeneity in its neighbourhood is
high. In an n-order neighbourhood, the hostility index (ζ) of a pixel is defined as:

ζ = (3 / 2^(n+1)) Σ_{i=1}^{2^(n+1)} |µ_p − µ_qi| / (|µ_p + 1| + |µ_qi + 1|)    (1)

where µ_p is the membership value of the candidate pixel and µ_qi, i = 1, 2, 3, ..., 2^(n+1), are the
membership values of its fuzzy neighbours in a second-order neighbourhood fuzzy subset. The
value of the fuzzy hostility index ζ lies in [0, 1], with ζ = 1 signifying maximum
heterogeneity and ζ = 0 indicating total homogeneity in the neighbourhood. The concept of
fuzzy hostility index can be effectively extended to accommodate temporal changes in the time
sequenced frames of a video.
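As an illustration (not the authors' code), equation (1) for the second-order (n = 2, eight-neighbour) case can be sketched in NumPy as follows; the function name and vectorized layout are our own:

```python
import numpy as np

def fuzzy_hostility_index(frame):
    """Fuzzy hostility index (eq. 1) of every interior pixel of a [0, 1]
    membership map, using the 8-neighbour second-order (n = 2) neighbourhood."""
    mu = np.asarray(frame, dtype=float)
    p = mu[1:-1, 1:-1]                      # candidate pixels
    zeta = np.zeros_like(p)
    # offsets of the 2^(n+1) = 8 neighbours
    for dy, dx in [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]:
        q = mu[1 + dy:mu.shape[0] - 1 + dy, 1 + dx:mu.shape[1] - 1 + dx]
        zeta += np.abs(p - q) / (np.abs(p + 1) + np.abs(q + 1))
    return (3.0 / 2**3) * zeta              # prefactor 3 / 2^(n+1)
```

A homogeneous region yields ζ = 0, while a bright pixel surrounded by dark ones yields ζ = 1, matching the bounds stated above.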
Figure 1. (a) Pre-frame (f_{i−1}) (b) Present frame (f_i) (c) Post-frame (f_{i+1})

The STFHI (λ) of a pixel in the i-th image frame f_i of a video is a function ω of the fuzzy
hostility index of the candidate pixel in f_i (marked red in figure 1(b)) and of the corresponding
pixels in the previous frame f_{i−1} and post-frame f_{i+1} (marked blue in figures 1(a) and 1(c)). It can be
expressed as follows:

λ_{f_i} = ω(ζ_{f_{i−1}}, ζ_{f_i}, ζ_{f_{i+1}})    (2)

In other words, λ of a pixel is a function of the second order neighbourhood of its corresponding
pixels (marked yellow in figures 1(a) and 1(c)) and of itself (marked green in figure 1(b)). λ_{f_i} is
computed as the average of ζ_{f_{i−1}}, ζ_{f_i} and ζ_{f_{i+1}}, except for the first and last frames of a video,
where ζ_{f_{i−1}} and ζ_{f_{i+1}} respectively are not present. The 2D matrix thus formed by computing the
λ of each pixel represents the edge map of the image, with profound edges of all objects
present in the original image, as depicted in figure 3(b).
2.3. Pixel Intensification Function
Pixel intensity scaling is performed to make the edges more prominent compared to other portions
of the image, as shown in figure 3(c). The intensity scaling function (φ) used in this work is
shown in figure 2 and mathematically represented as:

φ(λ_ij) = (λ_ij)^2,      if λ_ij < 0.5
φ(λ_ij) = (λ_ij)^(1/2),  if λ_ij ≥ 0.5

where λ_ij is the STFHI of the pixel at the i-th row and j-th column.
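The two-branch intensification function above is straightforward to vectorize; a minimal sketch (the function name is ours):

```python
import numpy as np

def intensify(lam):
    """Pixel intensification: darken weak responses (square) and
    brighten strong ones (square root), pivoting at 0.5."""
    lam = np.asarray(lam, dtype=float)
    return np.where(lam < 0.5, lam**2, np.sqrt(lam))
```

Squaring pushes values below 0.5 toward 0 while the square root pushes values at or above 0.5 toward 1, which is what makes the edges stand out against the background.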
2.4. Edge Dilation
Edge dilation is a technique which is used to enlarge the boundaries of the objects in a grayscale
image, as depicted in figure 3(d). This may be used to compensate for camera and object
movement. A 3×3 square structuring element is used to dilate the edges of the grayscale image
generated from the fuzzy hostility map. The value of correlation between similar images is
increased as a result of edge dilation.
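For grayscale edge maps, dilation with a 3×3 square structuring element replaces each pixel by the maximum over its 3×3 neighbourhood. A minimal NumPy sketch (our own helper, not the authors' implementation):

```python
import numpy as np

def dilate3x3(edge):
    """Grayscale dilation with a 3x3 square structuring element:
    each pixel becomes the maximum over its 3x3 neighbourhood."""
    p = np.pad(edge, 1, mode='edge')        # replicate borders
    stack = [p[1 + dy:p.shape[0] - 1 + dy, 1 + dx:p.shape[1] - 1 + dx]
             for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return np.max(stack, axis=0)
```

Thickening the edges in this way makes two edge maps of the same scene overlap even when the camera or objects have shifted by a pixel or two, raising their correlation.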
Figure 2. Pixel Intensity Function

Figure 3. (a) Original image frame (b) Edge map using STFHI (c) Pixel intensified frame
(d) Edge dilated frame

2.5. Computing Edge Map Similarity

The similarity between two edge maps can be considered as computing the similarity of two 2D
matrices. Therefore, computing the edge map similarity can be achieved by finding the
correlation between the matrices of the edge maps. Computing the Pearson's correlation
coefficient (ρ_{X,Y}) between two matrices X and Y of the same dimensions may be represented as:
ρ_{X,Y} = Σ (x_ij − x̄)(y_ij − ȳ) / sqrt( Σ (x_ij − x̄)² · Σ (y_ij − ȳ)² )

        = ( Σ x_ij y_ij − (Σ x_ij)(Σ y_ij)/n ) / sqrt( { Σ x_ij² − (Σ x_ij)²/n } · { Σ y_ij² − (Σ y_ij)²/n } )    (3)
where x_ij and y_ij are the elements in the i-th row and j-th column of matrices X and Y
respectively, x̄ is the mean value of the elements of X, ȳ is the mean value of the elements of Y, and
n is the total number of elements in each matrix. The correlation is defined
only if both of the standard deviations are finite and nonzero. The correlation is
1 in the case of an increasing linear relationship, −1 in the case of a decreasing linear relationship,
and some value in between in all other cases, indicating the degree of linear dependence between
the matrices. It is appropriate to mention here that a high correlation value indicates high
similarity between the image frames.
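Equation (3) amounts to flattening both matrices and computing the ordinary Pearson coefficient; a self-contained sketch (our naming):

```python
import numpy as np

def edge_map_correlation(X, Y):
    """Pearson's correlation (eq. 3) between two equal-sized 2D edge maps."""
    x = np.asarray(X, dtype=float).ravel()
    y = np.asarray(Y, dtype=float).ravel()
    xd, yd = x - x.mean(), y - y.mean()
    denom = np.sqrt((xd**2).sum() * (yd**2).sum())
    if denom == 0:                 # undefined when either std dev is zero
        raise ValueError("correlation undefined for a constant matrix")
    return float((xd * yd).sum() / denom)
```

Identical edge maps give 1, and a perfect decreasing linear relationship gives −1, matching the bounds stated above.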
2.6. Three Sigma Rule
The standard deviation (σ) of a dataset or probability distribution denotes the variation or
deviation from the arithmetic mean (M) or expected value. The three-sigma rule in statistics is
used to signify the range in which the values of a normal distribution will lie. According to this
rule (refer figure 4), 68.2% of the values in a normal distribution lie in the range [M − σ, M + σ], 95.4%
in [M − 2σ, M + 2σ] and 99.7% in [M − 3σ, M + 3σ]. Hence, this empirical rule may
be reliably used to compute a threshold for detecting values which represent abrupt changes. In the
proposed method, the three-sigma rule has been used to detect the hard cuts at shot boundaries.
Figure 4. Normal Distribution with three standard deviations from mean
2.7. Real time updating of parameters used for dynamic threshold
The correlation between the edge maps of consecutive image frames provides an indication about
the similarity of the video frames. However, for detection of a video segment, the correlation
gradient is computed from the stream of consecutive correlation values obtained. As mentioned in
the previous sub-section, the threshold is computed from the correlation gradient values using the
three sigma rule. The parameters involved in calculating the threshold are the mean (M) and standard
deviation (σ) of the correlation gradient values. It must be noted that these parameters are to be
updated in real time as new video frames are streamed in. The new mean (M_new) and new
standard deviation (σ_new) may be obtained as follows:
M = (1/N) Σ_{i=1}^{N} C_i

where C_i is the correlation gradient and N is the number of correlation gradient
values calculated from the frames received. On arrival of a new frame, the value of the new mean
will be:

M_new = (1/(N+1)) Σ_{i=1}^{N+1} C_i = ( Σ_{i=1}^{N} C_i + C_{N+1} ) / (N+1) = ( N·M + C_{N+1} ) / (N+1)

⇒ M_new = ( N/(N+1) ) M + ( 1/(N+1) ) C_{N+1}    (4)

The new value of the standard deviation (σ_new) may be calculated as follows:

Since σ² = (1/N) Σ_{i=1}^{N} (C_i − M)²  ⇒  N σ² = Σ_{i=1}^{N} (C_i − M)² = Σ_{i=1}^{N} C_i² − N M²

⇒ (N+1) σ_new² = Σ_{i=1}^{N+1} (C_i − M_new)²  ⇒  (N+1) σ_new² = Σ_{i=1}^{N+1} C_i² − (N+1) M_new²

⇒ σ_new² = ( 1/(N+1) ) ( Σ_{i=1}^{N} C_i² + C_{N+1}² ) − M_new²    (5)
Thus, equations (4) and (5) may be used to update the parameters required for calculating the new
threshold. The two equations are significant because the new values of mean and standard
deviation can be calculated in real time from the new correlation value computed and earlier
values of the two parameters, without having to recalculate the parameters over the whole span.
3. PROPOSED METHOD FOR REAL TIME VIDEO SEGMENTATION
The proposed detection mechanism for real time video shot segmentation is shown as a flow
diagram in figure 5. The steps are described in the following subsections.
Figure 5. Flow diagram of the video segment detection process in real time
3.1 Extraction of time sequenced image frames from a video
The streamed video is decomposed into its constituent image frames in a time sequenced
manner by a standard codec corresponding to the file type, e.g. AVI, MPEG, MP4. The
extracted images are in uncompressed format and are stored as bitmaps for further processing.
3.2 Feature extraction from the image frames
The feature extraction process consists of generating a fuzzy hostility map using the STFHI for
each image frame, as explained in section 2.2. The fuzzy hostility map indicates the amount of
coherence/incoherence in the movement of the objects and is a 2D matrix which is used to
generate the edges of the objects of the gray scale image. Thereafter, an intensity scaling function
is used to make the edges more prominent as explained in section 2.3. To compensate for the
rapid object or camera movement, edge dilation is performed as explained in section 2.4.
3.3 Feature matching based on similarity metric
In this step, the Pearson's correlation between successive fuzzy hostility maps is used as the
similarity metric, as explained in section 2.5. The correlation values thus computed are stored in a
row matrix C_M. The shot boundaries occur at points of abrupt change in the correlation values. In
order to detect a shot boundary, the gradient of the correlation values (which is a row vector) is
computed. The correlation gradient plot, depicted in figure 6, consists of steep spikes at the
points of shot boundary.
3.4 Calculation of threshold and updating of threshold parameters
Segments in the streaming video are detected by using the three-sigma rule, as explained in
section 2.6. If the correlation gradient exceeds the upper or lower threshold, a video segment is
detected. Since the threshold is a function of the mean and standard deviation, it has to be updated as
new frames are streamed in real time. The new threshold is calculated using equations (4) and (5)
explained in section 2.7.
Figure 6. Plot of Correlation gradient values.
3.5 Look back N frames to detect missed shot boundaries
In the proposed video shot detection mechanism, the threshold for detection of a shot boundary is
set dynamically, without any prior knowledge about the type or content of the video. At the initial
stage, the threshold fluctuates due to the small number of data points available from the few
video frames that have arrived, as shown in figure 6. However, the threshold becomes more stable as more
frames arrive in real time. Although the threshold is updated with the arrival of each frame,
the system looks back only after the passage of N frames, to check if any shot boundaries have
been missed due to the unstable threshold. In this work, N = 300 has been taken, i.e. look back will
occur every 12 seconds in a video with frame rate 25 fps, or every 10 seconds for a 30 fps video.
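Sections 3.3 to 3.4 can be condensed into a small detection loop: flag a hard cut when a correlation-gradient value leaves the M ± 3σ band, then fold the value into the running statistics of equations (4) and (5). This is a simplified, hypothetical sketch (the names detect_cuts and warmup are ours, and the look-back pass of section 3.5 is omitted):

```python
def detect_cuts(gradients, warmup=2):
    """Flag indices where the correlation gradient leaves the M +/- 3*sigma
    band (three-sigma rule), updating M and sigma incrementally."""
    cuts = []
    M, var, N = gradients[0], 0.0, 1       # running mean/variance seeded with first value
    for i, g in enumerate(gradients[1:], start=1):
        sigma = var ** 0.5
        if N > warmup and abs(g - M) > 3 * sigma:
            cuts.append(i)                 # index of the gradient spike
        # incremental update of M and sigma (equations 4 and 5)
        sum_sq = N * (var + M * M)
        M = (N * M + g) / (N + 1)
        var = max((sum_sq + g * g) / (N + 1) - M * M, 0.0)
        N += 1
    return cuts
```

On a stream of small gradients with one large spike, only the spike exceeds the dynamic threshold.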
4. EXPERIMENTAL RESULTS AND ANALYSIS
The proposed method for shot segmentation in real time videos has been tested on a video data set
consisting of eleven videos with varied features (Table I and Table II). All the videos in the test
set have a resolution of 640×360 at 25 fps and are in MP4 format. The performance of the
proposed method is evaluated by taking into consideration two parameters, recall (R) and
precision (P), defined as follows:

Recall R = (B_d − B_f) / B_t
Precision P = (B_d − B_f) / B_d

where B_d is the number of shot boundaries detected by the algorithm, B_f the number of false shot
boundaries detected, and B_t the number of actual shot boundaries present in the video.
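The two measures can be sketched directly from the definitions above (the helper name is ours):

```python
def recall_precision(detected, false_pos, actual):
    """Recall and precision as defined above: true hits are the
    detected boundaries minus the false detections."""
    true_hits = detected - false_pos
    return true_hits / actual, true_hits / detected
```

For example, using the V5 figures from Table 3 (2796 + 5 = 2801 detections, 1 false, 2807 actual cuts), this reproduces the reported 99.75% recall and 99.96% precision.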
4.1 The Video Dataset
The proposed method has been tested on a dataset consisting of two subsets. The first subset
comprised five videos of long duration, with an average length of more than one hour
(Table 1). The videos V1 to V4 are documentaries taken from different TV channels. The
video V5 is a highlights video taken from the Cricket World Cup 2011 final match between India
and Sri Lanka.
The second subset comprised short videos (Table 2). The videos V6 and V7 are sports videos
from cricket and tennis. The reason for including these in the dataset was the rapid
object movement and short shots. In contrast, V8 and V9 are movie songs from Hindi
films. V8 has shots taken mostly outdoors in daylight, whereas in V9 real life shots are mixed
with some computer generated frames. The average shot duration in V9 is the least among all
videos of the dataset. The video V10 is based on a violin track by Lindsey Stirling and is
characterized by simultaneous movement of the performer as well as the camera. The motivation for
including this video is its rapid zoom-in and zoom-out sequences. The video V11 is the official
song of the 2010 FIFA World Cup, "Waka Waka". The video comprises varied
backgrounds and illumination, intermixed with match sequences taken from FIFA World Cup
history.
Table 1. Test Video Dataset-I
Video                            V1      V2      V3      V4      V5
Duration (mm:ss)                 51:20   28:40   58:06   59:29   111:19
No. of Frames                    74020   43018   87150   89225   166990
No. of Hard Cuts                 941     406     807     1271    2807
Average no. of frames per shot   78.57   105.69  107.85  70.14   59.46
Table 2. Test Video Dataset-II

Video                            V6      V7      V8      V9      V10     V11
Duration (mm:ss)                 02:33   02:58   02:42   04:10   03:27   03:31
No. of Frames                    3833    4468    4057    6265    4965    5053
No. of Hard Cuts                 46      43      70      172     77      138
Average no. of frames per shot   81.55   101.54  57.14   36.21   63.65   36.35

4.2 Experimental Results
The proposed method for video shot segmentation is found to work accurately on video frames
streamed in real time. The results obtained by performing the experiments on the video data set
are summarized in Table 3. The shot boundaries reported are obtained in two phases. Video
segments are first detected using the threshold and method explained in sections 3.2, 3.3 and 3.4.
However, some shot boundaries may be missed due to the unstable mean and standard deviation.
These missed shot boundaries are detected by updating the threshold and looking back after every
N frames have elapsed, as discussed in section 3.5. The effectiveness of the
proposed method is evident from the high recall and precision values obtained for each of the videos
in the test set.

Table 3. Experimental Results of Test Video Dataset

Video                  V1    V2    V3    V4    V5     V6     V7    V8     V9     V10    V11
Hard Cuts present      941   406   807   1271  2807   46     43    70     172    77     138
Hard Cuts detected     940   406   806   1269  2796   45     43    67     165    75     136
Detected by Lookback   1     0     1     2     5      2      0     2      5      1      3
False detections       0     0     0     0     1      2      0     1      1      0      1
Recall (%)             100   100   100   100   99.75  97.82  100   97.14  98.25  98.70  100
Precision (%)          100   100   100   100   99.96  95.74  100   98.55  99.41  100    99.28

4.3 Comparison with other existing methods
Several existing methods for video segmentation, such as Mutual Information (MI) [8], Edge Change
Ratio (ECR) [13] and Color Histogram Differences (CHD) [14], have been applied to non-real
time videos. Shot boundary detection for streaming videos is hard to find in the literature. The
problem of automatic computation of a threshold has been addressed in the literature [1, 15], and the
strength of the proposed method lies in the fact that the threshold is computed and updated
automatically without manual intervention, unlike the other existing methods. Hence, this method
can be applied for on-the-fly detection of shot boundaries in real time videos. The comparative
results of the proposed method against its non-real time counterparts are shown in Table 4.
Table 4. Comparison with Existing Methods
5. CONCLUSIONS AND REMARKS
The proposed method for real time video segmentation was tested on a diverse video test set. It is
seen to outperform the existing methods in terms of both recall and precision. As compared to
the state-of-the-art techniques, the proposed method achieves nearly 100% accuracy in terms of
both recall and precision. Also, the number of false detections is very low. The problem of
automatically setting the threshold, without any human interference, has been addressed, and the
results are very encouraging. The number of false hits is almost negligible as compared to the
other existing methods. A major challenge ahead would be to detect dissolve edits in streaming videos.
ACKNOWLEDGEMENTS
The authors acknowledge the contributions of Surajit Dutta, M.Tech student at RCC Institute of
Information Technology, Kolkata, during the experimentation phase of the proposed method.
REFERENCES
[1] Alan Hanjalic, "Shot-Boundary Detection: Unraveled and Resolved," IEEE Transactions on
Circuits and Systems for Video Technology, Vol. 12, Page(s): 90-105, February 2002.
[2] Siddhartha Bhattacharyya, Ujjwal Maulik, and Paramartha Dutta, "High-speed target tracking by
fuzzy hostility-induced segmentation of optical flow field," Applied Soft Computing, Science Direct,
2009.
[3] Hattarge A.M., Bandgar P.A., and Patil V.M., "A Survey on Shot Boundary Detection Algorithms and
Techniques," International Journal of Emerging Technology and Advanced Engineering, Volume 3,
Issue 2, February 2013.
[4] Biswanath Chakraborty, Siddhartha Bhattacharyya, and Susanta Chakraborty, "A Comparative Study
of Unsupervised Video Shot Boundary Detection Techniques Using Probabilistic Fuzzy Entropy
Measures," DOI: 10.4018/978-1-4666-2518-1.ch009, in Handbook of Research on Computational
Intelligence for Engineering, Science, and Business, 2013.
[5] John S. Boreczky and Lawrence A. Rowe, "Comparison of video shot boundary detection
techniques," Journal of Electronic Imaging, Page(s): 122-128, March 1996.
[6] Swati D. Bendale and Bijal J. Talati, "Analysis of Popular Video Shot Boundary Detection
Techniques in Uncompressed Domain," International Journal of Computer Applications (0975-8887),
Volume 60, No. 3, December 2012.
Table 4. Comparison with Existing Methods

         Proposed Method      MI                 CHD                ECR
Video    R        P           R       P          R       P          R       P
V1       100%     100%        85.97%  92.98%     75.98%  95.96%     90.96%  92.98%
V2       100%     100%        83%     88.91%     78.07%  87.93%     92.11%  96.05%
V3       100%     100%        91.94%  93.55%     83.02%  88.97%     89.96%  85.99%
V4       100%     100%        87.96%  95.98%     76%     91.03%     95.98%  92.99%
V5       99.75%   99.96%      85.99%  93.97%     81.97%  91.98%     87.99%  94.01%
V6       97.82%   95.74%      86.95%  90.90%     73.91%  80.95%     91.30%  95.45%
V7       100%     100%        88.37%  92.68%     81.57%  96.87%     90.69%  88.63%
V8       97.14%   98.55%      91.42%  94.11%     75.71%  92.98%     95.71%  89.33%
V9       98.25%   99.41%      84.88%  94.80%     77.90%  94.36%     92.44%  88.33%
V10      98.70%   100%        81.81%  92.64%     74.02%  95%        93.50%  90%
V11      100%     99.28%      84.05%  95.08%     80.43%  94.87%     94.92%  91.60%
[7] Ullas Gargi, Rangachar Kasturi, and Susan H. Strayer, "Performance Characterization of Video-Shot-
Change Detection Methods," IEEE Transactions on Circuits and Systems for Video Technology, Vol.
10, No. 1, February 2000.
[8] Aarti Kumthekar and J. K. Patil, "Comparative Analysis of Video Summarization Methods,"
International Journal of Engineering Sciences & Research Technology, ISSN: 2277-9655, 2(1):
Page(s): 15-18, January 2013.
[9] R. Lienhart, S. Pfeiffer, and W. Effelsberg, "Scene determination based on video and audio features,"
Proceedings of the IEEE International Conference on Multimedia Computing and Systems, Volume 1,
Page(s): 685-690, 1999.
[10] H. Zhang, A. Kankanhalli, and S. W. Smoliar, "Automatic partitioning of full-motion video,"
Multimedia Systems, Volume 1, No. 1, Page(s): 10-28, 1993.
[11] A. Nagasaka and Y. Tanaka, "Automatic video indexing and full-video search for object appearances,"
Proceedings of the IFIP 2nd Working Conference on Visual Database Systems, Page(s): 113-127, 1992.
[12] A. Hampapur, R. C. Jain, and T. Weymouth, "Production Model Based Digital Video Segmentation,"
Multimedia Tools and Applications, Vol. 1, No. 1, Page(s): 9-46, March 1995.
[13] R. Zabih, J. Miller, and K. Mai, "A Feature-Based Algorithm for Detecting and Classifying Scene
Breaks," Proceedings of ACM Multimedia 1995, San Francisco, CA, Page(s): 189-200, November 1995.
[14] Vrutant Hem Thakore, "Video Shot Cut Boundary Detection using Histogram," International Journal
of Engineering Sciences & Research Technology, ISSN: 2277-9655, 2(4): Page(s): 872-875, April 2013.
[15] L. Ranathunga, R. Zainuddin, and N. A. Abdullah, "Conventional Video Shot Segmentation to
Semantic Shot Segmentation," 6th IEEE International Conference on Industrial and Information
Systems (ICIIS), Page(s): 186-191, August 2011.
AUTHORS
Hrishikesh Bhaumik is currently serving as an Associate Professor and HOD of
Information Technology Department at RCC Institute of Information Technology,
Kolkata, India. He received his B.Sc. from Calcutta University in 1997, AMIE in Electronics and
Communication Engineering in 2000 and M.Tech in Information Technology from Bengal Engineering
and Science University, Shibpur, in 2004. In 2008 he received sponsorship for working on
a Business Process Sniffer Tool developed at Infosys, Bhubaneswar. He made significant
contributions to the EU-INDIA grid project in 2010 and 2011. His research interests
include Content Based Video Retrieval Systems, Text Mining and High Performance Computing.
Siddhartha Bhattacharyya did his Bachelors in Physics, Bachelors in Optics and
Optoelectronics and Masters in Optics and Optoelectronics (Gold Medal) from University
of Calcutta, India in 1995, 1998 and 2000 respectively. He completed PhD (Engg.) in
Computer Science and Engineering from Jadavpur University, India in 2008. He is
currently an Associate Professor and Dean (R & D) in the Department of Information
Technology of RCC Institute of Information Technology, Kolkata, India. He is a co-author
of two books, co-editor of a book and has more than 90 research publications. He was the
member of the Young Researchers' Committee of the WSC 2008 Online World Conference on Soft
Computing in Industrial Applications. He was the convener of the AICTE-IEEE National Conference on
Computing and Communication Systems (CoCoSys-09) in 2009. He is the Assistant Editor of International
Journal of Pattern Recognition Research since 2010. He is the Associate Editor of International Journal of
BioInfo Soft Computing since 2013. He is the member of the editorial board of International Journal of
Engineering, Science and Technology and the member of the editorial advisory board of HETC Journal of
Computer Engineering and Applications. He is a senior member of IEEE and a member of ACM, IRSS and
IAENG. He is a life member of OSI and ISTE, India. His research interests include soft computing, pattern
recognition and quantum computing.
318 Computer Science & Information Technology (CS & IT)
Dr. Susanta Chakraborty received the B.Tech, M.Tech and Ph.D. (Tech) degrees in Computer
Science in 1983, 1985 and 1999 respectively from the University of Calcutta. He is
currently a Professor in the Department of Computer Science and Technology at Bengal
Engineering and Science University, Shibpur, West Bengal, India. Prior to this he served at the
University of Kalyani as Dean of the faculty of Engineering, Technology and Management.
He has published around 31 research papers in reputed International Journals including
IEEE Transactions on CAD and refereed international conference proceedings of IEEE
Computer Science Press. He was awarded INSA-JSPS Fellowship of Indian National Science Academy
(INSA) in the session 2003-2004. He has collaborated with leading scientists around the world in areas of
Test Generation of Sequential Circuits and Low Power Design, VLSI testing and fault diagnosis, Quantum
circuit and Testable Design of Nano-Circuits and Micro Fluidic Bio-chips. He has more than 25 years of
research experience. He has served as the Publicity Chair, Publication Chair and Program Committee
members of several International Conferences.

An unsupervised method for real time video shot segmentation

1. INTRODUCTION

There has been a spectacular increase in the amount of multimedia content transmitted and shared over the internet. The number of users of video-on-demand and Internet Protocol Television (IP-TV) services is growing at an exponential rate. According to 2012 statistics, there were more than one million IP-TV streams per month on the Zatoo platform. Textual annotation, i.e. associating a set of keywords with multimedia content for indexing, was performed to facilitate searching for relevant information in existing video repositories (e.g. YouTube, DailyMotion). However, manual annotation of the millions of videos available in such repositories is a cumbersome task. Content based video analysis techniques [1, 3, 4, 5, 6, 7, 8, 9] do away with manual annotation of the video data, saving time and human effort while increasing accuracy. Video segmentation is the preliminary step in the analysis of digital video. These segmentation algorithms can be classified according to the features used, such as pixel-wise difference [10], histograms [11], standard deviation of pixel intensities [12], edge change ratio [13], etc.

A more daunting task arises in the online analysis and indexing of live streaming videos, such as telecasts of matches or live performances. Generating highlights or summaries of such events is challenging, since it requires algorithms which perform well and are adapted to work in real time. Although video edits can be classified into hard cuts, fades, wipes, etc. [9], live streams mainly contain hard cuts, which are the focus of the proposed approach. Several methods for video segmentation [8, 10, 11, 13, 14] exist in the literature and have been applied to non-real-time videos; shot boundary detection for streaming videos, however, is hard to find in the literature. The difficulties of setting thresholds and the necessity of automatic thresholding have been discussed in [1]. Hence, the motivation behind this work was to develop a video segmentation technique able to detect, with high accuracy, the hard cuts present in live streaming videos. Segmentation is performed on the fly, as new video frames are streamed in. This work also addresses the problem of setting an automatic threshold: the threshold is set dynamically, without any a priori knowledge about the type, content or length of the video. The algorithm incorporates a look-back mechanism to detect any missed hard cuts, particularly during the start of the video, when the statistical measures used to set the threshold are unstable. The novelty of the proposed work lies in its applicability to video summarization of real-time events, such as producing highlights of sports videos.

The remainder of the paper is organized as follows. Section 2 presents the basic concepts and definitions.
The proposed method for real time video segmentation is described in section 3. The experimental results and analysis, along with a comparison with other existing approaches, are presented in section 4. Finally, the concluding remarks are given in section 5.

2. BASIC CONCEPTS AND DEFINITIONS

2.1. Application of fuzzy set theory to image processing

A colour image may be considered as a 3D matrix of values, where each pixel colour depends on the combination of RGB intensities. Conversion of the colour image to a gray scale image maps the values of the 3D matrix to a 2D matrix, where each pixel value is in the range [0, 255]. This 2D matrix may be scaled to the range [0, 1] by dividing each element by 255. The scaled value represents the membership value of each pixel to the fuzzy sets labelled WHITE and BLACK, each containing as many elements as the 2D matrix. If the value 0 represents a completely black pixel and 1 a completely white one, then the value of each element depicts the degree of membership $\mu_W(p_i)$ of the $i$-th pixel $p_i$ to the fuzzy set WHITE. The degree of membership $\mu_B(p_i)$ of the pixel $p_i$ to the set BLACK is then given by $\mu_B(p_i) = 1 - \mu_W(p_i)$. Building on fuzzy set theory, Bhattacharyya et al. [2] proposed the fuzzy hostility index for detecting points of interest in an image, which is useful for high speed target tracking. As an extension of this concept, a new index is proposed for obtaining the edge map of a video frame, as explained in the next sub-section.

2.2. Edge Map using Spatio-Temporal Fuzzy Hostility Index (STFHI)

The fuzzy hostility index [2] indicates the amount of variation in a pixel's neighbourhood with respect to the pixel itself. The pixel hostility is high if the surrounding pixels have a greater
difference of values compared to the candidate pixel, i.e. if the heterogeneity in its neighbourhood is high. In an $n$-order neighbourhood, the hostility index $\zeta$ of a pixel is defined as:

$$\zeta = \frac{3}{2^{n+1}} \sum_{i=1}^{2^{n+1}} \frac{|\mu_p - \mu_{q_i}|}{|\mu_p + 1| + |\mu_{q_i} + 1|} \qquad (1)$$

where $\mu_p$ is the membership value of the candidate pixel and $\mu_{q_i}$, $i = 1, 2, 3, \ldots, 2^{n+1}$, are the membership values of its fuzzy neighbours in a second-order neighbourhood fuzzy subset. The value of the fuzzy hostility index $\zeta$ lies in [0, 1], with $\zeta = 1$ signifying maximum heterogeneity and $\zeta = 0$ indicating total homogeneity in the neighbourhood. The concept of the fuzzy hostility index can be effectively extended to accommodate temporal changes in the time sequenced frames of a video.

Figure 1. (a) Pre-frame ($f_{i-1}$) (b) Present frame ($f_i$) (c) Post-frame ($f_{i+1}$)

The STFHI ($\lambda$) of a pixel in the $i$-th image frame $f_i$ of a video is a function $\omega$ of the fuzzy hostility index of the candidate pixel in $f_i$ (marked red in figure 1(b)) and of the corresponding pixels in the previous frame $f_{i-1}$ and the post frame $f_{i+1}$ (marked blue in figures 1(a) and 1(c)). It can be expressed as follows:

$$\lambda_{f_i} = \omega(\zeta_{f_{i-1}}, \zeta_{f_i}, \zeta_{f_{i+1}}) \qquad (2)$$

In other words, $\lambda$ of a pixel is a function of the second order neighbourhood of its corresponding pixels (marked yellow in figures 1(a) and 1(c)) and of itself (marked green in figure 1(b)). $\lambda_{f_i}$ is computed as the average of $\zeta_{f_{i-1}}$, $\zeta_{f_i}$ and $\zeta_{f_{i+1}}$, except for the first and last frames of a video, where $\zeta_{f_{i-1}}$ and $\zeta_{f_{i+1}}$, respectively, do not exist. The 2D matrix formed by computing $\lambda$ for each pixel represents the edge map of the image, with pronounced edges of all objects present in the original image, as depicted in figure 3(b).

2.3. Pixel Intensification Function

Pixel intensity scaling is performed to make the edges more prominent compared to other portions of the image, as shown in figure 3(c).
The intensity scaling function $\phi$ used in this work is shown in figure 2 and is mathematically represented as:

$$\phi(\lambda_{ij}) = \begin{cases} (\lambda_{ij})^{2}, & \text{if } \lambda_{ij} < 0.5 \\ (\lambda_{ij})^{1/2}, & \text{if } \lambda_{ij} \geq 0.5 \end{cases}$$

where $\lambda_{ij}$ is the STFHI of the pixel in the $i$-th row and $j$-th column.
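As an illustration, the per-frame edge-map computation of sections 2.2 and 2.3 can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation; the function names and the use of edge padding at the frame border are assumptions.

```python
import numpy as np

def hostility_index(frame):
    """Fuzzy hostility index (eq. 1) for every pixel of a frame scaled to
    [0, 1], using the 8-pixel second-order neighbourhood (n = 2)."""
    f = np.pad(frame, 1, mode='edge')        # assumed border handling
    h = np.zeros_like(frame, dtype=float)
    p = f[1:-1, 1:-1]                        # candidate pixels
    # offsets of the 2^(n+1) = 8 neighbours
    for dy, dx in [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]:
        q = f[1 + dy:f.shape[0] - 1 + dy, 1 + dx:f.shape[1] - 1 + dx]
        h += np.abs(p - q) / (p + q + 2.0)   # |mu_p - mu_qi| / (|mu_p+1| + |mu_qi+1|)
    return 3.0 / 8.0 * h                     # factor 3 / 2^(n+1) with n = 2

def stfhi(prev, cur, nxt):
    """Spatio-temporal fuzzy hostility index (eq. 2): average of the
    hostility maps of the previous, current and next frames."""
    maps = [hostility_index(g) for g in (prev, cur, nxt) if g is not None]
    return sum(maps) / len(maps)

def intensify(lam):
    """Pixel intensification function phi of section 2.3."""
    return np.where(lam < 0.5, lam ** 2, np.sqrt(lam))
```

For the first and last frames of a video, `prev` or `nxt` is passed as `None`, matching the averaging rule above.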
Figure 2. Pixel Intensity Function

2.4. Edge Dilation

Edge dilation is a technique used to enlarge the boundaries of the objects in a grayscale image, as depicted in figure 3(d). This may be used to compensate for camera and object movement. A 3×3 square structuring element is used to dilate the edges of the grayscale image generated from the fuzzy hostility map. The value of the correlation between similar images is increased as a result of edge dilation.

Figure 3. (a) Original image frame (b) Edge map using STFHI (c) Pixel intensified frame (d) Edge dilated frame

2.5. Computing Edge Map Similarity

Measuring the similarity between two edge maps can be considered as computing the similarity of two 2D matrices, and can therefore be achieved by finding the correlation between the matrices of the edge maps. The Pearson's correlation coefficient ($\rho_{X,Y}$) between two matrices $X$ and $Y$ of the same dimensions may be represented as:
$$\rho_{X,Y} = \frac{\sum_{ij}(x_{ij}-\bar{x})(y_{ij}-\bar{y})}{\sqrt{\sum_{ij}(x_{ij}-\bar{x})^2}\,\sqrt{\sum_{ij}(y_{ij}-\bar{y})^2}} = \frac{\sum x_{ij}y_{ij} - \frac{\sum x_{ij}\sum y_{ij}}{n}}{\sqrt{\left\{\sum x_{ij}^2 - \frac{(\sum x_{ij})^2}{n}\right\}\left\{\sum y_{ij}^2 - \frac{(\sum y_{ij})^2}{n}\right\}}} \qquad (3)$$

where $x_{ij}$ and $y_{ij}$ are the elements in the $i$-th row and $j$-th column of matrices $X$ and $Y$ respectively, $\bar{x}$ and $\bar{y}$ are the mean values of the elements of $X$ and $Y$, and $n$ is the total number of elements in each matrix. The correlation is defined only if both standard deviations are finite and nonzero. The correlation is 1 in the case of an increasing linear relationship, -1 in the case of a decreasing linear relationship, and some value in between in all other cases, indicating the degree of linear dependence between the matrices. It is appropriate to mention here that a high correlation value indicates high similarity between the image frames.

2.6. Three Sigma Rule

The standard deviation ($\sigma$) of a dataset or probability distribution measures the variation from the arithmetic mean ($M$) or expected value. The three-sigma rule in statistics signifies the range in which the values of a normal distribution lie. According to this rule (refer to figure 4), 68.3% of the values in a normal distribution lie in the range $[M-\sigma, M+\sigma]$, 95.4% in $[M-2\sigma, M+2\sigma]$ and 99.7% in $[M-3\sigma, M+3\sigma]$. Hence, this empirical rule may be reliably used to compute a threshold for detecting values which represent abrupt changes. In the proposed method, the three-sigma rule is used to detect the hard cuts at shot boundaries.

Figure 4. Normal distribution with three standard deviations from the mean

2.7. Real time updating of parameters used for dynamic threshold

The correlation between the edge maps of consecutive image frames provides an indication of the similarity of the video frames.
However, for the detection of a video segment, the correlation gradient is computed from the stream of consecutive correlation values obtained. As mentioned in the previous sub-section, the threshold is computed from the correlation gradient values using the three-sigma rule. The parameters involved in calculating the threshold are the mean ($M$) and standard deviation ($\sigma$) of the correlation gradient values. It must be noted that these parameters are to be updated in real time as new video frames are streamed in. The new mean ($M_{new}$) and standard deviation ($\sigma_{new}$) may be obtained as follows:
$$M = \frac{\sum_{i=1}^{N} C_i}{N}$$

where $C_i$ is the correlation gradient and $N$ is the number of correlation gradient values calculated from the frames received. On arrival of a new frame, the value of the new mean will be:

$$M_{new} = \frac{\sum_{i=1}^{N+1} C_i}{N+1} = \frac{\sum_{i=1}^{N} C_i + C_{N+1}}{N+1} = \frac{NM + C_{N+1}}{N+1}$$

$$\Rightarrow M_{new} = \left(\frac{N}{N+1}\right)M + \left(\frac{1}{N+1}\right)C_{N+1} \qquad (4)$$

The new value of the standard deviation ($\sigma_{new}$) may be calculated as follows. Since

$$\sigma^2 = \frac{\sum (C_i - M)^2}{N} \;\Rightarrow\; N\sigma^2 = \sum_{i=1}^{N} C_i^2 - NM^2,$$

we have

$$(N+1)\sigma_{new}^2 = \sum_{i=1}^{N+1} (C_i - M_{new})^2 = \sum_{i=1}^{N+1} C_i^2 - (N+1)M_{new}^2$$

$$\Rightarrow \sigma_{new} = \sqrt{\frac{1}{N+1}\sum_{i=1}^{N+1} C_i^2 - M_{new}^2} \qquad (5)$$

Thus, equations (4) and (5) may be used to update the parameters required for calculating the new threshold. The two equations are significant because the new values of the mean and standard deviation can be calculated in real time from the newly computed correlation value and the earlier values of the two parameters, without having to recalculate the parameters over the whole span.

3. PROPOSED METHOD FOR REAL TIME VIDEO SEGMENTATION

The proposed mechanism for real time video shot segmentation is shown as a flow diagram in figure 5. The steps are described in the following subsections.

Figure 5. Flow diagram of the video segment detection process in real time

3.1 Extraction of time sequenced image frames from a video

The streamed video is decomposed into its constituent image frames in a time sequenced manner by a standard codec corresponding to the file type, i.e. AVI, MPEG, MP4 etc. The extracted images are in uncompressed format and are stored as bitmaps for further processing.
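The running parameter update of equations (4) and (5), combined with the three-sigma test of section 2.6, can be sketched as follows. Class and attribute names are illustrative assumptions, not taken from the paper:

```python
class RunningThreshold:
    """Maintains the mean M and standard deviation sigma of the correlation
    gradient stream per equations (4) and (5), and flags values falling
    outside the dynamic band M +/- 3*sigma (three-sigma rule)."""

    def __init__(self):
        self.n = 0          # number of gradient values folded in so far
        self.mean = 0.0     # running mean M
        self.sq_sum = 0.0   # running sum of C_i^2, needed for eq. (5)
        self.sigma = 0.0

    def update(self, c):
        """Fold a new correlation gradient value C_{N+1} into M and sigma."""
        self.n += 1
        # eq. (4): M_new = (N/(N+1))*M + C_{N+1}/(N+1)
        self.mean += (c - self.mean) / self.n
        self.sq_sum += c * c
        # eq. (5): sigma_new^2 = (1/(N+1)) * sum(C_i^2) - M_new^2
        var = max(self.sq_sum / self.n - self.mean ** 2, 0.0)
        self.sigma = var ** 0.5

    def is_boundary(self, c):
        """True if c lies outside the current band M +/- 3*sigma."""
        return self.n > 1 and abs(c - self.mean) > 3.0 * self.sigma
```

Each incoming correlation gradient value is first tested against the current band and then folded into the running statistics, so the threshold stays O(1) per frame.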
3.2 Feature extraction from the image frames

The feature extraction process consists of generating a fuzzy hostility map for each image frame using the STFHI, as explained in section 2.2. The fuzzy hostility map indicates the amount of coherence/incoherence in the movement of the objects and is a 2D matrix used to generate the edges of the objects in the gray scale image. Thereafter, an intensity scaling function is used to make the edges more prominent, as explained in section 2.3. To compensate for rapid object or camera movement, edge dilation is performed as explained in section 2.4.

3.3 Feature matching based on similarity metric

In this step, the Pearson's correlation between successive fuzzy hostility maps is used as the similarity metric, as explained in section 2.5. The correlation values thus computed are stored in a row matrix $C_M$. Shot boundaries occur at points of abrupt change in the correlation values. In order to detect a shot boundary, the gradient of the correlation values (which is a row vector) is computed. The correlation gradient plot, depicted in figure 6, consists of steep spikes at the shot boundaries.

3.4 Calculation of threshold and updating of threshold parameters

Segments in the streaming video are detected using the three-sigma rule, as explained in section 2.6. If the correlation gradient exceeds the upper or lower threshold, a video segment boundary is detected. Since the threshold is a function of the mean and standard deviation, it has to be updated as new frames are streamed in real time. The new threshold is calculated using equations (4) and (5), as explained in section 2.7.

Figure 6. Plot of correlation gradient values

3.5 Look back N frames to detect missed shot boundaries

In the proposed video shot detection mechanism, the threshold for the detection of a shot boundary is set dynamically, without any prior knowledge about the type or content of the video. At the initial stage, the threshold fluctuates, owing to the small number of data points computed from the few
video frames that have arrived, as shown in figure 6. However, the threshold becomes more stable as more frames arrive in real time. Although the threshold is updated with the arrival of each frame, the system looks back only after the passage of N frames, to check whether any shot boundaries have been missed due to the unstable threshold. In this work, N = 300 has been taken, i.e. the look back occurs every 12 seconds in a video with a frame rate of 25 fps, or every 10 seconds for a 30 fps video.

4. EXPERIMENTAL RESULTS AND ANALYSIS

The proposed method for shot segmentation in real time videos has been tested on a video data set consisting of eleven videos with varied features (Table 1 and Table 2). All the videos in the test set have a resolution of 640×360 at 25 fps and are in MP4 format. The performance of the proposed method is evaluated using two parameters, recall (R) and precision (P), defined as follows:

$$\text{Recall} = (B_d - B_f)/B_t \qquad \text{Precision} = (B_d - B_f)/B_d$$

where $B_d$ is the number of shot boundaries detected by the algorithm, $B_f$ the number of false shot boundaries detected, and $B_t$ the number of actual shot boundaries present in the video.

4.1 The Video Dataset

The proposed method has been tested on a dataset consisting of two subsets. The first subset comprises five videos of long duration, with an average length of more than one hour (Table 1). The four videos V1 to V4 are documentaries taken from different TV channels. The video V5 is a highlights video from the Cricket World Cup 2011 final match between India and Sri Lanka. The second subset is composed of short videos (Table 2). The videos V6 and V7 are sports videos from cricket and tennis; they were included in the dataset because of their rapid object movement and short shots. In contrast, V8 and V9 are songs from Hindi films. V8 has shots taken mostly outdoors in daylight, whereas in V9 real life shots are mixed with some computer generated frames.
The average shot duration in V9 is the least among all videos in the dataset. The video V10 is based on a violin track by Lindsey Stirling, characterized by simultaneous movement of the performer as well as the camera; the motivation for including this video is its rapid zoom-in and zoom-out sequences. The video V11 is the official song of the 2010 FIFA World Cup, "Waka Waka". It comprises varied backgrounds and illumination, intermixed with match sequences taken from FIFA World Cup history.

Table 1. Test Video Dataset-I

Video                                 V1       V2       V3       V4       V5
Duration (mm:ss)                   51:20    28:40    58:06    59:29   111:19
No. of Frames                      74020    43018    87150    89225   166990
No. of Hard Cuts                     941      406      807     1271     2807
Average no. of frames in each shot 78.57   105.69   107.85    70.14    59.46
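Before turning to the results, the adaptive three-sigma thresholding with periodic look-back, as summarized above, can be sketched in code. This is a minimal illustration, not the authors' implementation: the class and variable names are ours, and the per-frame correlation values are assumed to have been computed already from the edge features of successive frames.

```python
import math

N = 300  # look-back period: 12 s at 25 fps, or 10 s at 30 fps

class ShotBoundaryDetector:
    """Sketch of hard-cut detection with a dynamically updated threshold."""

    def __init__(self):
        self.corrs = []          # frame-to-frame feature correlations seen so far
        self.boundaries = set()  # frame indices flagged as hard cuts

    def _threshold(self):
        # Three-sigma rule over the correlation values observed so far.
        mu = sum(self.corrs) / len(self.corrs)
        var = sum((c - mu) ** 2 for c in self.corrs) / len(self.corrs)
        return mu - 3.0 * math.sqrt(var)

    def feed(self, corr):
        """Consume the correlation of the newest frame with its predecessor."""
        # A hard cut shows up as a correlation far below the running mean.
        if len(self.corrs) >= 2 and corr < self._threshold():
            self.boundaries.add(len(self.corrs))
        self.corrs.append(corr)
        # Look back every N frames: re-test all stored correlations against
        # the now more stable threshold, to catch cuts missed while the mean
        # and standard deviation were still settling.
        if len(self.corrs) % N == 0:
            t = self._threshold()
            self.boundaries.update(
                j for j, c in enumerate(self.corrs) if c < t)
```

Fed a stream of high correlations with one sharp dip (e.g. a drop from about 0.95 to 0.20), the detector flags the dip as an abrupt transition while leaving ordinary frame-to-frame variation untouched.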
Table 2. Test Video Dataset-II

Video                                 V6       V7       V8       V9      V10      V11
Duration (mm:ss)                   02:33    02:58    02:42    04:10    03:27    03:31
No. of Frames                       3833     4468     4057     6265     4965     5053
No. of Hard Cuts                      46       43       70      172       77      138
Average no. of frames in each shot 81.55   101.54    57.14    36.21    63.65    36.35

4.2 Experimental Results

The proposed method for video shot segmentation is found to work accurately on video frames streamed in real time. The results obtained from the experiments on the video dataset are summarized in Table 3. The shot boundaries obtained are the sum of two phases. First, video segments are detected using the threshold and method explained in Sections 3.2, 3.3 and 3.4. However, some shot boundaries may be missed due to the unstable mean and standard deviation; these missed boundaries are detected by updating the threshold and looking back after every N frames have elapsed, as discussed in Section 3.5. The effectiveness of the proposed method is seen from the high recall and precision values obtained for each of the videos in the test set.

Table 3. Experimental Results of Test Video Dataset

Video                   V1    V2    V3    V4     V5     V6    V7     V8     V9    V10    V11
Hard Cuts present      941   406   807  1271   2807     46    43     70    172     77    138
Hard Cuts detected     940   406   806  1269   2796     45    43     67    165     75    136
Detected by look-back    1     0     1     2      5      2     0      2      5      1      3
False detections         0     0     0     0      1      2     0      1      1      0      1
Recall (%)             100   100   100   100  99.75  97.82   100  97.14  98.25  98.70    100
Precision (%)          100   100   100   100  99.96  95.74   100  98.55  99.41    100  99.28

4.3 Comparison with other existing methods

Several existing methods for video segmentation, such as Mutual Information (MI) [8], Edge Change Ratio (ECR) [13] and Color Histogram Differences (CHD) [14], have been applied to non-real-time videos; shot boundary detection for streaming videos is hard to find in the literature. The problem of automatic computation of a threshold has been addressed in the literature [1, 15], and the strength of the proposed method lies in the fact that the threshold is computed and updated automatically, without manual intervention, unlike the other existing methods. Hence, this method can be applied for on-the-fly detection of shot boundaries in real time videos. The comparative results of the proposed method against its non-real-time counterparts are shown in Table 4.
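The recall and precision figures reported in Table 3 follow directly from the raw counts. A short sketch (the function name is ours) applying the definitions from the start of Section 4 to the V5 row:

```python
def recall_precision(detected, false_pos, actual):
    """Recall = (Bd - Bf)/Bt, Precision = (Bd - Bf)/Bd, in percent,
    where Bd = boundaries detected, Bf = false detections,
    Bt = actual boundaries present."""
    true_pos = detected - false_pos
    return 100.0 * true_pos / actual, 100.0 * true_pos / detected

# V5 in Table 3: 2796 cuts detected directly, 5 more found by look-back,
# 1 false detection, against 2807 actual hard cuts.
r, p = recall_precision(detected=2796 + 5, false_pos=1, actual=2807)
print(round(r, 2), round(p, 2))  # 99.75 99.96
```

Note that the detected count combines both phases, i.e. the cuts found on the fly plus those recovered by the look-back pass.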
Table 4. Comparison with Existing Methods

       Proposed Method       MI                 CHD                ECR
Video  R        P        R        P        R        P        R        P
V1     100%     100%     85.97%   92.98%   75.98%   95.96%   90.96%   92.98%
V2     100%     100%     83%      88.91%   78.07%   87.93%   92.11%   96.05%
V3     100%     100%     91.94%   93.55%   83.02%   88.97%   89.96%   85.99%
V4     100%     100%     87.96%   95.98%   76%      91.03%   95.98%   92.99%
V5     99.75%   99.96%   85.99%   93.97%   81.97%   91.98%   87.99%   94.01%
V6     97.82%   95.74%   86.95%   90.90%   73.91%   80.95%   91.30%   95.45%
V7     100%     100%     88.37%   92.68%   81.57%   96.87%   90.69%   88.63%
V8     97.14%   98.55%   91.42%   94.11%   75.71%   92.98%   95.71%   89.33%
V9     98.25%   99.41%   84.88%   94.80%   77.90%   94.36%   92.44%   88.33%
V10    98.70%   100%     81.81%   92.64%   74.02%   95%      93.50%   90%
V11    100%     99.28%   84.05%   95.08%   80.43%   94.87%   94.92%   91.60%

5. CONCLUSIONS AND REMARKS

The proposed method for real time video segmentation was tested on a diverse video test set and is seen to outperform the existing methods in terms of both recall and precision, achieving nearly 100% accuracy on both measures as compared to the state-of-the-art techniques. The number of false detections is very small, almost negligible compared to the other existing methods. The problem of automatically setting the threshold, without any human intervention, has been addressed, and the results are very encouraging. A major remaining challenge is the detection of dissolve edits in streaming videos.

ACKNOWLEDGEMENTS

The authors acknowledge the contributions of Surajit Dutta, M.Tech student at RCC Institute of Information Technology, Kolkata, during the experimentation phase of the proposed method.

REFERENCES

[1] Alan Hanjalic, "Shot-Boundary Detection: Unraveled and Resolved," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, pp. 90-105, February 2002.
[2] Siddhartha Bhattacharyya, Ujjwal Maulik and Paramartha Dutta, "High-speed target tracking by fuzzy hostility-induced segmentation of optical flow field," Applied Soft Computing, 2009.
[3] Hattarge A.M., Bandgar P.A. and Patil V.M., "A Survey on Shot Boundary Detection Algorithms and Techniques," International Journal of Emerging Technology and Advanced Engineering, Vol. 3, Issue 2, February 2013.
[4] Biswanath Chakraborty, Siddhartha Bhattacharyya and Susanta Chakraborty, "A Comparative Study of Unsupervised Video Shot Boundary Detection Techniques Using Probabilistic Fuzzy Entropy Measures," in Handbook of Research on Computational Intelligence for Engineering, Science, and Business, 2013. DOI: 10.4018/978-1-4666-2518-1.ch009.
[5] John S. Boreczky and Lawrence A. Rowe, "Comparison of video shot boundary detection techniques," Journal of Electronic Imaging, pp. 122-128, March 1996.
[6] Swati D. Bendale and Bijal J. Talati, "Analysis of Popular Video Shot Boundary Detection Techniques in Uncompressed Domain," International Journal of Computer Applications (0975-8887), Vol. 60, No. 3, December 2012.
[7] Ullas Gargi, Rangachar Kasturi and Susan H. Strayer, "Performance Characterization of Video-Shot-Change Detection Methods," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, No. 1, February 2000.
[8] Aarti Kumthekar and J.K. Patil, "Comparative Analysis of Video Summarization Methods," International Journal of Engineering Sciences & Research Technology, ISSN: 2277-9655, 2(1), pp. 15-18, January 2013.
[9] R. Lienhart, S. Pfeiffer and W. Effelsberg, "Scene determination based on video and audio features," Proceedings of the IEEE International Conference on Multimedia Computing and Systems, Vol. 1, pp. 685-690, 1999.
[10] H. Zhang, A. Kankanhalli and S.W. Smoliar, "Automatic partitioning of full-motion video," Multimedia Systems, Vol. 1, No. 1, pp. 10-28, 1993.
[11] A. Nagasaka and Y. Tanaka, "Automatic video indexing and full-video search for object appearances," Proceedings of the IFIP 2nd Working Conference on Visual Database Systems, pp. 113-127, 1992.
[12] A. Hampapur, R.C. Jain and T. Weymouth, "Production Model Based Digital Video Segmentation," Multimedia Tools and Applications, Vol. 1, No. 1, pp. 9-46, March 1995.
[13] R. Zabih, J. Miller and K. Mai, "A Feature-Based Algorithm for Detecting and Classifying Scene Breaks," Proceedings of ACM Multimedia 1995, San Francisco, CA, pp. 189-200, November 1995.
[14] Vrutant Hem Thakore, "Video Shot Cut Boundary Detection using Histogram," International Journal of Engineering Sciences & Research Technology, ISSN: 2277-9655, 2(4), pp. 872-875, April 2013.
[15] L. Ranathunga, R. Zainuddin and N.A. Abdullah, "Conventional Video Shot Segmentation to Semantic Shot Segmentation," 6th IEEE International Conference on Industrial and Information Systems (ICIIS), pp. 186-191, August 2011.
AUTHORS

Hrishikesh Bhaumik is currently serving as an Associate Professor and Head of the Department of Information Technology at RCC Institute of Information Technology, Kolkata, India. He received his B.Sc. from Calcutta University in 1997, AMIE in Electronics and Communication Engineering in 2000, and M.Tech in Information Technology from Bengal Engineering and Science University, Shibpur in 2004. In 2008 he received sponsorship for working on a Business Process Sniffer Tool developed at Infosys, Bhubaneswar. He made significant contributions to the EU-INDIA grid project in 2010 and 2011. His research interests include content based video retrieval systems, text mining and high performance computing.

Siddhartha Bhattacharyya received his Bachelors in Physics, Bachelors in Optics and Optoelectronics, and Masters in Optics and Optoelectronics (Gold Medal) from the University of Calcutta, India in 1995, 1998 and 2000 respectively. He completed his PhD (Engg.) in Computer Science and Engineering from Jadavpur University, India in 2008. He is currently an Associate Professor and Dean (R&D) in the Department of Information Technology of RCC Institute of Information Technology, Kolkata, India. He is a co-author of two books, co-editor of a book, and has more than 90 research publications. He was a member of the Young Researchers' Committee of the WSC 2008 Online World Conference on Soft Computing in Industrial Applications, and the convener of the AICTE-IEEE National Conference on Computing and Communication Systems (CoCoSys-09) in 2009. He has been the Assistant Editor of the International Journal of Pattern Recognition Research since 2010 and the Associate Editor of the International Journal of BioInfo Soft Computing since 2013. He is a member of the editorial board of the International Journal of Engineering, Science and Technology, and of the editorial advisory board of the HETC Journal of Computer Engineering and Applications. He is a senior member of IEEE, a member of ACM, IRSS and IAENG, and a life member of OSI and ISTE, India. His research interests include soft computing, pattern recognition and quantum computing.
Dr. Susanta Chakraborty received the B.Tech, M.Tech and Ph.D (Tech) in Computer Science in 1983, 1985 and 1999 respectively from the University of Calcutta. He is currently a Professor in the Department of Computer Science and Technology at Bengal Engineering and Science University, Shibpur, West Bengal, India. Prior to this he served the University of Kalyani as Dean of the Faculty of Engineering, Technology and Management. He has published around 31 research papers in reputed international journals, including IEEE Transactions on CAD, and in refereed international conference proceedings of the IEEE Computer Science Press. He was awarded the INSA-JSPS Fellowship of the Indian National Science Academy (INSA) in the 2003-2004 session. He has collaborated with leading scientists around the world in the areas of test generation of sequential circuits and low power design, VLSI testing and fault diagnosis, quantum circuits, and testable design of nano-circuits and micro-fluidic bio-chips. He has more than 25 years of research experience and has served as Publicity Chair, Publication Chair and Program Committee member of several international conferences.