2. Introduction
Proposed System
Crowd Segmentation
Background Subtraction Evaluation
Crowd Density Estimation
Crowd Tracking
Speed Estimation of Vehicle Crowd
Feature Vector
Experimental Evaluation
Discussion and Conclusions
2
3. Intelligent vision systems for urban traffic
surveillance have been adopted more frequently.
The traditional approaches are based on detection
and counting of individual vehicles.
Basically each vehicle is segmented and tracked,
and its motion trajectory is analyzed to estimate
traffic flow, vehicle speed and parked vehicle.
3
4. AgilityVideo system for vehicle counting. VaxtorSystems for vehicle speed estimation.
I2V system for vehicle counting and classification. 4
VCA system for vehicle detection, tracking and classification.
5. Most of the existing work commonly fails on crowded situations
due to the large occlusion of moving objects.
Alternative methods for dealing with this problem gave rise to a new field of
study called crowd analysis (Junior et al, 2010).
There are two approaches to perform behavioral analysis of crowded scenes:
• Object-based approaches try to infer crowd behavior by analyzing
individual elements of the scene (tracking of some individuals to analyze
the group behavior).
• Holistic approaches evaluate the crowd as an individual entity. 5
6. Holistic approaches try to obtain global information, such as crowd
flows, and skip local information (e.g. single vehicle against the flow).
Input video Vehicle crowd
Related works
Porikli and Li (2004) – DCT and MPEG flow vectors
Chan and Vasconcelos (2005) – ARIMA
Lee and Bovik (2009) – hist. of optical flow vectors
Derpanis and Wildes (2011) – 3D gaussian + fourrier
Some holistic properties can Can be extracted from:
be extracted from crowds
Density
Speed • Background Subtraction
behavior analysis like: • Optical Flow
Saxena et al. (2008),
Direction • Texture, Color and Edge analysis
Zhan et al. (2008;) and Localization • Analysis in Frequency Domain
Junior et al. (2010)
6
7. Commercial system from
ObjectVideo for classification
Classification based on autoregressive of traffic status.
model (Chan and Vasconcelos 2005).
7
8. We propose a method to classify traffic patterns based on holistic approach
The method classifies the traffic into three classes (light, medium or heavy
congestion) by usage of average crowd density and average speed of vehicle crowd.
The heavy congestion is represented by a high crowd density and low (or zero) crowd
speed. Otherwise, when crowd density is low and crowd speed is high, the system
consider that the traffic has light congestion. In intermediate situations, the traffic is
classified as medium congestion. 8
9. Here we estimate the vehicle crowd density by background subtraction process.
Five recently background subtraction methods with ChangeDetection.net video
database are evaluate.
All BGS methods have been set with default parameters defined in each work.
As can be seen in Table 1, the Multi-Layer method proposed by Yao and Odobez (2007) had the
best score.
9
10. The crowd density is determined by counting the number of pixels in foreground
mask obtained by Multi-Layer BGS.
This procedure is performed for each video frame.
Note: The traffic crowd density is
estimated by the average of density
variation in each video.
10
11. To perform the crowd tracking
the traditional KLT (Kanade-Lucas-Tomasi) tracker method was chosen.
Example of feature points tracking. Given two
consecutive frames (a), one extracts a certain
amount of feature points in first frame (b) and
seek for the matching points in the second
frame (c). The filtered out points are shown in
(d).
11
12. To estimate the speed, the average displacement of feature points
along all frames is calculated.
12
13. To train the classifier for predicting the
traffic congestion, a feature vector is
built
for the i th processed video.
i = average crowd density of the i
th video
vi = average crowd speed of the i th video
They are all combined in one train vector
13
14. UCSD highway traffic data set
A set contains 254 videos of daytime high-way traffic in Seattle.
All videos are recorded from a single stationary camera totaling 20 minutes.
The data set includes a diversity of traffic patterns like light, medium and
heavy congestion with variety of weather conditions (e.g., clear, raining and
cloudy).
Each video has 42-52 frames with
320x240 resolution recorded at 10
frames per second (fps).
The data set also provides a hand-
labeled ground truth that describes
each video sequence.
Table 2 shows a summary of UCSD
dataset.
14
16. The same training and testing methodology of Chan e
Vasconcelos (2005) and Derpanis and Wildes (2011) is adopted
here.
The experiment evaluation consists of four trials (T1,...,T4),
where in each trial the data set was split with 75% for training
and cross-validation and 25% for testing.
Feature classification is evaluated using four classifiers (kNN,
Naive Bayes, SVM and ANN-MLP).
The results obtained for each classifier are shown in the next
slides.
16
17. kNN with Euclidean distance was used and the
number of kNN neighbors was evaluated
empirically varying the range in K = [1, …,10].
17
18. Here the NBC assumes a Gaussian distribution.
Table 4 shows the accuracy of NBC.
18
19. Here, the following kernels are selected: linear, polynomial,
radial basis and sigmoid.
The parameters of each kernel function have been adjusted
automatically by 3-fold and 10-fold cross-validation on
training sets.
19
20. The MLP network was
configured as follows:
a) the input layer has 2 neurons
(one for crowd density and the
other to crowd speed);
b) the hidden layer evaluation are
made with 2-5 neurons;
c) the output layer contains 3
neurons, one for each traffic
patterns (light, medium and
heavy).
20
21. The MLP network was configured
as follows:
All neurons use the same activation
functions. Sigmoid function (SIG)
and Gaussian function (GAU) are
selected with standard parameters.
Two training algorithms are
chosen, the traditional gradient
descent back-propagation
(BPROP) and resilient back-
propagation (RPROP).
21
22. The experimental results have shown that the ANN-MLP network has
the best accuracy (94.5%).
The most critical situation of misclassification occurs between medium
and heavy traffic patterns (next slide shows that).
All experiments are performed on a computer running Intel Core i5-
2410m processor.
On UCSD data, the proposed system requires avg. 30ms/frame for
background segmentation and tracking.
In Chan and Vasconcelos (2005), the authors have had 94.5% of
accuracy using SVM classifier
Later, Derpanis and Wildes (2011) have achieved an accuracy of 95.3%
with kNN classifier (Table 9, next slide)
The present proposal has achieved 94.5% of accuracy using ANN-MLP
(Artificial Neural Networks-Multi-Layer Perceptions). 22
24. This paper has presented a system for traffic congestion
classification based on crowd density and crowd speed of
vehicles.
The present approach is based on crowd segmentation and
tracking using robust background subtraction method and
traditional pyramidal KLT feature tracker.
Experimental evaluation on real world data set shows that the
proposed system achieves compatible results of similar
previous work even when using a different approach.
24
25. In previous works Chan and Vasconcelos (2005) and Derpanis
and Wildes (2011), the authors have described that only
dynamical information is insufficient to distinguish empty-
traffic from stopped-traffic (both stationary) since the pixel
dynamics are similar.
Derpanis and Wildes (2011) suggests that one possible solution
is to incorporate spatial appearance information of background
scene to distinguish the presence (or absence) of vehicles.
25
26. In the present work, the BGS method includes both spatial appearance and
dynamical information to build a background model, but the problem of
distinguishing empty-traffic from stopped-traffic (both stationary) still
remains a challenge since:
a) to initialize the background model it is necessary that traffic is not stopped,
otherwise the background model will include stopped vehicles; and
b) even having a reasonable background model, the BGS method needs
updating with a certain learning rate. If the vehicles are stopped for a lengthy
period, the BGS method may include stopped vehicles in the background
model.
So, the problem of empty-traffic and stopped-traffic is not completely
solved here.
Another possible solution for future work research may be the
development of a crowd detector that uses only spatial appearance to
segment the vehicle’s crowd. 26
28. BUCH, N.; VELASTIN, S.; ORWELL, J. A review of computer vision techniques for the analysis of urban
traffic. IEEE Transactions on Intelligent Transportation Systems (ITS'11), v. 12, n. 3, p. 920{939, Sept.
2011.
CHAN, A.; VASCONCELOS, N. Classification and retrieval of traffic video using auto-regressive
stochastic processes. In: IEEE Intelligent Vehicles Symposium, 2005. p. 771-776.
DERPANIS, K. G.; WILDES, R. P. Classification of traffic video based on a spatiotemporal orientation
analysis. In: Proceedings of the 2011 IEEE Workshop on Applications of Computer Vision (WACV).
Washington, DC, USA: IEEE Computer Society, 2011. (WACV'11), p. 606-613. ISBN 978-1-4244-9496-
5.
JUNIOR, J. J.; MUSSE, S.; JUNG, C. Crowd analysis using computer vision techniques. IEEE Signal
Processing Magazine, v. 27, n. 5, p. 6677, sept. 2010.
LUCAS, B. D.; KANADE, T. An iterative image registration technique with an application to stereo
vision. In: Proceedings of the 7th international joint conference on Artificial intelligence - Volume 2.
[S.l.: s.n.], 1981. p. 674-679.
LEE, J.; BOVIK, A. Estimation and analysis of urban traffic flow. In: 16th IEEE International Conference
on Image Processing (ICIP'09). [S.l.: s.n.], 2009. p. 1157-1160. ISSN 1522-4880.
SANTORO, F.; PEDRO, S.; TAN, Z.-H.; MOESLUND, T. B. Crowd analysis by using optical flow and
density based clustering. Proceedings of the European Signal Processing Conference (EUSIPCO),
European Association for Signal Processing (EURASIP), v. 18, p. 269{273, 2010. ISSN 2076-1465. 28
29. SAXENA, S.; BREMOND, F.; THONNAT, M.; MA, R. Crowd behavior recognition for video surveillance.
In: Proceedings of the 10th International Conference on Advanced Concepts for Intelligent Vision
Systems. Berlin, Heidelberg: Springer-Verlag, 2008. (ACIVS'08), p. 970-981. ISBN 978-3-540-88457-6.
SENST, T.; EISELEIN, V.; SIKORA, T. Robust local optical flow for feature tracking. IEEE Transactions on
Circuits and Systems for Video Technology, 2012.
SHI, J.; TOMASI, C. Good features to track. In: IEEE Computer Society Conference on Computer Vision
and Pattern Recognition (CVPR'94). [S.l.: s.n.], 1994. p. 593-600. ISSN 1063-6919.
SOBRAL, A.; OLIVEIRA, L.; SCHNITMAN, L.; SOUZA, F. D. Highway traffic congestion classification using
holistic properties. 10th IASTED International Conference on Signal Processing, Pattern Recognition
and Applications (SPPRA'2013), fev. 2013.
YAO, J.; ODOBEZ, J. Multi-layer background subtraction based on color and texture. In: IEEE
Conference on Computer Vision and Pattern Recognition (CVPR'07), 2007. p. 1{8.
YILMAZ, A.; JAVED, O.; SHAH, M. Object tracking: A survey. ACM Computing Surveys, ACM, New York,
NY, USA, v. 38, n. 4, dec 2006. ISSN 0360-0300.
ZHAN, B.; MONEKOSSO, D. N.; REMAGNINO, P.; VELASTIN, S. A.; XU, L.-Q. Crowd analysis: a survey.
Machine Vision and Applications, v. 19, n. 5-6, p. 345-357, 2008. 29