In the context of Smart Cities, monitoring the dynamics of people's presence is crucial for the well-being of an urban area. We use mobile phone data as a proxy for the total number of people (Carpita & Simonetto 2014), with the specific aim of computing region-specific spatio-temporal indicators. Telecom Italia Mobile (TIM), the largest operator in Italy, provided us, under a research agreement with the Statistical Office of the Municipality of Brescia, with about two years (April 2014 to June 2016) of High-Frequency Daily Mobile Phone Density Profiles (DMPDPs), delivered on a regular polygon grid every 15 minutes. Densities have to be rescaled to express the total number of people rather than just TIM users. Separately for selected regions in the province of Brescia, characterized as either working or residential areas, we group similar DMPDPs and characterize the groups by their spatial and temporal components. To do so, we propose a mixed-approach procedure.
Human activity spatio-temporal indicators using mobile phone data
Human Activity Indicators (Metulini, Carpita)
Outline: Context & Objective; The Approach; Step 1: Cluster HOG features; Step 2: Daily curves clustering; Step 3: Functional Box Plots; Conclusions; References; Supplementary material
Human Activity Spatio-Temporal Indicators
Using Mobile Phone Data
Rodolfo Metulini, Maurizio Carpita
Data Methods and Systems Statistical Laboratory - Department of
Economics and Management, University of Brescia
Context & Objective
• This work uses mobile phone data provided by Telecom Italia Mobile
(TIM), currently the largest mobile operator in Italy (∼ 1/3 of the market).
Similar data have been used by Carpita & Simonetto (2014), Secchi et al. (2017),
Zanini et al. (2016), Manfredini et al. (2015), and Finazzi & Paci (2017, 2018).
• The data have a 2-D spatial component (a raster of n × n cells) and a
temporal component (each cell has repeated values over time, one every 15 minutes).
• The aim is to find reference daily profiles by clustering days that are similar
in terms of both their spatial and their temporal structure.
The application
• We select the grid of the city of Brescia (longitude 10.18 to 10.245,
latitude 45.516 to 45.564), made of 39 × 39 square cells of 150 m × 150 m,
• at 15-minute intervals (quarters) over the period from September 1, 2015
to August 10, 2016;
• we impute missing quarters and remove the whole day when too many
quarters are missing,
• ending up with 330 days (out of the 345 in the period).
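The imputation and filtering step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each day is stored as an array of shape (96, n, n), one raster per quarter, with `np.nan` marking a missing quarter; missing quarters are filled by linear interpolation in time, cell by cell; and the cut-off `MAX_MISSING_QUARTERS = 8` is hypothetical, since the slides do not say what counts as "too many". The interpolation scheme is also an assumption, not necessarily the authors' choice.

```python
import numpy as np

# Hypothetical cut-off: the slides do not state how many missing quarters are "too many".
MAX_MISSING_QUARTERS = 8


def impute_day(day, max_missing=MAX_MISSING_QUARTERS):
    """Impute missing quarters of one day by linear interpolation in time.

    `day` has shape (96, n, n): one n x n raster per 15-minute quarter, with
    np.nan where a quarter is missing. Returns None when more than
    `max_missing` quarters are missing, i.e. when the whole day is discarded.
    """
    n_quarters = day.shape[0]
    # A quarter counts as missing when its whole raster is NaN.
    missing = np.isnan(day).all(axis=(1, 2))
    if missing.sum() > max_missing:
        return None  # too many gaps: drop the day
    flat = day.reshape(n_quarters, -1).copy()  # one column per cell; input left untouched
    if not missing.any():
        return flat.reshape(day.shape)
    q = np.arange(n_quarters)
    observed = ~missing
    for cell in range(flat.shape[1]):
        # np.interp repeats the first/last observed value outside the observed range.
        flat[missing, cell] = np.interp(q[missing], q[observed], flat[observed, cell])
    return flat.reshape(day.shape)
```

A day returned as `None` is dropped, so applying `impute_day` to every day and keeping the non-`None` results mirrors the filtering described above.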
The Approach
Step | Action | Aim | Method | Using
1 | group days | find similar raster images | histogram of oriented gradients (HOG) | HOG features
2 | cluster days within each group | find similar densities | functional model-based clustering | daily density profiles
3 | characterize groups | find reference daily profiles | functional box plots | daily density profiles
Histogram of Oriented Gradients I
From an n × n raster, e.g.
93 124 77 … …
217 55 94 … …
24 77 109 … …
… … … … …
… … … … …
to $X_t$, the matrix of the number of people in each cell at time $t$:
1 Define $Z_t = X_t / \max(X_t) \times 100$;
2 split $Z_t$ into $3 \times 3 = 9$ blocks $Z_{t,c}$;
3 for each $Z_{t,c}$, compute the gradient matrices $G_x$ and $G_y$
using the Sobel operator;
4 define each element of the direction matrix as
$\theta = \arctan(g_y / g_x)$;
5 define each element of the magnitude matrix as
$g = \sqrt{g_x^2 + g_y^2}$;
6 assign each value of the direction matrix to one of the 6 bins of the
histogram, using its magnitude as weight.
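Steps 1 to 6 can be sketched for a single raster as below, assuming `scipy.ndimage.sobel` for the Sobel operator with reflected borders inside each block (SciPy's default `mode="reflect"`). Two further choices are assumptions of this sketch: the direction is computed as `np.arctan2(gy, gx) % np.pi`, which equals $\arctan(g_y/g_x)$ up to a multiple of $\pi$ and avoids division by zero, so the 6 bins split the unsigned orientation range $[0, \pi)$ evenly; and blocks are cut with `np.array_split`, which for a 39 × 39 raster gives nine 13 × 13 blocks. The function name `hog_features` is hypothetical.

```python
import numpy as np
from scipy.ndimage import sobel


def hog_features(x, n_blocks=3, n_bins=6):
    """HOG-style features of one raster X_t, following steps 1 to 6.

    Returns a vector of n_blocks**2 * n_bins values (3 x 3 x 6 = 54 by default).
    """
    x = np.asarray(x, dtype=float)
    peak = x.max()
    # Step 1: Z_t = X_t / max(X_t) * 100 (guard against an all-zero raster).
    z = x / peak * 100 if peak > 0 else np.zeros_like(x)
    features = []
    # Step 2: split Z_t into n_blocks x n_blocks blocks Z_{t,c}.
    for row_band in np.array_split(z, n_blocks, axis=0):
        for block in np.array_split(row_band, n_blocks, axis=1):
            gx = sobel(block, axis=1)  # step 3: horizontal Sobel gradient G_x
            gy = sobel(block, axis=0)  #         vertical Sobel gradient G_y
            theta = np.arctan2(gy, gx) % np.pi  # step 4: unsigned direction in [0, pi)
            g = np.hypot(gx, gy)                # step 5: magnitude sqrt(gx^2 + gy^2)
            # Step 6: magnitude-weighted histogram of directions.
            hist, _ = np.histogram(theta, bins=n_bins, range=(0.0, np.pi), weights=g)
            features.append(hist)
    return np.concatenate(features)
```

Applying `hog_features` to each of the 96 rasters of a day yields a (96, 54) array of features for that day; because of step 1, the features do not change when a raster is multiplied by a positive constant.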
Histogram of Oriented Gradients II
1 For each day, stack the HOG features of its 96 quarters into a
single vector; the vectors of all days form the columns of the
matrix $\tilde{H}$;
2 use k-means clustering to group days (columns of $\tilde{H}$) in
terms of their HOG features (rows of $\tilde{H}$);
3 choose 6 groups by looking at how the ratio of within-group
deviance to total deviance decreases as the number of clusters
increases.
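quarter | feature | day 1 | day 2 | … | day $\tilde{T}$
1 | 1 | $h^{1}_{1,1}$ | $h^{2}_{1,1}$ | … | $h^{\tilde{T}}_{1,1}$
1 | 2 | $h^{1}_{1,2}$ | $h^{2}_{1,2}$ | … | $h^{\tilde{T}}_{1,2}$
1 | … | … | … | … | …
1 | k | $h^{1}_{1,k}$ | $h^{2}_{1,k}$ | … | $h^{\tilde{T}}_{1,k}$
… | … | … | … | … | …
96 | k | $h^{1}_{96,k}$ | $h^{2}_{96,k}$ | … | $h^{\tilde{T}}_{96,k}$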
[Figure: ratio of within-groups to total sum of squares versus number of clusters (2 to 14)]
Advantages:
• It turns 2-D raster data into a 1-D vector while preserving the
“spatial” structure of the data.
• It reduces dimensionality: a raster of 39 × 39 = 1521 values is described by
9 × 6 = 54 HOG features (9 blocks, 6 bins each), a reduction by a factor of 1521/54 ≈ 28.17.
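Steps 1 to 3 of this slide can be sketched with scikit-learn's `KMeans`, whose `inertia_` attribute is the within-cluster sum of squares. The helper names `build_day_matrix` and `within_total_ratio` are hypothetical, each day's features are assumed to come as a (96, 54) array, and `n_init=10` with a fixed `random_state` is an illustrative choice rather than a setting reported by the authors.

```python
import numpy as np
from sklearn.cluster import KMeans


def build_day_matrix(daily_hog):
    """Stack each day's (96, 54) HOG array into one column of H.

    Rows are quarter-major (quarter 1 features 1..k, then quarter 2, ...),
    matching the layout of the table above.
    """
    return np.column_stack([np.asarray(d).reshape(-1) for d in daily_hog])


def within_total_ratio(H, k_values, seed=0):
    """Within-cluster / total sum of squares of k-means on the days (columns of H)."""
    X = H.T  # one row per day
    total_ss = ((X - X.mean(axis=0)) ** 2).sum()
    ratios = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        ratios[k] = km.inertia_ / total_ss  # inertia_ = within-cluster sum of squares
    return ratios
```

Plotting `within_total_ratio(H, range(2, 15))` against the number of clusters gives a curve of the kind used to choose 6 groups; the final labels would then come from `KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(H.T)`.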
Cluster HOG features: results
Red: n = 41 weekend days from September to the Christmas holidays
Orange: n = 90 weekdays from September to the Christmas holidays
Yellow: n = 47 summer weekdays
Green: n = 35 Saturdays from January to August
Light blue: n = 81 weekdays from January to May
Blue: n = 36 Sundays (excluding September and October)
Total: 330 days
Step 2. Daily curves clustering
1 Separately for each group, we remove abnormal curves using
functional outlier detection by likelihood ratio test (LRT) (fda
package), as proposed by Febrero-Bande et al. (2008);
2 we apply the clustering method developed by Bouveyron et al. (2015),
using the funFEM package in R:
• curves are smoothed on a Fourier basis with 9 basis functions;
• the function automatically chooses the best model among
alternatives that apply different constraints to the parameters
of the matrix $\Sigma_k$ (the variance-covariance matrix of the latent
expansion coefficients of the curves);
• the number of groups is chosen by BIC over the range 2 to 7;
• random initial values are used for the prior probabilities $\pi_k$.
Why a model-based functional data analysis approach?
• It is more flexible.
• Each group corresponds to a distribution with specific parameters.
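The smoothing and model-selection steps can be illustrated with the simplified Python sketch below. It is not funFEM: funFEM fits the discriminative functional mixture model of Bouveyron et al. (2015), which clusters curves in a low-dimensional discriminative latent subspace, whereas this sketch swaps in an ordinary Gaussian mixture (scikit-learn's `GaussianMixture`, diagonal covariances) on the Fourier coefficients. Assumptions: the 9 basis functions are a constant plus 4 sine/cosine pairs with a one-day period, coefficients are obtained by least squares, BIC follows scikit-learn's convention (lower is better), and initialization uses scikit-learn's default k-means scheme rather than random initial values for $\pi_k$. Function names are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fourier_basis(n_points=96, n_basis=9):
    """Constant plus (n_basis - 1) // 2 sine/cosine pairs over one day (n_basis odd)."""
    t = np.arange(n_points) / n_points  # quarter index rescaled to [0, 1)
    columns = [np.ones(n_points)]
    for j in range(1, (n_basis - 1) // 2 + 1):
        columns.append(np.sin(2 * np.pi * j * t))
        columns.append(np.cos(2 * np.pi * j * t))
    return np.column_stack(columns)  # shape (n_points, n_basis)


def fourier_coefficients(curves, n_basis=9):
    """Least-squares basis coefficients of each daily curve; curves: (n_days, 96)."""
    B = fourier_basis(curves.shape[1], n_basis)
    coef, *_ = np.linalg.lstsq(B, curves.T, rcond=None)
    return coef.T  # shape (n_days, n_basis)


def cluster_by_bic(coef, k_range=range(2, 8), seed=0):
    """Fit a Gaussian mixture for each K in k_range and keep the lowest-BIC model."""
    best_bic, best_model = np.inf, None
    for k in k_range:
        gm = GaussianMixture(n_components=k, covariance_type="diag",
                             n_init=5, random_state=seed).fit(coef)
        bic = gm.bic(coef)
        if bic < best_bic:
            best_bic, best_model = bic, gm
    return best_model
```

In the pipeline described above, this would run separately on the curves of each HOG group after LRT outlier removal, with `cluster_by_bic(coef).predict(coef)` giving the sub-group of each day.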
Daily curves clustering: results
Estimated daily profiles by group (left) and group centroids (right): summer weekdays,
yellow cluster (2 outliers removed with LRT)
[Figure: two panels, value versus time (quarter of the day)]
Estimated daily profiles by group (left) and group centroids (right): weekends
from September to Christmas, red cluster (4 outliers removed with LRT)
[Figure: two panels, value versus time (quarter of the day)]
Step 3. Confidence Intervals with
Functional Box Plots
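• The functional analogue of the traditional box plot, proposed by Sun &
Genton (2011)
• Curves are ordered from the centre outward using the concept of “band depth”
• A curve is flagged as an outlier if, in at least one time point, it falls outside the
fences obtained by inflating the envelope of the 50% central region by 1.5 times its range.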
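The ranking and outlier rule can be sketched as below. The sketch uses the modified band depth with bands formed by pairs of curves, a variant of band depth commonly used to build functional box plots; the 50% central region is the envelope of the deepest half of the curves, and a curve is flagged when, at some time point, it falls outside fences placed 1.5 times the envelope's range beyond it. Function names are hypothetical; for real use an existing implementation (e.g. `fbplot` in the R package fda) may be preferable.

```python
import numpy as np
from itertools import combinations


def modified_band_depth(curves):
    """Modified band depth with bands formed by pairs of curves; curves: (n_curves, n_times)."""
    n = curves.shape[0]
    depth = np.zeros(n)
    for i, j in combinations(range(n), 2):
        lower = np.minimum(curves[i], curves[j])
        upper = np.maximum(curves[i], curves[j])
        inside = (curves >= lower) & (curves <= upper)  # (n_curves, n_times)
        depth += inside.mean(axis=1)  # share of time points inside this band
    return depth / (n * (n - 1) / 2)


def functional_boxplot(curves, factor=1.5):
    """Median curve, 50% central envelope and outlier flags."""
    depth = modified_band_depth(curves)
    order = np.argsort(depth)[::-1]  # deepest curve first
    central = curves[order[: int(np.ceil(len(curves) / 2))]]
    env_low, env_high = central.min(axis=0), central.max(axis=0)
    spread = env_high - env_low
    fence_low, fence_high = env_low - factor * spread, env_high + factor * spread
    # Outlier: outside the fences in at least one time point.
    outlier = ((curves < fence_low) | (curves > fence_high)).any(axis=1)
    return curves[order[0]], (env_low, env_high), outlier
```

The deepest curve plays the role of the median, and the central envelope corresponds to the box of a traditional box plot.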
Functional Box Plots: results
Functional box plot for daily profiles (in thousands of people). Week days of
summer, yellow cluster.
[Figure: three panels (June, July, August), thousands of people versus time (quarter of the day)]
Future directions
• Increase the sample size by adding repeated measures of the
same day, to obtain more robust functional box plots
• Apply a coefficient based on TIM’s market share to estimate the total
number of people
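The rescaling in the second point amounts to dividing TIM counts by TIM's market share. A minimal sketch, assuming a single uniform share equal to the ∼1/3 quoted in the Context slide; in practice the coefficient might need to vary by area or time, and the name `estimate_total_people` is hypothetical.

```python
import numpy as np

# Approximate TIM market share quoted in the slides (~1/3). Using one uniform
# share for every cell and quarter is a simplifying assumption.
TIM_MARKET_SHARE = 1 / 3


def estimate_total_people(tim_counts, market_share=TIM_MARKET_SHARE):
    """Rescale TIM-only counts to an estimate of the total number of people."""
    if not 0 < market_share <= 1:
        raise ValueError("market_share must be in (0, 1]")
    return np.asarray(tim_counts, dtype=float) / market_share
```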
References
1 Bouveyron, C., Côme, E., & Jacques, J. (2015). The discriminative functional mixture model for a
comparative analysis of bike sharing systems. The Annals of Applied Statistics, 9(4), 1726-1760.
2 Carpita, M., & Simonetto, A. (2014). Big Data to Monitor Big Social Events: Analysing the mobile
phone signals in the Brescia Smart City. Electronic Journal of Applied Statistical Analysis: Decision
Support Systems, Volume 5, Issue 1, pp 31-41, DOI: 10.1285/i2037-3627v5n1p31
3 Finazzi, F., & Paci, L. (2017, June). Space-time clustering for identifying population patterns from
smartphone data. In SIS 2017 Statistics and Data Science: new challenges, new generations (pp.
423-428). Firenze University Press.
4 Finazzi, F., & Paci, L. (2018). A comparison of statistical methods for estimating individual location
densities from smartphone data. In ITISE 2018-International conference on Time Series and
Forecasting (pp. 1471-1482). Godel Impresiones Digitales SL.
5 Febrero, M., Galeano, P., & Gonzalez-Manteiga, W. (2008). Outlier detection in functional data by
depth measures, with application to identify abnormal NOx levels. Environmetrics: The official
journal of the International Environmetrics Society, 19(4), 331-345.
6 Manfredini, F., Pucci, P., Secchi, P., Tagliolato, P., Vantini, S., & Vitelli, V. (2015). Treelet
decomposition of mobile phone data for deriving city usage and mobility pattern in the Milan urban
region. In Advances in complex data modeling and computational methods in statistics (pp.
133-147). Springer, Cham.
7 Secchi, P., Vantini, S., & Zanini, P. (2017). Analysis of Mobile Phone Data for Deriving City Mobility
Patterns. In Electric Vehicle Sharing Services for Smarter Cities (pp. 37-58). Springer, Cham.
8 Sun, Y., & Genton, M. G. (2011). Functional boxplots. Journal of Computational and Graphical
Statistics, 20(2), 316-334.
9 Tomasi, C. (2012). Histograms of oriented gradients. Computer Vision Sampler, 1-6.
10 Zanini, P., Shen, H., & Truong, Y. (2016). Understanding resident mobility in Milan through
independent component analysis of Telecom Italia mobile usage data. The Annals of Applied
Statistics, 10(2), 812-833.
Functional Data Model-Based Clustering
(Bouveyron et al., 2015)
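We observe each curve $x_i$ only at a finite set of ordered time points; the functional
expressions of the observed curves are unknown.
We define $X(t) = \sum_{j=1}^{p} \gamma_j(X)\,\psi_j(t)$ to be a basis expansion of $X$ on $p$ basis functions $\psi_1, \dots, \psi_p$.
The aim is to predict the value $z_i = (z_{i1}, \dots, z_{iK})$ of the unobserved random
variable $Z = (Z_1, \dots, Z_K)$ for each observed curve $x_i$.
The expansion coefficients are modelled as $\Gamma = U\Lambda + \varepsilon$, where $U$ is a $p \times d$ orthonormal matrix and $\varepsilon$ is a Gaussian noise term with covariance $\Xi$.
$\Lambda = (\lambda_1, \dots, \lambda_n)$ are the latent expansion coefficients of the curves $x_1, \dots, x_n$ and,
conditionally on $Z$, they follow a multivariate Gaussian distribution $\mathcal{N}(\mu_k, \Sigma_k)$,
where $\mu_k$ and $\Sigma_k$ are the mean and the covariance matrix of the $k$-th group.
The marginal distribution of $\Gamma$ is therefore a mixture of Gaussians:
$$p(\gamma) = \sum_{k=1}^{K} \pi_k\, \phi\left(\gamma;\, U\mu_k,\, U\Sigma_k U^{\top} + \Xi\right)$$
where $\phi$ is the Gaussian density function and $\pi_k$ is the prior probability of the $k$-th group.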
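Predicting $z_i$ amounts to computing each curve's posterior group probabilities, $\pi_k\,\phi(\gamma_i; U\mu_k, U\Sigma_k U^{\top} + \Xi)$ normalized over $k$, and assigning the curve to the most probable group. The sketch below performs only this step and assumes the parameters $\pi_k$, $\mu_k$, $\Sigma_k$, $U$ and $\Xi$ have already been estimated (funFEM does this with its Fisher-EM algorithm); it works in log space with `scipy.special.logsumexp` for numerical stability. The function name `posterior_memberships` is hypothetical.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal


def posterior_memberships(gamma, pi, mu, Sigma, U, Xi):
    """Posterior P(Z_k = 1 | gamma_i) under the Gaussian mixture on the coefficients.

    gamma: (n, p) basis coefficients; pi: (K,); mu: (K, d); Sigma: (K, d, d);
    U: (p, d) with orthonormal columns; Xi: (p, p) noise covariance.
    """
    gamma = np.atleast_2d(gamma)
    # Column k holds log pi_k + log phi(gamma; U mu_k, U Sigma_k U^T + Xi).
    log_terms = np.column_stack([
        np.log(pi[k]) + np.atleast_1d(
            multivariate_normal(mean=U @ mu[k], cov=U @ Sigma[k] @ U.T + Xi).logpdf(gamma))
        for k in range(len(pi))
    ])
    # Normalize over groups in log space, then exponentiate.
    return np.exp(log_terms - logsumexp(log_terms, axis=1, keepdims=True))
```

`posterior_memberships(...).argmax(axis=1)` then gives the estimated group of each curve.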