Visualizing Spatiotemporal Clusters on Maps

Identifying and Visualizing Spatiotemporal
Clusters on Map Tiles

Markus Löcher

July 2012

Motivation
Spatial Plots in R
Spatiotemporal Clusters

RgoogleMaps
Static Google Maps
Integration of maps into R

Hot Spot Detection
SatScan
SpatialEpi
Unsupervised as Supervised Learning
Trees
PRIM

Outlook
OpenStreetMap
Offline use and enlarged maps

Motivation II a

The meuse data set gives locations and topsoil heavy metal concentrations, along
with a number of soil and landscape variables at the observation locations,
collected in a flood plain of the river Meuse, near the village of Stein (NL).
Heavy metal concentrations are from

Examples: pennLC incidence
County-level (n = 67) population/case data for lung cancer in Pennsylvania in
2002, stratified on race (white vs non-white), gender and age (Under 40, 40-59,
60-69 and 70+). The data also contain county-specific smoking rates.

[Figure: pennLC incidence map. Legend classes: 0.0263-0.0364, 0.0364-0.0504,
0.0504-0.0698, 0.0698-0.0967, 0.0967-0.134.]

Spatiotemporal Clusters
Scoring unusual events in space and time has been an active and
important field of research for decades. How do we
distinguish normal fluctuations in a stochastic count process from real additive events?
identify spatiotemporal clusters where the event is most strongly pronounced?
efficiently graph these clusters in a map overlay?
Supervised learning algorithms are proposed as an alternative to the
computationally expensive scan statistic.
The task can be reduced to detecting over-densities in space relative to a
background density.

Motivation III
North-East Visualization and Analytics Center

Geovisual analytics of spatial scan statistics

Benefit: Geovisual analytics of spatial scan statistics supports spatial
cluster analysis at multiple scales, mitigating scale sensitivity of spatial
scan statistics and enhancing the interpretation of the cluster results.

Combining geovisual analytics with spatial scan statistic
Spatial scan statistics methods are widely used in identifying and verifying
spatial clusters (e.g., crime or disease clusters). However, two issues make
using the methods and interpreting the results non-trivial: (1) the methods lack
cartographic support for understanding the clusters in geographic context and
(2) results from the method are sensitive to parameter choices related to
cluster scaling parameters. Most spatial scan statistics provide no direct
support for making these choices. In order to address these issues, we have
developed a novel geovisual analytics approach that coordinates with Kulldorff's
spatial scan statistic as an example. This approach, which we have implemented
in Visual Inquiry Toolkit, mitigates the sensitivity problem and enhances the
interpretation of SaTScan results.
In the example below, U.S. vehicle theft data in 2000 are scanned with scales of
4%, 6%, 8%, 10%, 20%, 30%, 40%, 50% of total population at risk. The map matrix
shows instability of clusters reported at smaller scales. The reliability map
displays reliable, high risk clusters of U.S. vehicle theft in 2000.

Multi-scale analysis of spatial clusters with Visual Inquiry Toolkit: Map
matrix (left) displays high risk clusters (highlighted in orange). Some
clusters are heterogeneous, containing normal or low risk locations (in
white or blue). The reliability map below displays reliable, high risk
clusters (in dark green).

RgoogleMaps
Shortcomings of API:
URL Size Restriction: Static Map URLs are restricted to 2048
characters in size. In practice, you will probably not have need for URLs
longer than this, unless you produce complicated maps with a high
number of markers and paths. Note, however, that certain characters may
be URL-encoded by browsers and/or services before sending them off to
the Static Map service, resulting in increased character usage.
Marker and line styles limited
Switching Environments slows down development
The RgoogleMaps package serves two purposes:
Provide a comfortable R interface to query the Google server for static
maps
Use the map as a background image to overlay plots within R. This
requires proper coordinate scaling.
Note: For dynamic map overlays there are other great packages such as
plotGoogleMaps and googleVis.

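3 Essential Commands
GetMap
library(RgoogleMaps)
# Fetch a static map centered on San Diego (lat, lon) and save it as a PNG
mapSD = GetMap(center=c(32.7073, -117.162), zoom=10,
               destfile='SDconv.png')

PlotOnStaticMap
# Overlay points; lat and lon must come from their own columns
PlotOnStaticMap(mapSD, lat=myTrails$lat, lon=myTrails$lon,
                col=myTrails$col, cex=myTrails$cex)

PlotPolysOnStaticMap
library(PBSmapping)  # provides importShapefile
# Import the shapefile before drawing it on the map
shp = importShapefile(shpFile, projection='LL')
PlotPolysOnStaticMap(map, shp, lwd=.5, col=shp[,'col'])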

Scan Statistic
In this spatial surveillance setting, each day we have data collected for a set of
discrete spatial locations s_i. For each location s_i, we have a count c_i (e.g.
number of disease cases), and an underlying baseline b_i. The baseline may
correspond to the underlying population at risk, or may be an estimate of the
expected value of the count (e.g. derived from the time series of previous count
data).
Our goal, then, is to find if there is any spatial region S (set of locations s_i) for
which the counts are significantly higher than expected, given the baselines. For
simplicity, we assume that the locations s_i are aggregated to a uniform,
two-dimensional, N × N grid G, and we search over the set of all rectangular
regions S ⊆ G.

CitySense

This type of spatial surveillance is computationally expensive: O(R · N^4)
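
The exhaustive rectangle search described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the implementation behind these slides: the score is assumed to be Kulldorff's Poisson log-likelihood ratio, baselines are assumed strictly positive, and prefix_sums, rect_sum, llr and scan_rectangles are hypothetical helper names. The four nested loops visit all O(N^4) rectangles; rerunning the search on each of R randomized replicas for significance testing (reading R as the number of replicas is an assumption) would give the O(R · N^4) cost.

```r
# Prefix sums with a zero border, so any rectangle sum costs O(1).
prefix_sums <- function(m) {
  s <- matrix(0, nrow(m) + 1, ncol(m) + 1)
  for (i in seq_len(nrow(m))) for (j in seq_len(ncol(m)))
    s[i + 1, j + 1] <- m[i, j] + s[i, j + 1] + s[i + 1, j] - s[i, j]
  s
}

# Sum of the original matrix over rows r1..r2 and columns c1..c2.
rect_sum <- function(s, r1, r2, c1, c2)
  s[r2 + 1, c2 + 1] - s[r1, c2 + 1] - s[r2 + 1, c1] + s[r1, c1]

# Assumed score: Poisson log-likelihood ratio, zero unless count exceeds expectation.
llr <- function(c_in, e_in, C) {
  if (c_in <= e_in) return(0)
  inside <- c_in * log(c_in / e_in)
  outside <- if (C > c_in) (C - c_in) * log((C - c_in) / (C - e_in)) else 0
  inside + outside
}

# Score every axis-aligned rectangle on the N x N grid and keep the best one.
scan_rectangles <- function(counts, baselines) {
  N <- nrow(counts); C <- sum(counts); B <- sum(baselines)
  sc <- prefix_sums(counts); sb <- prefix_sums(baselines)
  best <- list(score = 0, rows = NULL, cols = NULL)
  for (r1 in 1:N) for (r2 in r1:N) for (c1 in 1:N) for (c2 in c1:N) {
    c_in <- rect_sum(sc, r1, r2, c1, c2)
    e_in <- C * rect_sum(sb, r1, r2, c1, c2) / B  # expected count under the null
    s <- llr(c_in, e_in, C)
    if (s > best$score) best <- list(score = s, rows = c(r1, r2), cols = c(c1, c2))
  }
  best
}

# Toy data: flat background with an elevated 2 x 2 block.
N <- 8
baselines <- matrix(100, N, N)
counts <- matrix(5, N, N)
counts[3:4, 5:6] <- 25
res <- scan_rectangles(counts, baselines)
print(res$rows)
print(res$cols)
```

The prefix sums only remove the cost of summing each rectangle; the number of rectangles itself remains the bottleneck, which is the motivation for the cheaper alternatives discussed in the following slides.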

Examples: pennLC clusters

[Figure: map titled "Most Likely Cluster"; axes span 39.5°N to 42.5°N and 80°W to 75°W.]

Examples: NYleukemia
Census tract level (n = 277) leukemia data for the 8 counties in upstate New
York from 1978 to 1982, paired with population data from the 1980 census.

Examples: NYleukemia

[Figure: map titled "Most Likely Cluster"; axes span 42°N to 43.4°N and 77.5°W to 74.5°W.]

Unsupervised as Supervised Learning
Introduced in Hastie et al. for density estimation or association rule
generalizations. The problem must be enlarged with a simulated data set generated
by Monte Carlo techniques.

[Figure: two scatter-plot panels, X1 (horizontal axis, -1 to 2) against X2 (vertical axis, -2 to 6).]

FIGURE 14.3. Density estimation via classification. (Left panel:) Training set
of 200 data points. (Right panel:) Training set plus 200 reference data points,
generated uniformly over the rectangle containing the training data. The training
sample was labeled as class 1, and the reference sample class 0, and a
semiparametric logistic regression model was fit to the data. Some contours for
ĝ(x) are shown.

In epidemiology, cases and population naturally provide the two classes; for anomaly
detection, the background population is taken to be some sort of average.
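
The idea of turning density estimation into classification can be sketched in base R. This is an illustration only: the Gaussian training sample is made up, a plain quadratic logistic regression (glm) stands in for the semiparametric model mentioned in the figure caption, and density_ratio is a hypothetical helper name. With equal class sizes, the fitted class-1 probability mu(x) gives the density ratio g(x)/g0(x) = mu(x) / (1 - mu(x)), where g0 is the uniform reference density.

```r
set.seed(42)
n <- 200

# Class 1: the observed sample (synthetic here, purely for illustration).
train <- data.frame(x1 = rnorm(n, 0.5, 0.4), x2 = rnorm(n, 2, 1))

# Class 0: reference sample drawn uniformly over the bounding rectangle.
ref <- data.frame(x1 = runif(n, min(train$x1), max(train$x1)),
                  x2 = runif(n, min(train$x2), max(train$x2)))

dat <- rbind(cbind(train, y = 1), cbind(ref, y = 0))

# Quadratic logistic regression separating observed from reference points.
fit <- glm(y ~ x1 + x2 + I(x1^2) + I(x2^2), family = binomial, data = dat)

# Estimated ratio of the data density to the uniform reference density.
density_ratio <- function(newdata) {
  mu <- predict(fit, newdata = newdata, type = "response")
  mu / (1 - mu)
}

center <- data.frame(x1 = 0.5, x2 = 2)
corner <- data.frame(x1 = min(train$x1), x2 = min(train$x2))
print(density_ratio(center))
print(density_ratio(corner))
```

Regions where the ratio is well above 1 are over-dense relative to the reference; in the epidemiological setting the reference sample would be replaced by the population at risk.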

Detecting simple clusters of overdensity

Patient Rule Induction Method (PRIM)
Introduced in Hastie et al.: Bump Hunting

Algorithm 9.3 Patient Rule Induction Method.
1. Start with all of the training data, and a maximal box containing all
of the data.

2. Consider shrinking the box by compressing one face, so as to peel off
the proportion α of observations having either the highest values of
a predictor Xj, or the lowest. Choose the peeling that produces the
highest response mean in the remaining box. (Typically α = 0.05 or
0.10.)

3. Repeat step 2 until some minimal number of observations (say 10)
remain in the box.

4. Expand the box along any face, as long as the resulting box mean
increases.

5. Steps 1–4 give a sequence of boxes, with different numbers of
observations in each box. Use cross-validation to choose a member of the
sequence. Call the box B1.

6. Remove the data in box B1 from the dataset and repeat steps 2–5 to
obtain a second box, and continue to get as many boxes as desired.
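
The peeling phase (steps 1 to 3) can be sketched in base R as follows. This is a simplified illustration, not a full PRIM implementation: pasting (step 4), cross-validation (step 5) and the covering loop (step 6) are omitted, the synthetic hot spot is made up, and prim_peel is a hypothetical function name. The response y is 1 for a case and 0 for a background point, so the box mean is the local case fraction.

```r
prim_peel <- function(X, y, alpha = 0.1, min_obs = 10) {
  inbox <- rep(TRUE, nrow(X))
  box <- apply(X, 2, range)  # row 1: lower bounds, row 2: upper bounds
  while (sum(inbox) > min_obs) {
    best <- NULL
    for (j in seq_len(ncol(X))) {
      xj <- X[inbox, j]
      lo <- quantile(xj, alpha, names = FALSE)
      hi <- quantile(xj, 1 - alpha, names = FALSE)
      # Two candidate peels per predictor: drop the lowest or the highest values.
      candidates <- list(
        list(j = j, side = 1, bound = lo, keep = inbox & X[, j] > lo),
        list(j = j, side = 2, bound = hi, keep = inbox & X[, j] < hi))
      for (cand in candidates) {
        n_keep <- sum(cand$keep)
        if (n_keep < min_obs || n_keep == sum(inbox)) next
        m <- mean(y[cand$keep])
        if (is.null(best) || m > best$mean) best <- c(cand, list(mean = m))
      }
    }
    if (is.null(best)) break  # no admissible peel left
    inbox <- best$keep
    box[best$side, best$j] <- best$bound
  }
  list(box = box, inbox = inbox, mean = mean(y[inbox]))
}

# Synthetic example: a square hot spot on top of a low background rate.
set.seed(1)
X <- cbind(lon = runif(1000), lat = runif(1000))
hot <- X[, "lon"] > 0.6 & X[, "lon"] < 0.8 & X[, "lat"] > 0.2 & X[, "lat"] < 0.4
y <- rbinom(1000, 1, ifelse(hot, 0.8, 0.1))
res <- prim_peel(X, y, alpha = 0.1, min_obs = 20)
print(res$box)
print(res$mean)
```

Because each peel removes only a small fraction α of the points, the box shrinks slowly ("patiently"), which is what distinguishes PRIM from the binary splits of a tree.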

PRIM example
[Figure: twelve scatter-plot panels titled 1, 2, 3, 4, 5, 6, 7, 8, 12, 17, 22 and 27.]

PRIM boxes found

[Figure: two panels titled "Uniform Background" and "Clustered Background".]

Outlook, Issues
OpenStreetMap (OSM)
Local Database of map tiles
Keep size reasonable by limiting number of zoom levels and spatial
extent.
Construction of static map from local tiles

Google Map Math
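With $\tilde{lat} = \pi \cdot lat/180$, the transformation from lat/lon to pixels is given by:

$$\tilde{Y} = \frac{1}{2\pi} \log\left(\frac{1 + \sin(\tilde{lat})}{1 - \sin(\tilde{lat})}\right) \tag{1}$$

$$Y = 2^{zoom-1} \cdot (1 - \tilde{Y}), \quad X = 2^{zoom-1} \cdot (\tilde{X} + 1) \tag{2}$$

The integer part of X, Y specifies the tile, whereas the fractional part times
256 is the pixel coordinate within the tile itself:

$$x = 256 \cdot (X - \lfloor X \rfloor), \quad y = 256 \cdot (Y - \lfloor Y \rfloor)$$

Inverting these relationships is rather straightforward. Taking the principal
branch of $\sin^{-1}$, which is the only solution with $|\tilde{lat}| < \pi/2$
and hence the only valid latitude, Eq. (1) leads to

$$\tilde{lat} = \sin^{-1}\left(\frac{\exp(2\pi\tilde{Y}) - 1}{\exp(2\pi\tilde{Y}) + 1}\right) \tag{3}$$

whereas inverting Eqs. (2) gives

$$\tilde{Y} = 1 - Y/2^{zoom-1}, \quad \tilde{X} = X/2^{zoom-1} - 1$$

For longitude, the inverse mapping is much simpler: since $\tilde{X} = lon/180$, we get

$$lon = 180 \cdot \left(X/2^{zoom-1} - 1\right).$$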
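
The forward and inverse transformations can be checked numerically with a short base R sketch. The function names latlon_to_xy and xy_to_latlon are hypothetical, and the inverse uses the principal value of the arcsine, which is the branch that yields latitudes between -90 and 90 degrees.

```r
# Forward mapping: lat/lon in degrees to tile index and pixel offset.
latlon_to_xy <- function(lat, lon, zoom) {
  lat_rad <- pi * lat / 180
  y_tilde <- log((1 + sin(lat_rad)) / (1 - sin(lat_rad))) / (2 * pi)
  x_tilde <- lon / 180
  Y <- 2^(zoom - 1) * (1 - y_tilde)
  X <- 2^(zoom - 1) * (x_tilde + 1)
  list(X = X, Y = Y,
       tile = c(x = floor(X), y = floor(Y)),
       pixel = c(x = 256 * (X - floor(X)), y = 256 * (Y - floor(Y))))
}

# Inverse mapping: continuous tile coordinates back to lat/lon in degrees.
xy_to_latlon <- function(X, Y, zoom) {
  y_tilde <- 1 - Y / 2^(zoom - 1)
  x_tilde <- X / 2^(zoom - 1) - 1
  e <- exp(2 * pi * y_tilde)
  lat <- asin((e - 1) / (e + 1)) * 180 / pi  # principal branch
  c(lat = lat, lon = 180 * x_tilde)
}

# Round trip for the map center used in the GetMap example.
fwd <- latlon_to_xy(32.7073, -117.162, zoom = 10)
back <- xy_to_latlon(fwd$X, fwd$Y, zoom = 10)
print(fwd$tile)
print(back)
```

The same two functions are what a tile cache needs: the forward mapping tells which stored tiles cover a bounding box, and the inverse recovers the geographic extent of the stitched image.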

