The document discusses methods for identifying spatiotemporal clusters from event count data. It describes the spatial scan statistic (as implemented in SaTScan), a widely used hot spot detection method that searches for regions whose event counts are unusually high relative to a baseline. Supervised learning methods such as classification and regression trees (CART) and the patient rule induction method (PRIM) are proposed as less computationally expensive alternatives for finding over-densities relative to an expected background. Examples based on disease and crime data are provided.
2. Motivation
Spatial Plots in R
Spatiotemporal Clusters
RgoogleMaps
Static Google Maps
Integration of maps into R
Hot Spot Detection
SaTScan
SpatialEpi
Unsupervised as Supervised Learning
Trees
PRIM
Outlook
OpenStreetMap
Offline use and enlarged maps
4. Motivation II a
The meuse data set gives locations and topsoil heavy metal concentrations, along with a number of soil and landscape variables at the
observation locations, collected in a flood plain of the river Meuse, near the village of Stein (NL). Heavy metal concentrations are from
6. Examples: pennLC incidence
County-level (n = 67) population/case data for lung cancer in Pennsylvania in
2002, stratified on race (white vs non-white), gender and age (Under 40, 40-59,
60-69 and 70+). The data also contain county-specific smoking rates.
[Figure: county-level incidence map. Legend classes: 0.0263-0.0364, 0.0364-0.0504, 0.0504-0.0698, 0.0698-0.0967, 0.0967-0.134]
8. Spatiotemporal Clusters
Scoring unusual events in space and time has been an active and
important field of research for decades. How do we
distinguish normal fluctuations in a stochastic count process from real additive events?
identify spatiotemporal clusters where the event is most strongly pronounced?
efficiently graph these clusters in a map overlay?
Supervised learning algorithms are proposed as an alternative to the
computationally expensive scan statistic.
The task can be reduced to detecting over-densities in space relative to a
background density.
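The reduction to supervised learning can be sketched as follows: label the observed events 1, draw a reference sample from the background density and label it 0, then let any classifier or regression method estimate where the share of label 1 is high. The snippet below is only an illustration under stated assumptions: the data are simulated, the background is taken to be uniform on the unit square, and a per-cell average on a 5 x 5 grid stands in for the trees or PRIM discussed later. The names `cell_event_fraction`, `events` and `background` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumption: the background density is uniform on the unit square.
background = rng.uniform(0, 1, size=(2000, 2))           # reference sample, label 0
events = np.vstack([
    rng.uniform(0, 1, size=(1500, 2)),                   # ordinary events
    rng.normal([0.7, 0.3], 0.03, size=(500, 2)),         # injected cluster
])                                                       # observed events, label 1

X = np.vstack([events, background])
y = np.concatenate([np.ones(len(events)), np.zeros(len(background))])

def cell_event_fraction(X, y, bins=5):
    """Fraction of label-1 points per grid cell: a crude estimate of P(y = 1 | x)."""
    edges = np.linspace(0, 1, bins + 1)
    total, _, _ = np.histogram2d(X[:, 0], X[:, 1], bins=[edges, edges])
    ev, _, _ = np.histogram2d(X[y == 1, 0], X[y == 1, 1], bins=[edges, edges])
    return np.divide(ev, total, out=np.zeros_like(ev), where=total > 0)

frac = cell_event_fraction(X, y)
hot_cell = tuple(int(i) for i in np.unravel_index(np.argmax(frac), frac.shape))
print("cell with the largest over-density:", hot_cell)
```

Cells in which the fraction is well above the overall share of events are candidate over-densities; a tree or PRIM replaces the fixed grid with boxes that adapt to the data.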
9. Motivation III
North-East Visualization and Analytics Center
Geovisual analytics of spatial scan statistics
Benefit: Geovisual analytics of spatial scan statistics supports spatial cluster analysis at multiple scales, mitigating the scale sensitivity of spatial scan statistics and enhancing the interpretation of the cluster results.
Combining geovisual analytics with spatial scan statistic
Spatial scan statistics methods are widely used in identifying and verifying spatial clusters (e.g., crime or disease clusters). However, two issues make using the methods and interpreting the results non-trivial: (1) the methods lack cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to scaling parameters. Most spatial scan statistics provide no direct support for making these choices. In order to address these issues, we have developed a novel geovisual analytics approach that coordinates with Kulldorff’s spatial scan statistic as an example. This approach, which we have implemented in Visual Inquiry Toolkit, mitigates the sensitivity problem and enhances the interpretation of SaTScan results.
In the example below, U.S. vehicle theft data from 2000 are scanned at scales of 4%, 6%, 8%, 10%, 20%, 30%, 40% and 50% of the total population at risk. The map matrix shows the instability of clusters reported at smaller scales. The reliability map displays reliable, high-risk clusters of U.S. vehicle theft in 2000.
Multi-scale analysis of spatial clusters with Visual Inquiry Toolkit: the map matrix (left) displays high-risk clusters (highlighted in orange). Some clusters are heterogeneous, containing normal or low-risk locations (in white or blue). The reliability map below displays reliable, high-risk clusters (in dark green).
15. RgoogleMaps
Shortcomings of API:
URL Size Restriction: Static Map URLs are restricted to 2048
characters in size. In practice, you will probably not have need for URLs
longer than this, unless you produce complicated maps with a high
number of markers and paths. Note, however, that certain characters may
be URL-encoded by browsers and/or services before sending them off to
the Static Map service, resulting in increased character usage.
Marker and line styles limited
Switching Environments slows down development
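The URL size restriction can be checked before a request is sent. The following is a minimal sketch, not part of RgoogleMaps: it assembles a Static Maps request with Python's standard library and compares the encoded length with the 2048-character limit quoted above. The helper names `static_map_url` and `fits_limit` are hypothetical, and the API key that a real request needs is left out.

```python
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/staticmap"
URL_LIMIT = 2048  # limit as quoted on this slide

def static_map_url(center, zoom, size, markers=()):
    """Build a static map URL; markers is a sequence of (lat, lon) pairs."""
    params = [
        ("center", f"{center[0]},{center[1]}"),
        ("zoom", zoom),
        ("size", f"{size[0]}x{size[1]}"),
    ]
    if markers:
        # Marker locations are separated by '|', which is URL-encoded as '%7C'.
        params.append(("markers", "|".join(f"{lat:.4f},{lon:.4f}" for lat, lon in markers)))
    return BASE_URL + "?" + urlencode(params)

def fits_limit(url, limit=URL_LIMIT):
    """True if the encoded URL stays within the character limit."""
    return len(url) <= limit

few = [(52.5 + 0.01 * i, 13.4) for i in range(5)]
many = [(52.5 + 0.001 * i, 13.4) for i in range(200)]
short_url = static_map_url((52.5, 13.4), 12, (640, 640), few)
long_url = static_map_url((52.5, 13.4), 12, (640, 640), many)
print(len(short_url), fits_limit(short_url))
print(len(long_url), fits_limit(long_url))
```

Each separator grows from one character to three once encoded, which is why maps with many markers hit the limit sooner than the raw parameter string suggests.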
The RgoogleMaps package serves two purposes:
Provide a comfortable R interface to query the Google server for static
maps
Use the map as a background image to overlay plots within R. This
requires proper coordinate scaling.
Note: For dynamic map overlays there are other great packages such as
plotGoogleMaps and googleVis.
19. Scan Statistic
In this spatial surveillance setting, each day we have data collected for a set of
discrete spatial locations s_i. For each location s_i, we have a count c_i (e.g.
number of disease cases), and an underlying baseline b_i. The baseline may
correspond to the underlying population at risk, or may be an estimate of the
expected value of the count (e.g. derived from the time series of previous count
data).
Our goal, then, is to find if there is any spatial region S (set of locations s_i) for
which the counts are significantly higher than expected, given the baselines. For
simplicity, we assume that the locations s_i are aggregated to a uniform,
two-dimensional, N × N grid G, and we search over the set of all rectangular
regions S ⊆ G.
20. CitySense
This type of spatial surveillance is computationally expensive: O(R · N^4)
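To see where the cost comes from, here is a brute-force sketch of a rectangular scan on a small grid. Several choices are assumptions made for illustration and are not taken from the slides: the score is an expectation-based Poisson statistic, F(S) = C log(C/B) + B − C when C > B and 0 otherwise; significance is assessed by simulating Poisson counts from the baselines; and R is read as the number of such replicas. The function names are hypothetical.

```python
import numpy as np

def summed_area(a):
    """Summed-area table with a zero border, so any rectangle sum costs O(1)."""
    t = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
    t[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    return t

def max_rectangle_score(counts, baselines):
    """Score every axis-aligned rectangle: O(N^4) rectangles on an N x N grid."""
    C, B = summed_area(counts), summed_area(baselines)
    n_rows, n_cols = counts.shape
    best_score, best_rect = 0.0, None
    for r0 in range(n_rows):
        for r1 in range(r0 + 1, n_rows + 1):
            for c0 in range(n_cols):
                for c1 in range(c0 + 1, n_cols + 1):
                    c = C[r1, c1] - C[r0, c1] - C[r1, c0] + C[r0, c0]
                    b = B[r1, c1] - B[r0, c1] - B[r1, c0] + B[r0, c0]
                    # Assumed score: expectation-based Poisson statistic.
                    score = c * np.log(c / b) + b - c if c > b > 0 else 0.0
                    if score > best_score:
                        best_score, best_rect = score, (r0, r1, c0, c1)
    return best_score, best_rect  # rectangle is rows r0:r1, columns c0:c1

def randomization_p_value(counts, baselines, n_replicas=19, seed=0):
    """Repeat the full scan on R simulated grids: O(R * N^4) in total."""
    rng = np.random.default_rng(seed)
    observed, _ = max_rectangle_score(counts, baselines)
    null_scores = [max_rectangle_score(rng.poisson(baselines), baselines)[0]
                   for _ in range(n_replicas)]
    return (sum(s >= observed for s in null_scores) + 1) / (n_replicas + 1)

baselines = np.full((6, 6), 10.0)
counts = baselines.copy()
counts[2:4, 1:3] = 30.0          # injected 2 x 2 hot spot
score, rect = max_rectangle_score(counts, baselines)
p_value = randomization_p_value(counts, baselines)
print(rect, round(score, 2), p_value)
```

The four nested loops are the N^4 factor; repeating the whole search for every replica supplies the factor R, which motivates the cheaper alternatives considered in this deck.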
22. Examples: NYleukemia
Census tract level (n=277) leukemia data for the 8 counties in upstate New
York from 1978-1982, paired with population data from the 1980 census.
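27. Patient Rule Induction Method
Described in Hastie et al., The Elements of Statistical Learning, Section 9.3 (PRIM: Bump Hunting); the method was originally proposed by Friedman and Fisher (1999).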
Algorithm 9.3 Patient Rule Induction Method.
1. Start with all of the training data, and a maximal box containing all
of the data.
2. Consider shrinking the box by compressing one face, so as to peel off
the proportion α of observations having either the highest values of
a predictor Xj , or the lowest. Choose the peeling that produces the
highest response mean in the remaining box. (Typically α = 0.05 or
0.10.)
3. Repeat step 2 until some minimal number of observations (say 10)
remain in the box.
4. Expand the box along any face, as long as the resulting box mean
increases.
5. Steps 1–4 give a sequence of boxes, with different numbers of
observations in each box. Use cross-validation to choose a member of the
sequence. Call the box B1.
6. Remove the data in box B1 from the dataset and repeat steps 2–5 to
obtain a second box, and continue to get as many boxes as desired.
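The peeling part of the algorithm (steps 1 to 3) can be sketched in a few lines. This is an illustration under assumptions, not a full PRIM implementation: pasting (step 4), cross-validation (step 5) and the search for further boxes (step 6) are omitted, the data are simulated, and the function name `prim_peel` is hypothetical.

```python
import numpy as np

def prim_peel(X, y, alpha=0.1, min_obs=10):
    """Top-down peeling: returns the sequence of boxes as (lower, upper, mean, n)."""
    inside = np.ones(len(y), dtype=bool)
    lower, upper = X.min(axis=0).astype(float), X.max(axis=0).astype(float)
    boxes = [(lower.copy(), upper.copy(), float(y.mean()), int(inside.sum()))]
    while inside.sum() > min_obs:
        best = None
        for j in range(X.shape[1]):
            xj = X[inside, j]
            lo_cut, hi_cut = np.quantile(xj, alpha), np.quantile(xj, 1 - alpha)
            # Candidate peels: drop the lowest or the highest alpha share along X_j.
            for side, cut, keep in (("lower", lo_cut, X[:, j] >= lo_cut),
                                    ("upper", hi_cut, X[:, j] <= hi_cut)):
                new_inside = inside & keep
                n_new = new_inside.sum()
                if n_new == inside.sum() or n_new < min_obs:
                    continue  # nothing peeled, or the box would become too small
                mean_new = y[new_inside].mean()
                if best is None or mean_new > best[0]:
                    best = (mean_new, j, side, cut, new_inside)
        if best is None:
            break
        mean_new, j, side, cut, inside = best
        if side == "lower":
            lower[j] = cut
        else:
            upper[j] = cut
        boxes.append((lower.copy(), upper.copy(), float(mean_new), int(inside.sum())))
    return boxes

# Simulated example: response is 1 inside a hidden box, 0 elsewhere.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))
y = ((X[:, 0] > 0.6) & (X[:, 0] < 0.9) & (X[:, 1] > 0.2) & (X[:, 1] < 0.5)).astype(float)
boxes = prim_peel(X, y)
final_lower, final_upper, final_mean, final_n = boxes[-1]
print(len(boxes), final_lower, final_upper, final_mean, final_n)
```

For hot spot detection the response would be the event label (1 for observed events, 0 for the background sample), so a box with a high mean is a region of over-density.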
28. PRIM example
[Figure: PRIM example. Twelve scatter-plot panels labeled 1, 2, 3, 4, 5, 6, 7, 8, 12, 17, 22 and 27.]
30. Outlook, Issues
OpenStreetMap (OSM)
Offline use and enlarged maps
Local Database of map tiles
Keep size reasonable by limiting number of zoom levels and spatial
extent.
Construction of static map from local tiles
31. Google Map Math
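With $\widetilde{lat} = \pi \cdot lat / 180$, the transformation from lat/lon to pixels is given by:

$$\tilde{Y} = \frac{1}{2\pi} \log\left( \frac{1 + \sin(\widetilde{lat})}{1 - \sin(\widetilde{lat})} \right) \qquad (1)$$

$$Y = 2^{zoom-1} \, (1 - \tilde{Y}), \qquad X = 2^{zoom-1} \, (\tilde{X} + 1) \qquad (2)$$

The integer part of X, Y specifies the tile, whereas the fractional part times 256 is the pixel coordinate within the tile itself:

$$x = 256 \, (X - \lfloor X \rfloor), \qquad y = 256 \, (Y - \lfloor Y \rfloor)$$

Inverting these relationships is rather straightforward. Solving Eq. (1) for the sine and taking the principal branch, which is the only one that corresponds to a valid latitude, leads to

$$\widetilde{lat} = \sin^{-1}\left( \frac{\exp(2\pi \tilde{Y}) - 1}{\exp(2\pi \tilde{Y}) + 1} \right) \qquad (3)$$

(the general solutions are $2\pi n + \widetilde{lat}$ and $2\pi n + \pi - \widetilde{lat}$, $n \in \mathbb{Z}$), whereas inverting Eqs. (2) gives

$$\tilde{Y} = 1 - Y / 2^{zoom-1}, \qquad \tilde{X} = X / 2^{zoom-1} - 1$$

For longitude, the inverse mapping is much simpler: since $\tilde{X} = lon / 180$, we get

$$lon = 180 \cdot \left( X / 2^{zoom-1} - 1 \right)$$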
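The forward and inverse mappings of this slide can be sketched in Python as a round-trip check. The function names are hypothetical, the tile size of 256 pixels is the one stated above, and the inverse uses the principal arcsine branch, which is my reading of Eq. (3) for valid latitudes.

```python
import math

def latlon_to_XY(lat, lon, zoom):
    """Eqs. (1) and (2): fractional tile coordinates X, Y at a given zoom level."""
    lat_rad = math.pi * lat / 180
    y_tilde = math.log((1 + math.sin(lat_rad)) / (1 - math.sin(lat_rad))) / (2 * math.pi)
    x_tilde = lon / 180
    scale = 2 ** (zoom - 1)
    return scale * (x_tilde + 1), scale * (1 - y_tilde)

def split_tile_pixel(X, Y, tile_size=256):
    """Integer part gives the tile, fractional part times 256 the pixel within it."""
    tile = (math.floor(X), math.floor(Y))
    pixel = (tile_size * (X - tile[0]), tile_size * (Y - tile[1]))
    return tile, pixel

def XY_to_latlon(X, Y, zoom):
    """Inverse mapping, using the principal branch of the arcsine."""
    scale = 2 ** (zoom - 1)
    y_tilde = 1 - Y / scale
    x_tilde = X / scale - 1
    e = math.exp(2 * math.pi * y_tilde)
    lat = 180 / math.pi * math.asin((e - 1) / (e + 1))
    return lat, 180 * x_tilde

X, Y = latlon_to_XY(52.52, 13.405, zoom=10)
tile, pixel = split_tile_pixel(X, Y)
lat_back, lon_back = XY_to_latlon(X, Y, zoom=10)
print(tile, pixel, lat_back, lon_back)
```

The same functions are what an offline tile store needs: the tile indices identify which image to load, and the pixel offsets place a point inside it.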