4. Identifying Clusters
Why identify clusters?
•Understand the pattern of locations in an area
•Compare that pattern with other features to identify possible
contributing factors
•Take action based on the clusters identified
[Figures: clusters of burglaries; income and emergency calls]
5. Using statistics to identify clusters
Looking at a map lets you draw conclusions (e.g. where a cluster
is). Statistics make it possible to test and validate those
conclusions.
Statistics count each event as a unique occurrence, which is hard
to see on a map.
6. Time period of data
The time period of data can vary a lot, from current conditions
to long time periods
- Vacant parcels only need a snapshot of the current
condition; crimes or earthquakes need a defined time
period
Feature         Time period
Vacant houses   Now
Crimes          6 months
Earthquakes     100 years
Therefore: the appropriate time period differs between features and has to be defined
7. Distance within clusters
Clusters are usually defined using Euclidean distance,
though travel time or cost can also be used.
- Clusters of burglaries can depend on the driving time
between the crimes.
Euclidean distance doesn't take barriers (such as a river)
into account, so two locations can look very close in
Euclidean terms even though the travel time between them
is long.
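A minimal sketch of the river example, using made-up coordinates: two burglary locations face each other across a river, and the only crossing is a hypothetical bridge further along. All names and numbers are illustrative assumptions, and route length stands in for travel time.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def route_length(points):
    """Total length of a route that visits the points in order."""
    return sum(euclidean(a, b) for a, b in zip(points, points[1:]))

# Hypothetical layout in km: the river runs along y = 0 and the
# only bridge crosses it at x = 5.
crime_a = (0.0, 0.5)    # north bank
crime_b = (0.0, -0.5)   # south bank
bridge = (5.0, 0.0)

straight_line = euclidean(crime_a, crime_b)
via_bridge = route_length([crime_a, bridge, crime_b])

print(f"Euclidean distance: {straight_line:.2f} km")
print(f"Distance via bridge: {via_bridge:.2f} km")
```

The two crimes are 1 km apart in a straight line but roughly ten times further apart by road, so a cluster defined on travel distance could treat them as unrelated.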
8. Identifying clusters - methods
Two methods for identifying clusters:
1. Finding clusters of features
when features are found in close proximity
2. Finding clusters of similar value
when groups of high and low values are found together
("hot and cold spots")
10. Nearest neighbour hierarchical clustering (1)
"One method for finding clusters is to specify the distance
features can be from each other, in order to be part of a
cluster, and the minimum number of features that make up
a cluster."
(Mitchell 2005:152)
Clusters with a specified number of features
within a specified distance
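The idea of "a specified number of features within a specified distance" can be sketched as follows. This is a simplified single-linkage grouping, not the exact routine described by Mitchell: points are linked whenever they lie within `max_dist` of each other, and only groups with at least `min_size` members are kept. The function name, parameters and sample points are assumptions for illustration.

```python
import math

def find_clusters(points, max_dist, min_size):
    """Group points linked by gaps of at most max_dist and keep groups
    with at least min_size members. Returns lists of point indices."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group's representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= max_dist:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) >= min_size)

# Hypothetical burglary locations.
burglaries = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (50, 50)]
clusters = find_clusters(burglaries, max_dist=1.5, min_size=3)
print(clusters)
```

Lowering `min_size` to 2 would also report the pair at (10, 10) and (11, 10), which shows how strongly both parameters shape the result.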
11. Nearest neighbour hierarchical clustering (2)
The method is hierarchical because the routine continues on to
group the clusters into larger clusters (shows several
geographic scales e.g. neighbourhood and citywide for
crimes).
[Figures: clusters at a small scale (neighbourhood); clusters at a larger scale (citywide)]
12. Nearest neighbour hierarchical clustering (3)
How nearest neighbour hierarchical clustering works:
A probability level is specified and used to calculate the
distance within which features will be considered a cluster.
The probability level defines a confidence interval around the
"mean random distance", the mean distance that would occur
between points in a random distribution.
If the distance between features is greater than the high end
of the interval, the features are further apart than you would
expect by chance.
For clustering the opposite end matters: features closer
together than the low end of the interval are nearer than you
would expect by chance.
--See pages 155 and 156 for the calculation
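The calculation itself is not reproduced on the slides. As a hedged sketch, the code below uses the commonly cited nearest-neighbour formulas: mean random distance 0.5 * sqrt(A / N) and standard error 0.26136 / sqrt(N^2 / A). The multiplier `z = 1.645` (a one-tailed 5% level under a normal approximation) is an assumption, and the referenced pages may use a different multiplier.

```python
import math

def mean_random_distance(area, n):
    """Expected mean nearest-neighbour distance for n random points."""
    return 0.5 * math.sqrt(area / n)

def standard_error(area, n):
    """Standard error of the mean random distance."""
    return 0.26136 / math.sqrt(n * n / area)

def cluster_threshold(area, n, z=1.645):
    """Low end of the interval: features closer together than this are
    nearer than expected by chance at the chosen probability level."""
    return mean_random_distance(area, n) - z * standard_error(area, n)

# Hypothetical study area: 100 km^2 containing 100 incidents.
expected = mean_random_distance(100.0, 100)
threshold = cluster_threshold(100.0, 100)
print(f"mean random distance: {expected:.3f} km")
print(f"cluster threshold:    {threshold:.3f} km")
```

A stricter probability level means a larger `z`, a shorter threshold distance, and therefore fewer and tighter clusters.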
14. Finding clusters of similar values
The GIS looks at the attribute values of each feature and its
neighbours, as well as the proximity of the features.
It then calculates the degree to which nearby features have
similar values for a given attribute.
[Figures: percent age 65 or over; percentages of seniors similar to their neighbours (blue = less similar, red = more similar)]
15. Identifying clusters of similar values (1)
Where high values are surrounded by high values or low values are
surrounded by low values, the features are similar
16. Identifying clusters of similar values (2)
A statistic is calculated for each feature. It is then possible to
map the features based on this value, to see the locations of
features of similar value
17. Moran’s Ii (1)
A method to identify similar values
Emphasizes how features differ from the values in the study
area as a whole
Compares the value of each feature in a pair to the mean value
for all features in the study area (local variation: the
method looks at what's happening right around each feature)
--For the calculation, see page 167
18. Moran’s Ii (2)
The value for Moran’s Ii depends on the difference in attribute values,
the number of neighbours with similar values, and the magnitude of
the attribute data
• A high positive value of Ii indicates that the feature is surrounded
by features with similar values, either high or low.
• A negative value indicates that the feature is surrounded by features
with dissimilar values.
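A minimal sketch of how the sign of Ii behaves, using one common formulation: Ii = ((xi - mean) / variance) * sum of (xj - mean) over the neighbours j of feature i, with binary weights and the variance taken over all features. The calculation referenced on the previous slide may differ in detail, and the data and neighbour lists are hypothetical.

```python
def local_morans_i(values, neighbours):
    """Local Moran's Ii for each feature.
    neighbours[i] lists the indices adjacent to feature i (binary weights)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return [
        (values[i] - mean) / variance
        * sum(values[j] - mean for j in neighbours[i])
        for i in range(n)
    ]

# Six hypothetical tracts in a row: high values on the left, low on the right.
percent_seniors = [10, 9, 10, 1, 2, 1]
adjacent = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

ii = local_morans_i(percent_seniors, adjacent)
for tract, score in enumerate(ii):
    print(tract, round(score, 3))
```

Tracts inside the high group and inside the low group both get positive scores, while the two tracts at the boundary between the groups get negative ones.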
19. Gi statistic
Identifying concentrations (clusters) of high and low values
within a distance
Compares neighbouring features within a specified distance
Two versions:
1. Gi statistic
2. Gi*
20. Version 1 - Gi statistic
Used to find out what's going on around a feature or cell,
without taking the value of the target feature itself into account
- Used to study how a phenomenon spreads across an area.
Gi has been used to track the spread of AIDS across
counties in the San Francisco area. It was possible to see
the increase over time and distance.
21. Version 2 – Gi*
The value of the target feature is included. Used to find hot or
cold spots.
A distance (search radius) is defined
This distance is based on knowledge of the features and
their behaviour. Example: how far are people willing to
travel to reach a certain store? (Euclidean distance, travel
time, etc.)
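A hedged sketch of Gi* in its standardised (z-score) form, which is one widely used formulation and not necessarily the one shown in the lecture. It assumes binary weights (1 inside the search radius, 0 outside) and, because this is Gi*, each feature counts as its own neighbour. The data are hypothetical.

```python
import math

def gi_star(values, points, radius):
    """Standardised Gi* score for each feature.
    Assumes the values are not all equal and that the radius does not
    cover every feature (either case makes the denominator zero)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        # Binary weights; the distance from i to itself is 0, so i is included.
        w = [1.0 if math.dist(points[i], p) <= radius else 0.0 for p in points]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        weighted = sum(wj * xj for wj, xj in zip(w, values))
        denom = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append((weighted - mean * sum_w) / denom)
    return scores

# Six hypothetical locations along a street, 1 km apart.
locations = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
incidents = [10, 9, 10, 1, 2, 1]

scores = gi_star(incidents, locations, radius=1.0)
for loc, z in zip(locations, scores):
    print(loc, round(z, 2))
```

Large positive scores mark hot spots and large negative scores mark cold spots. Changing `radius` changes which features count as neighbours, which is why the search distance has to reflect how the features actually behave.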
23. Analyzing Geographic Relationships
Why Analyze Geographic Relationships?
Analysis of feature distributions.
Analysis of relationships between features.
• Understand what is going on in a place.
• Predict where something is likely to occur.
• Examine why things occur where they do.
24. Why Analyze Geographic Relationships?
Understanding what is going on in a place.
Example: Analysis of accidents in relation to speed limits on highways
25. Why Analyze Geographic Relationships?
Predicting where something is likely to occur.
Example: Analysis of landforms in order to identify likely artifact locations.
26. Why Analyze Geographic Relationships?
Examine why things occur where they do.
Example: Improvement of newborns' health.
27. Using Statistics to Analyze Relationships
• When we look for relationships we form an opinion about
things based on personal knowledge of phenomena or visual
analysis of the map.
• Statistics allow us to verify those relationships and measure
how strong they are.
• The idea behind using statistics is:
To see to what extent the value of one attribute changes
when another changes, i.e. to measure the relationship
between two or more maps representing the variables
(analyze the relationship between two attributes).
28. Assigning Variables to Geography
•Variables from different layers must be associated with the same geographic unit.
If they are not, the approach depends on the case:
i) Different cell sizes: ratio
ii) Different sets of features: combine features
iii) Points representing different categories of features: sum features to area
iv) Combining two or more sets of features: raster
Example: Emergency calls and population data.
29. Using Statistics to Analyze Geographic
Relationships
Two statistical assumptions:
•Each value is equally likely to occur in the sample
•The value of one observation doesn't affect the value of another
In Geography:
•Attribute values vary across a region
Regional trends influence attribute values
30. Using Statistics to Analyze Geographic
Relationships
•Nearby features are more similar than distant ones
(spatial autocorrelation)
This violates the independence of observations
Smaller units tend to be more similar to each other than bigger ones.
31. Using Statistics to Analyze Geographic
Relationships
Identifying relationships vs. analyzing processes
Asking for relationships between (x, y):
- Measure the extent of variation
- Take action, understand
Analyzing processes:
- Identify the main variables that drive a process
- Predict values of a variable
32. Identifying Geographic Relationships
How much two attributes vary together.
direct relationship (positive correlation)
inverse relationship (negative correlation)
If you suspect a relationship, then:
measure the relationship to confirm it, then measure its
direction and strength
34. Methods for Identifying Geographic Relationships
•Spearman’s Rank Correlation Coefficient
measures the extent to which two lists of ranked
values correspond
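A minimal sketch of Spearman's coefficient, computed here as the Pearson correlation of the two rank lists, with tied values sharing their average rank. The income and emergency-call figures are invented for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rank correlation coefficient."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical tracts: median income (thousands) and emergency calls.
income = [20, 35, 50, 65, 80]
calls = [90, 70, 75, 40, 30]
rho = spearman(income, calls)
print(round(rho, 3))
```

A value close to -1 means the two rankings run in nearly opposite order, i.e. an inverse relationship; a value close to +1 means a direct relationship.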
35. Identifying Geographic Relationships
What the correlation coefficient doesn't measure
• Results of a correlation can't be applied at another scale, e.g.
from a county to the nation.
• Doesn't measure causation (X → Y).
• Correlation doesn't explain why there is a relationship.
• Doesn't measure the form of the relationship, just the
dispersion around a straight line.
36. Analyzing Geographic Processes
We analyze geographic processes in order to predict that
something will occur.
Steps
1. Develop a theory as to what is driving the process
2. Analyze the relationships between various attributes of your
data (build a Model)
37. Analyzing Geographic Processes
Linear Regression Analysis
•Plot variables on chart.
•Find the line that best fits the data points (ordinary least
squares method)
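A minimal sketch of ordinary least squares with one independent variable: the slope is the sum of cross-deviations divided by the sum of squared deviations in x, and the line passes through the point of means. The sample data are hypothetical.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = fit_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```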
39. Analyzing Geographic Processes
Interpreting the results of regression analysis
We can see how well our model works by comparing the variance in
the predicted values to the variance in the observed values.
• Perfect fit (all points on the line): R² = 1
• In any other case R² < 1, meaning the fit is not perfect
Calculate the residuals (differences between predicted and
observed values)
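A short sketch of both quantities, assuming the usual convention that a residual is observed minus predicted and that R² = 1 - (sum of squared residuals) / (total sum of squares). The observed and predicted values are invented for illustration.

```python
def residuals(observed, predicted):
    """Observed minus predicted value for each observation."""
    return [o - p for o, p in zip(observed, predicted)]

def r_squared(observed, predicted):
    """Share of the variance in the observed values explained by the model."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum(r * r for r in residuals(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical observed values and the values predicted by a fitted line.
observed = [2.1, 3.9, 6.2, 7.8, 10.0]
predicted = [2.06, 4.03, 6.00, 7.97, 9.94]

print([round(r, 2) for r in residuals(observed, predicted)])
print(round(r_squared(observed, predicted), 4))
```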
40. Using More Than One Independent Variable
Most geographic processes aren’t controlled by a single variable
New Regression Analysis Equation
R² in multivariate regression describes the share of the variation in y
explained by the combination of independent variables.
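The equation itself did not survive on this slide. The standard form of a multiple linear regression, which is presumably what was shown, is given below; the symbols are assumptions: b_0 is the intercept, b_1 to b_n are the coefficients of the independent variables x_1 to x_n, and epsilon is the residual.

```latex
y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n + \varepsilon
```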
41. Using More Than One Independent Variable
Goal: identify the key variables
Analysis: test the significance of each variable (t-test)
42. Factors Influencing the Regression Analysis Results
Least squares regression analysis is effective only if the
following are true:
1. Linear relationship between Y,X.
2. Residuals have a Mean of 0.
3. Residuals have a constant Variance.
4. Residuals are randomly arranged along the regression line.
5. Residuals are normally distributed.
6. Independent variables are not highly correlated.
43. Regression Analysis & Geographic Data
For geographic data, misspecification can result from many sources.
It can occur when:
• Data are analyzed at the wrong scale for the process
• Variables are missing
44. Dealing With Regional Variation
Geographically Weighted Regression (GWR)
• Allows model coefficients to vary regionally.
• A regression is run for each location rather than one for the
study area as a whole.
Example: Per capita income.
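A rough sketch of the idea behind GWR, not a full implementation: for each location, a weighted least squares line is fitted in which nearby observations count more than distant ones. The Gaussian distance-decay kernel, the `bandwidth` parameter and the data are all assumptions made for illustration, and only one independent variable is handled.

```python
import math

def local_regression(locations, x, y, target, bandwidth):
    """Weighted least squares fit centred on `target`.
    Returns (intercept, slope) for that location."""
    # Gaussian kernel: the weight decays with distance from the target.
    weights = [
        math.exp(-0.5 * (math.dist(target, loc) / bandwidth) ** 2)
        for loc in locations
    ]
    total = sum(weights)
    mean_x = sum(w * a for w, a in zip(weights, x)) / total
    mean_y = sum(w * b for w, b in zip(weights, y)) / total
    sxy = sum(w * (a - mean_x) * (b - mean_y) for w, a, b in zip(weights, x, y))
    sxx = sum(w * (a - mean_x) ** 2 for w, a in zip(weights, x))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def gwr(locations, x, y, bandwidth):
    """Run one local regression per location."""
    return [local_regression(locations, x, y, loc, bandwidth) for loc in locations]

# Hypothetical data: y = 1 * x in the west, y = 3 * x in the east.
locations = [(0, 0), (1, 0), (2, 0), (100, 0), (101, 0), (102, 0)]
x = [1, 2, 3, 1, 2, 3]
y = [1, 2, 3, 3, 6, 9]

local_fits = gwr(locations, x, y, bandwidth=5.0)
for loc, (b0, b1) in zip(locations, local_fits):
    print(loc, round(b0, 3), round(b1, 3))
```

A single regression over all six points would report one compromise slope; the local fits recover a slope near 1 in the west and near 3 in the east.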
45. Dealing with Local Trends
Methods to address local trends:
• Resampling
• Spatial filtering (removes spatial autocorrelation)
46. Running A Linear Regression Analysis With
Geographic Data.
1. Determine what you are trying to predict.
2. Identify the key independent variables.
3. Examine the distribution of your variables.
4. Run the ordinary least squares regression.
5. Examine the coefficients for each independent variable.
6. Examine the residuals.
• Test for spatial autocorrelation
• Look for missing variables
• Plot y-values against residuals
• Create a frequency curve
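One way to carry out the spatial autocorrelation check in step 6 is to compute global Moran's I on the residuals. The sketch below assumes binary neighbour weights and hypothetical residuals for six areas in a row. Values well above the no-autocorrelation expectation of -1 / (n - 1) suggest that similar residuals sit next to each other, for instance because a variable is missing.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights.
    neighbours[i] lists the indices adjacent to feature i."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbours[i])
    total_weight = sum(len(nb) for nb in neighbours)
    return (n / total_weight) * cross / sum(d * d for d in dev)

# Six hypothetical areas in a row, each adjacent to the next.
adjacent = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

clustered_residuals = [2.0, 1.5, 1.0, -1.0, -1.5, -2.0]
alternating_residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(round(morans_i(clustered_residuals, adjacent), 3))
print(round(morans_i(alternating_residuals, adjacent), 3))
```

The first set, where the model under-predicts at one end and over-predicts at the other, gives a clearly positive I; the alternating set gives a strongly negative one.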
Editor's notes
Example: Burglaries
--Example: Clusters of burglaries can depend on the driving time between the crimes, especially if there is a barrier such as a river between the crime scenes (the Euclidean distance is short, but the travel time is long).