Sa Presentation 20070917111 Thomas

Vector based spatial analysis

Nikolaos Spyropoulos and Thomas K. Andersen
Institute of Geography

The ESRI Guide to GIS Analysis, Mitchell 2005

• Chapter 4, Identifying Clusters

• Chapter 5, Analyzing Geographic Relationships

Chapter 4, Identifying Clusters

Identifying Clusters

Why identify clusters?

•Get an understanding of the location pattern in an area

•Compare these patterns with other features, for identifying possible
contributing factors

•Take action based on the identified clusters

Clusters of burglaries Income and emergency calls

Using statistics to identify clusters

Conclusions can be drawn by looking at a map (e.g. where the cluster
is); statistics make it possible to test and validate those
conclusions.

With statistics, each event is counted as a unique occurrence, which is
hard to see on a map.

Time period of data

The time period of data can vary a lot, from current conditions
to long time periods

- For vacant parcels you need a snapshot of the current
condition, for crimes or earthquakes, defining a time period
is needed

Vacant houses: now
Crimes: 6 months
Earthquakes: 100 years

Therefore: The time period is different, and has to be defined

Distance within clusters

Clusters are usually defined using Euclidean distance, though travel
time or cost can also be used.

- Clusters of burglaries can depend on the driving time
between the crimes.
Because Euclidean distance doesn’t take barriers (such as a
river) into account, two locations can seem very close
even though the travel time between them is long.

Identifying clusters - methods

Two methods for identifying clusters:

1. Finding clusters of features

when features are found in close proximity

2. Finding clusters of similar value

when groups of high and low values are found together
(”hot and cold spots”)

Nearest neighbour hierarchical clustering (1)

”One method for finding clusters is to specify the distance
features can be from each other, in order to be part of a
cluster, and the minimum number of features that make up
a cluster.”

(Mitchell 2005:152)

Clusters with a specified number of features
within a specified distance


The method is hierarchical because the routine continues on to
group the clusters into larger clusters (shows several
geographic scales e.g. neighbourhood and citywide for
crimes).

Clusters at small scale (neighbourhood) | Clusters at bigger scale (citywide)


How nearest neighbour hierarchical clustering works:

A probability level is specified, to calculate the distance
within which features will be considered a cluster

A confidence interval is calculated around the mean distance that
would occur between points in a random distribution (the
“mean random distance”).

If the observed distance is greater than the high end of the
interval, the features are further apart than you would expect
by chance. For clustering the opposite end matters: distances
below the low end of the interval indicate features that are
closer together than chance would produce.

--See page 155 and 156 for calculation
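
The threshold calculation can be sketched as follows. This is a minimal sketch that assumes the standard nearest-neighbour formulas (mean random distance 0.5 * sqrt(A/N), standard error 0.26136 / sqrt(N²/A)); the exact form on pages 155 and 156 may differ. The function names, the t value and the example figures are hypothetical.

```python
import math

def mean_random_distance(area, n_points):
    """Expected mean nearest-neighbour distance for points placed at random."""
    return 0.5 * math.sqrt(area / n_points)

def cluster_threshold(area, n_points, t_value=1.645):
    """Low end of the confidence interval around the mean random distance.

    t_value is an assumed one-tailed value for the chosen probability level.
    """
    standard_error = 0.26136 / math.sqrt(n_points ** 2 / area)
    return mean_random_distance(area, n_points) - t_value * standard_error

# Hypothetical study area: 100 incidents in 25 square kilometres.
d_random = mean_random_distance(25.0, 100)
d_cluster = cluster_threshold(25.0, 100)
print(d_random, d_cluster)
```

Features closer together than `d_cluster` would then be candidates for the same cluster.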

Finding clusters of similar value

Finding clusters of similar values

The GIS looks at the attribute values of each feature and its
neighbours, as well as the proximity of the features.
It then calculates the degree to which nearby features have similar
values for a given attribute.

Percent age 65 or over Percentages of seniors similar to their neighbours
(Blue less similar, red more similar)

Identifying clusters of similar values (1)

Where high values are surrounded by high values or low values are
surrounded by low values, the features are similar

Identifying clusters of similar values (2)

A statistic is calculated for each feature. It is then possible to
map the features based on this value, to see the locations of
features of similar value

Moran’s Ii (1)

A method to identify similar values

Emphasizes how features differ from the values in the study
area as a whole

Compares the value of each feature in a pair to the mean value
for all features in the study area (local variation: the
method looks at what is happening right around each feature)

--Calculation see page 167

Moran’s Ii (2)

The value for Moran’s Ii depends on the difference in attribute values,
the number of neighbours with similar values, and the magnitude of
the attribute data

• A high positive value for Ii indicates that the feature is surrounded
by features with similar values, either high or low.

• A negative value indicates that the feature is surrounded by features
of dissimilar values.
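
The behaviour described above can be illustrated with a small sketch of a local Moran statistic. It assumes the common form Ii = (xi - mean) / s² * sum over j of wij (xj - mean), with s² computed over all n features; the book's calculation on page 167 may use a slightly different variance term. The values and the neighbour matrix are hypothetical.

```python
import numpy as np

def local_morans_i(values, weights):
    """Local Moran's Ii for each feature, given a neighbour weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    deviations = x - x.mean()                       # difference from study-area mean
    variance = (deviations ** 2).sum() / len(x)
    # w @ deviations is the weighted sum of each feature's neighbours' deviations.
    return deviations / variance * (w @ deviations)

# Hypothetical example: four areas in a row, each neighbouring the next.
values = [10, 12, 2, 1]
weights = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
ii = local_morans_i(values, weights)
print(ii)
```

The two end areas have neighbours with similar values and get positive Ii; the two middle areas straddle the high/low boundary and get negative Ii.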

Gi statistic

Identifying concentrations (clusters) of high and low values
within a distance

Compares neighbouring features within a specified distance

Two versions:

1. Gi statistic

2. Gi*

Version 1 - Gi statistic

Is used to find out what’s going on around a feature or cell,
without taking the target value into account

-Used to study the dispersion of a phenomenon in an area.
Gi has been used to track the spread of AIDS across the
counties of the San Francisco area. It was possible to see
the increase over time and distance

Version 2 – Gi*

The value of the target feature is included. Used to find hot or
cold spots.

A distance (search radius) is defined

This distance is based on knowledge of the features and
their behaviour. Example: how far are people willing to
travel to go to a certain store? (Euclidean distance, travel
time etc.)
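
A minimal sketch of Gi* with a fixed search radius follows. It assumes the standardized (z-score) form of the statistic and binary weights (1 inside the radius, 0 outside); the function name, coordinates and values are hypothetical, and the book's formulation may differ in detail.

```python
import numpy as np

def getis_ord_gi_star(values, coords, radius):
    """Standardized Gi* for each feature using binary distance weights."""
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    n = len(x)
    # Pairwise Euclidean distances between all features.
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    # Gi* includes the target feature itself (its distance is 0).
    w = (dist <= radius).astype(float)
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)
    w_sum = w.sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * (w ** 2).sum(axis=1) - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Hypothetical data: six features on a line, high values left, low values right.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
values = [9, 8, 9, 1, 2, 1]
gi_star = getis_ord_gi_star(values, coords, radius=1.0)
print(gi_star)
```

Positive scores mark hot spots and negative scores mark cold spots. Setting the diagonal of `w` to 0 would give the Gi version, which leaves the target value out. The radius must be small enough that no feature has every other feature as a neighbour, otherwise the denominator is zero.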

Chapter 5, Analyzing Geographic Relationships

Analyzing Geographic Relationships

Why Analyze Geographic Relationships?

Analysis of feature distributions.

Analysis of relationships between features.

- Understand what is going on in a place.
- Predict where something is likely to occur.
- Examine why things occur where they do.


Understanding what is going on in a place.

Example: Analysis of accidents in relation to speed limits on highways


Predicting where something is likely to occur.

Example: Analysis of landforms in order to identify artifact locations.


Examine why things occur where they do.

Example: Improvement of newborns’ health.

Using Statistics to Analyze Relationships

• When we look for relationships we form an opinion about
things based on personal knowledge of phenomena or visual
analysis of the map.

• Statistics allow us to verify those relationships and measure
how strong they are.

• The idea behind using statistics is:

To see to what extent the value of one attribute changes
when another changes,

measure the relationship between two or more maps
representing the variables (analyze the relationship between
two sets of attribute data).

Assigning Variables to Geography

•Variables from different layers must be associated with the same geographic unit.

If not:
i) Different cell sizes: Ratio
ii) Different set of features: Combine features
iii) Points representing different categories of features: Sum features to area
iv) Combine two or more sets of features: Raster

Example: Emergency calls and population data.

Using Statistics to Analyze Geographic
Relationships

Two statistical assumptions:
•Each value is equally likely to occur in the sample
•The value of one observation doesn’t affect the value of another

In Geography:

•Attribute values vary across a region

Regional trends influence attribute values

Relationships

•Nearby features are more similar than distant ones

Spatial autocorrelation

Violation of the independence of observations

Smaller units tend to be more similar than bigger ones.

Relationships

Identifying relationships vs. analyzing processes

Identifying relationships:
- Asking for relationships between (x, y)
- Measure the extent of variation
- Take actions
- Understand

Analyzing processes:
- Main variables that drive a process
- Predict values of a variable

Identifying Geographic Relationships

How much two attributes vary together.

direct relationship inverse relationship
(positive correlation) (negative correlation)

If you suspect a relationship, then:

measure the relationship, confirm it, and measure its direction
and strength

Methods for Identifying Geographic Relationships

•Pearson’s Correlation Coefficient
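
A small sketch of Pearson's correlation coefficient, using the standard textbook definition (covariance of the two attributes divided by the product of their standard deviations). The function name and the sample lists are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: +1 is a perfect direct, -1 a perfect inverse relationship."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Hypothetical attribute values for four areas.
r_direct = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_inverse = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_direct, r_inverse)
```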

Methods for Identifying Geographic Relationships

•Spearman’s Rank Correlation Coefficient

measures the extent to which two lists of ranked
values correspond
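
A minimal sketch of Spearman's rank correlation, assuming no tied values so that the shortcut formula 1 - 6 * sum(d²) / (n(n² - 1)) applies. The function name and sample lists are hypothetical.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank correlation for two lists without tied values."""
    # argsort applied twice turns values into 0-based ranks; add 1 for 1-based.
    rank_x = np.argsort(np.argsort(x)) + 1
    rank_y = np.argsort(np.argsort(y)) + 1
    d = rank_x - rank_y
    n = len(x)
    return 1.0 - 6.0 * (d ** 2).sum() / (n * (n ** 2 - 1))

# A non-linear but perfectly monotonic relationship still gives rho = 1.
rho_same = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
rho_reversed = spearman_rho([1, 2, 3, 4, 5], [50, 40, 30, 20, 10])
print(rho_same, rho_reversed)
```

Because only the ranks are compared, the result is unaffected by the spacing between values, unlike Pearson's coefficient.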

Identifying Geographic Relationships

What the correlation coefficient doesn’t measure

• The results of a correlation cannot be transferred to another
scale, e.g. from a county to the nation.
• Doesn’t measure causation (whether X causes Y).

• Correlation doesn’t explain why there is a relationship.

• Doesn’t measure the form of the relationship, just the
dispersion around a straight line.

Analyzing Geographic Processes

We analyze geographic processes in order to predict that
something will occur.

Steps
1. Develop a theory as to what is driving the process
2. Analyze the relationships between various attributes of your
data (build a model)


Linear Regression Analysis

•Plot variables on chart.
•Find the line that best fits the data points (ordinary least
squares method)


Ordinary Least Squares

Example from Wikipedia


Interpreting the results of regression analysis

We can see how well our model works by comparing the variance in
the predicted values to the variance in the observed values.

• Perfect fit (all points on the line): R² = 1

• In any other case R² < 1, meaning the fit is not perfect

Calculate residuals (differences between predicted & observed
values)
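
A short sketch of both quantities, assuming the usual definition R² = 1 - (sum of squared residuals) / (total sum of squares). The function names and the observed and predicted values are hypothetical.

```python
import numpy as np

def residuals(observed, predicted):
    """Observed minus predicted value for each data point."""
    return np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)

def r_squared(observed, predicted):
    """Share of the variation in the observed values explained by the model."""
    observed = np.asarray(observed, dtype=float)
    ss_res = (residuals(observed, predicted) ** 2).sum()
    ss_tot = ((observed - observed.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# Hypothetical observed values and the values predicted by a fitted line.
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(observed, predicted)
r2_perfect = r_squared(observed, observed)
print(residuals(observed, predicted), r2, r2_perfect)
```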

Using More Than One Independent Variable

Most geographic processes aren’t controlled by a single variable

New Regression Analysis Equation

R² in multivariate regression describes the variation in y explained by the
combination of independent variables.
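
A minimal sketch of a regression with several independent variables, assuming the usual linear form y = a + b1*x1 + ... + bn*xn and solving it with NumPy's least-squares routine. The function name and the data are hypothetical; the data were constructed so that y = 1 + 2*x1 - 3*x2 exactly.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Return (coefficients, R²); coefficients[0] is the intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])   # add intercept column
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ coefficients
    ss_res = ((y - predicted) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return coefficients, 1.0 - ss_res / ss_tot

# Hypothetical data for two independent variables.
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [-2, 3, -1, 4, 0]
coefficients, r2 = fit_multiple_regression(np.column_stack([x1, x2]), y)
print(coefficients, r2)
```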

Using More Than One Independent Variable

Identifying the key variables

Analysis

Test the significance of each variable
t-test

Goal
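
One way to test each variable is sketched here, assuming the standard OLS t statistic (coefficient divided by its standard error). The data are synthetic and generated only for illustration: `x1` drives `y`, `x2` is unrelated noise. All names are hypothetical.

```python
import numpy as np

def coefficient_t_values(X, y):
    """t statistic for the intercept and each independent variable."""
    design = np.column_stack([np.ones(len(y)), X])    # add intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    degrees_of_freedom = len(y) - design.shape[1]
    sigma2 = residuals @ residuals / degrees_of_freedom
    covariance = sigma2 * np.linalg.inv(design.T @ design)
    return beta / np.sqrt(np.diag(covariance))

# Synthetic data: y depends on x1 only.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 3.0 + 2.0 * x1 + rng.normal(scale=0.5, size=50)
t_values = coefficient_t_values(np.column_stack([x1, x2]), y)
print(t_values)
```

A large absolute t value suggests the variable contributes to the model; values near zero suggest it could be dropped.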

Factors Influencing the Regression Analysis Results

Least squares regression analysis is effective only if the
following are true:

1. Linear relationship between Y,X.
2. Residuals have a Mean of 0.
3. Residuals have a constant Variance.
4. Residuals are randomly arranged along the regression line.
5. Residuals are normally distributed.
6. Independent variables are not highly correlated.

Regression Analysis & Geographic Data

For geographic data, misspecification can result from many sources.

Can occur when:

- Data are analyzed at the wrong scale for the process
- Variables are missing

Dealing With Regional Variation

Geographically Weighted Regression (GWR)

• Allows model coefficients to vary regionally.

• The regression is run for each location rather than for the study
area as a whole.

Example: Per capita income.
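
The idea of running the regression for each location can be sketched as a weighted least-squares fit in which nearby observations count more. This is a simplified illustration that assumes a Gaussian distance kernel with a fixed bandwidth; it is not the full GWR procedure of any particular software, and all names and data are hypothetical.

```python
import numpy as np

def local_coefficients(coords, x, y, bandwidth):
    """Fit intercept and slope separately at each location."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    results = []
    for point in coords:
        distance = np.linalg.norm(coords - point, axis=1)
        weight = np.exp(-0.5 * (distance / bandwidth) ** 2)   # Gaussian kernel
        root_w = np.sqrt(weight)
        # Weighted least squares: scale rows by sqrt(weight), then solve.
        beta, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
        results.append(beta)
    return np.array(results)

# Hypothetical regions far apart: slope is 1 in the west and 3 in the east.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0),
          (100, 0), (101, 0), (102, 0), (103, 0), (104, 0)]
x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5, 3, 6, 9, 12, 15]
betas = local_coefficients(coords, x, y, bandwidth=2.0)
print(betas[:, 1])   # local slope at each location
```

A single global regression would report one compromise slope; the local fits recover a different slope in each region.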

Dealing with Local Trends

Methods to address local trends.

- Resampling
- Spatial filtering (remove spatial autocorrelation)

Running A Linear Regression Analysis With
Geographic Data.

1. Determine what you are trying to predict.

2. Identify the key independent variables.

3. Examine the distribution of your variables.

4. Run the ordinary least squares regression.

5. Examine the coefficients for each independent variable.

6. Examine the residuals.
• Test for spatial autocorrelation
• Look for missing variables
• Plot y-values against residuals
• Create a frequency curve
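
The spatial autocorrelation test in step 6 can be sketched with a global Moran's I computed on the residuals. This assumes the standard formula I = (n / S0) * sum of wij*zi*zj / sum of zi², where z are deviations from the mean and S0 is the sum of all weights. The residual values and the neighbour matrix are hypothetical.

```python
import numpy as np

def global_morans_i(values, weights):
    """Global Moran's I: above 0 means clustered, below 0 means dispersed."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    w = np.asarray(weights, dtype=float)
    return len(z) / w.sum() * (z @ w @ z) / (z @ z)

# Hypothetical layout: six locations in a row, each neighbouring the next.
n = 6
weights = np.zeros((n, n))
for i in range(n - 1):
    weights[i, i + 1] = weights[i + 1, i] = 1

clustered = global_morans_i([1, 1, 1, -1, -1, -1], weights)
alternating = global_morans_i([1, -1, 1, -1, 1, -1], weights)
print(clustered, alternating)
```

Residuals that cluster (positive I) hint that the model is missing a variable or a regional trend.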
