Vector based spatial analysis


Nikolaos Spyropoulos and Thomas K. Andersen
Institute of Geography
The ESRI Guide to GIS Analysis, Mitchell 2005




•   Chapter 4, Identifying Clusters




•   Chapter 5, Analyzing Geographic Relationships
Chapter 4, Identifying Clusters
Identifying Clusters


Why identify clusters?

•Get an understanding of the location pattern in an area

•Compare these patterns with other features to identify possible
contributing factors

•Take action based on the identified clusters




Clusters of burglaries                 Income and emergency calls
Using statistics to identify clusters

Conclusions can be drawn by looking at a map (e.g. where the cluster
is). Statistics make it possible to test and validate those conclusions.

In a statistical analysis each event is counted as a unique occurrence,
which is hard to see on a map.
Time period of data

The time period covered by the data can vary a lot, from current
   conditions to long time periods

   - For vacant parcels you need a snapshot of the current
   condition; for crimes or earthquakes you need to define
   a time period



Vacant houses     Crimes                     Earthquakes

Now               6 months                   100 years



Therefore: the appropriate time period differs by phenomenon and has to be defined
Distance within clusters


Clusters are usually defined using Euclidean distance.



Travel time or cost can also be used, though.

-   Clusters of burglaries can depend on the driving time
    between the crimes.
    Because Euclidean distance doesn’t take barriers (such as a
    river) into account, two locations can appear very close
    even though the travel time between them is long.
Identifying clusters - methods

Two methods for identifying clusters:



1. Finding clusters of features

    when features are found in close proximity



2. Finding clusters of similar value

   when groups of high and low values are found together
   (“hot and cold spots”)
Finding clusters of features
Nearest neighbour hierarchical clustering (1)

“One method for finding clusters is to specify the distance
   features can be from each other, in order to be part of a
   cluster, and the minimum number of features that make up
   a cluster.”

   (Mitchell 2005:152)




                         Clusters with a specified number of features
                         within a specified distance
Nearest neighbour hierarchical clustering (2)

   The method is hierarchical because the routine continues on to
      group the clusters into larger clusters (shows several
      geographic scales e.g. neighbourhood and citywide for
      crimes).




Clusters at a small scale (neighbourhood)   Clusters at a larger scale (citywide)
Nearest neighbour hierarchical clustering (3)

How nearest neighbour hierarchical clustering works:

    A probability level is specified and used to calculate the
    threshold distance within which features are considered part
    of a cluster.

    The threshold comes from a confidence interval around the
    “mean random distance”, the mean distance that would occur
    between points in a random distribution.

    If the distance between features is greater than the high end
    of the interval, the features are further apart than you would
    expect by chance. For clustering the opposite end matters:
    features closer together than the low end of the interval are
    closer than you would expect by chance.

--See pages 155 and 156 for the calculation
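
The slides do not reproduce the calculation, so the following is only a sketch under an assumption: that the threshold follows the formulation commonly used for this routine, where the mean random distance is 0.5·sqrt(A/N), its standard error is about 0.26136/sqrt(N²/A), and the threshold is the lower confidence limit. The function names, the example area and the `t` value are hypothetical.

```python
import math

def mean_random_distance(area, n):
    """Expected nearest neighbour distance for n points placed at random in `area`."""
    return 0.5 * math.sqrt(area / n)

def threshold_distance(area, n, t):
    """Lower limit of the confidence interval around the mean random distance.

    `t` is the t-value matching the chosen one-tailed probability level;
    it is supplied by the caller (assumption: no default is prescribed here).
    """
    standard_error = 0.26136 / math.sqrt(n * n / area)
    return mean_random_distance(area, n) - t * standard_error

# Hypothetical study area: 25 incidents in 100 square kilometres.
print(mean_random_distance(100.0, 25))
print(threshold_distance(100.0, 25, t=1.645))
```

Pairs of features closer together than the returned distance would be candidates for a first-order cluster; the minimum number of features per cluster is a separate, user-chosen setting.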
Finding clusters of similar value
Finding clusters of similar values

The GIS looks at the attribute values of each feature and its
   neighbours, as well as the proximity of the features.
It then calculates the degree to which nearby features have similar
   values for a given attribute.




Percent age 65 or over             Percentages of seniors similar to their neighbours
                                   (Blue less similar, red more similar)
Identifying clusters of similar values (1)

  Where high values are surrounded by high values or low values are
  surrounded by low values, the features are similar
Identifying clusters of similar values (2)

A statistic is calculated for each feature. It is then possible to
   map the features based on this value, to see the locations of
   features of similar value
Moran’s Ii (1)

A method to identify features with similar values

Emphasizes how features differ from the values in the study
  area as a whole

Compares the value of each feature in a pair to the mean value
  for all features in the study area (local variation: the
  method looks at what is happening right around each feature)




--For the calculation, see page 167
Moran’s Ii (2)

The value of Moran’s Ii depends on the difference in attribute values,
   the number of neighbours with similar values, and the magnitude of
   the attribute data

•   A high positive value of Ii indicates that the feature is surrounded by
    features with similar values, either high or low.

•   A negative value indicates that the feature is surrounded by features
    with dissimilar values.
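
As an illustration of how the sign of Ii arises, here is a minimal sketch of a local Moran's I calculation. It assumes the common form Ii = ((xi − mean) / s²) · Σj wij (xj − mean) with s² the population variance; the variance convention and weighting scheme may differ from the book's page 167. The weights matrix `W` (four features in a row, each adjacent to the next) and the function name are hypothetical.

```python
import numpy as np

def local_morans_i(values, weights):
    """Local Moran's I for every feature, given a spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()            # deviation of each feature from the study-area mean
    s2 = (z ** 2).mean()        # population variance (an assumed convention)
    return (z / s2) * (w @ z)   # own deviation times weighted neighbour deviations

# Hypothetical binary adjacency for four features arranged in a line.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

clustered = local_morans_i([10, 10, 0, 0], W)    # similar values side by side
alternating = local_morans_i([10, 0, 10, 0], W)  # dissimilar neighbours
print(clustered)
print(alternating)
```

In the clustered case the end features get positive values; in the alternating case every feature gets a negative value, matching the interpretation in the bullets above.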
Gi statistic

Identifies concentrations (clusters) of high and low values
  within a distance

Compares neighbouring features within a specified distance



Two versions:

1. Gi statistic

2. Gi*
Version 1 - Gi statistic

Used to find out what is going on around a feature or cell,
    without taking the value of the target feature into account



-Used to study the dispersion of a phenomenon in an area.
   Gi has been used to track the spread of AIDS across the
   counties in the San Francisco area. It was possible to see
   the increase over time and distance
Version 2 – Gi*

The value of the target feature is included. Used to find hot or
   cold spots.

A distance (search radius) is defined

   This distance is based on knowledge of the features and
   their behaviour. Example: how far are people willing to
   travel to go to a certain store? (Euclidean distance, travel
   time, etc.)
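
The difference between the two versions can be sketched as follows. This assumes the simple ratio form of the statistic (sum of values within the search radius divided by the sum of all values, excluding the target for Gi and including it for Gi*), binary distance weights, and positive attribute values. The standardized z-score form used by GIS software is not shown. Function name, points and values are hypothetical.

```python
import numpy as np

def getis_ord(points, values, d, star=False):
    """Ratio form of Gi (star=False) or Gi* (star=True) with radius d."""
    pts = np.asarray(points, dtype=float)
    x = np.asarray(values, dtype=float)
    # Pairwise Euclidean distances between all features.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    w = (dist <= d).astype(float)        # 1 if within the search radius
    if star:
        denom = np.full(len(x), x.sum()) # Gi*: target value is included
    else:
        np.fill_diagonal(w, 0.0)         # Gi: target value is left out
        denom = x.sum() - x
    return (w @ x) / denom

# Hypothetical features along a line: two high values, then two low values.
points = [(0, 0), (1, 0), (5, 0), (6, 0)]
values = [8, 6, 1, 1]
gi = getis_ord(points, values, d=1.5)
gi_star = getis_ord(points, values, d=1.5, star=True)
print(gi)
print(gi_star)
```

With Gi* the two high-value features stand out as a hot spot because their own values count toward the local sum.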
Chapter 5, Analyzing Geographic Relationships
Analyzing Geographic Relationships

Why Analyze Geographic Relationships?



Analysis of feature distributions leads on to analysis of
relationships between features, which serves three purposes:

•   Understand what is going on in a place.
•   Predict where something is likely to occur.
•   Examine why things occur where they do.
Why Analyze Geographic Relationships?


Understanding what is going on in a place.

Example: Analysis of accidents in relation to speed limits on highways
Why Analyze Geographic Relationships?


Predicting where something is likely to occur.

Example: Analysis of landforms in order to identify artifact locations.
Why Analyze Geographic Relationships?


Examine why things occur where they do.

Example: Improvement of newborns’ health.
Using Statistics to Analyze Relationships

•   When we look for relationships we form an opinion
    based on personal knowledge of the phenomena or visual
    analysis of the map.

•   Statistics allow us to verify those relationships and measure
    how strong they are.

•   The idea behind using statistics is:

    to see to what extent the value of an attribute changes
    when another changes,

    to measure the relationship between two or more maps
    representing the variables (i.e. analyze the relationship
    between two sets of attribute data).
Assigning Variables to Geography

•Variables from different layers must be associated with the same geographic unit.

If they are not:
i)   Different cell sizes                                      Ratio
ii)  Different sets of features                                Combine features
iii) Points representing different categories of features      Sum features to area
iv)  Combine two or more sets of features                      Raster



Example: Emergency calls and population data.
Using Statistics to Analyze Geographic
Relationships

Two statistical assumptions:
•Each value is equally likely to occur in the sample
•The value of one observation doesn’t affect the value of another

In geography these often do not hold:

•Attribute values vary across a region

   Regional trends influence attribute values
Using Statistics to Analyze Geographic
Relationships

•Nearby features are more similar than distant ones

                    Spatial autocorrelation

            Violates the independence of observations

       Smaller units tend to be more similar than bigger ones.
Using Statistics to Analyze Geographic
Relationships

Identifying relationships vs. analyzing processes

Identifying relationships between (x, y):
    measure the extent of variation

Analyzing processes:
    find the main variables that drive a process

Outcomes: take actions, understand, predict values of a variable
Identifying Geographic Relationships

                  How much two attributes vary together.



      direct relationship               inverse relationship
     (positive correlation)            (negative correlation)



If you suspect a relationship, then:

measure the relationship, confirm it, and measure its direction
and strength
Methods for Identifying Geographic Relationships

•Pearson’s Correlation Coefficient
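
The slide only names the coefficient, so here is a minimal sketch of the standard definition: the sum of the products of the deviations from the means, divided by the square root of the product of the summed squared deviations. The function name and sample values are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient between two attribute lists."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations from the mean of x
    dy = y - y.mean()   # deviations from the mean of y
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Hypothetical attribute values for four features.
direct = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # values rise together
inverse = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # one rises, the other falls
print(direct, inverse)
```

A value near +1 points to a direct relationship, a value near -1 to an inverse one, and a value near 0 to no linear relationship.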
Methods for Identifying Geographic Relationships

•Spearman’s Rank Correlation Coefficient

  measures the extent to which two lists of ranked
   values correspond
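
A minimal sketch of the rank comparison, assuming no tied values so that the shortcut formula 1 − 6·Σd² / (n(n² − 1)) applies, where d is the difference between the two ranks of each feature. The helper names and sample values are hypothetical.

```python
import numpy as np

def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(1, len(values) + 1)
    return r

def spearman_rho(x, y):
    """Spearman's rank correlation coefficient for untied data."""
    d = ranks(x) - ranks(y)   # rank difference for each feature
    n = len(d)
    return 1.0 - 6.0 * (d ** 2).sum() / (n * (n ** 2 - 1))

# Hypothetical attribute values for five features.
same_order = spearman_rho([10, 20, 30, 40, 50], [1, 4, 9, 16, 25])
reverse_order = spearman_rho([10, 20, 30, 40, 50], [5, 4, 3, 2, 1])
one_swap = spearman_rho([1, 2, 3, 4], [1, 3, 2, 4])
print(same_order, reverse_order, one_swap)
```

Because only the ranks are compared, the first pair gives a perfect score even though the relationship between the raw values is not linear.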
Identifying Geographic Relationships

What a correlation coefficient doesn’t measure

•   Results of a correlation cannot be transferred to another scale,
    e.g. from a county to the nation.
•   It doesn’t measure causation (X causes Y).

•   It doesn’t explain why there is a relationship.

•   It doesn’t measure the form of the relationship, just the
    dispersion around a straight line.
Analyzing Geographic Processes


  We analyze geographic processes in order to predict that
  something will occur.

        Steps
   1. Develop a theory as to what is driving the process
   2. Analyze the relationships between various attributes of your
      data (build a Model)
Analyzing Geographic Processes

Linear Regression Analysis

•Plot the variables on a chart.
•Find the line that best fits the data points (ordinary least
 squares method)
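
A minimal sketch of fitting that line for one independent variable, using the closed-form least squares solution (slope = covariance of x and y divided by the variance of x). The function name and sample values are hypothetical.

```python
import numpy as np

def ols_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    slope = (dx * (y - y.mean())).sum() / (dx ** 2).sum()
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

# Hypothetical observations that lie exactly on y = 1 + 2x.
intercept, slope = ols_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```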
Analyzing Geographic Processes

Ordinary Least Squares

[Figure: ordinary least squares example, from Wikipedia]
Analyzing Geographic Processes

Interpreting the results of regression analysis

We can see how well our model works by comparing the variance in
the predicted values to the variance in the observed values.




•   Perfect fit (all points on the line): R² = 1

•   In any other case R² < 1, meaning the fit is not perfect

Calculate the residuals (differences between predicted and observed
   values)
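
A minimal sketch of both quantities. It computes R² as 1 minus the ratio of the residual sum of squares to the total sum of squares, which for a least squares fit with an intercept equals the ratio of predicted to observed variance. Residuals are taken as observed minus predicted, an assumed sign convention. Names and values are hypothetical.

```python
import numpy as np

def r_squared(observed, predicted):
    """Return (R squared, residuals) for a set of predictions."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = observed - predicted                  # assumed sign convention
    ss_res = (residuals ** 2).sum()                   # unexplained variation
    ss_tot = ((observed - observed.mean()) ** 2).sum()  # total variation
    return 1.0 - ss_res / ss_tot, residuals

# Perfect fit: every prediction matches the observation.
perfect_r2, perfect_res = r_squared([1, 3, 5, 7], [1, 3, 5, 7])
# No explanatory power: the model only predicts the mean.
mean_r2, mean_res = r_squared([1, 2, 3], [2, 2, 2])
print(perfect_r2, mean_r2)
```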
Using More Than One Independent Variable

Most geographic processes aren’t controlled by a single variable

New Regression Analysis Equation




R² in multivariate regression describes the proportion of the variation in y
explained by the combination of independent variables.
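
The equation itself did not survive on the slide. As a sketch only, the following assumes the usual multiple regression form y = a + b1·x1 + b2·x2 + ... + bn·xn and fits it with NumPy's least squares solver. The function name and data are hypothetical.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Fit y = a + b1*x1 + ... + bn*xn; return (coefficients, R squared)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])   # leading column of 1s for the intercept
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coeffs
    r2 = 1.0 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return coeffs, r2

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coeffs, r2 = fit_multiple_regression(X, y)
print(coeffs, r2)
```

The first coefficient is the intercept; the rest line up with the columns of `X`.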
Using More Than One Independent Variable

Identifying the key variables

                Analysis

 Test the significance of each variable
                  t-test

                  Goal
Factors Influencing the Regression Analysis Results

Least squares regression analysis is effective only if the
following are true:

1.   The relationship between Y and X is linear.
2.   Residuals have a mean of 0.
3.   Residuals have a constant variance.
4.   Residuals are randomly arranged along the regression line.
5.   Residuals are normally distributed.
6.   Independent variables are not highly correlated with each other.
Regression Analysis & Geographic Data

For geographic data, misspecification can result from many sources.

It can occur when:

•   Data are analyzed at the wrong scale for the process
•   Variables are missing
Dealing With Regional Variation

Geographically Weighted Regression (GWR)

•   Allows model coefficients to vary regionally.

•   The regression is run for each location rather than for the study area as a whole.

Example: Per capita income.
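
To make "a regression for each location" concrete, here is a rough sketch of the idea: at every location a weighted least squares fit is run in which nearby observations count more than distant ones. The Gaussian distance-decay kernel, the fixed `bandwidth`, the single independent variable and all names are assumptions for illustration; real GWR tools add bandwidth selection and diagnostics.

```python
import numpy as np

def gwr_coefficients(coords, x, y, bandwidth):
    """Return one (intercept, slope) pair per location from local weighted fits."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), x])
    results = []
    for c in coords:
        d = np.linalg.norm(coords - c, axis=1)     # distance to every observation
        w = np.exp(-0.5 * (d / bandwidth) ** 2)    # assumed Gaussian kernel
        sw = np.sqrt(w)                            # weighted LS via row scaling
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        results.append(beta)
    return np.array(results)

# Hypothetical data with no regional variation: y = 1 + 2x everywhere.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
local = gwr_coefficients(coords, x, y, bandwidth=2.0)
print(local)
```

When the relationship is the same everywhere, as in this made-up data, every location returns the same coefficients; regional variation would show up as coefficients that change from row to row.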
Dealing with Local Trends

Methods to address local trends:

•   Resampling (removes spatial autocorrelation)
•   Spatial filtering
Running A Linear Regression Analysis With
Geographic Data.

1. Determine what you are trying to predict.

2. Identify the key independent variables.

3. Examine the distribution of your variables.

4. Run the ordinary least squares regression.

5. Examine the coefficients for each independent variable.

6. Examine the residuals.
•    Test for spatial autocorrelation
•    Look for missing variables
•    Plot the residuals against the y-values
•    Create a frequency curve of the residuals
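
The first residual check can be sketched with the global Moran's I statistic, I = (n / S0) · (z'Wz) / (z'z), where z are the deviations of the residuals from their mean and S0 is the sum of all weights. The weights matrix (four features in a row) and the residual values are hypothetical, and a significance test would still be needed in practice.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I of a set of values for a spatial weights matrix."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()                     # deviations from the mean
    w = np.asarray(weights, dtype=float)
    n = len(z)
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical binary adjacency for four features arranged in a line.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

clustered_residuals = morans_i([1, 1, -1, -1], W)    # over- and under-prediction grouped
alternating_residuals = morans_i([1, -1, 1, -1], W)  # neighbours always differ
print(clustered_residuals, alternating_residuals)
```

A clearly positive value suggests the residuals are spatially clustered, which hints at a missing variable or a regional trend the model has not captured.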

Editor's notes

  1. Example: Burglaries
  2. Example: Clusters of burglaries depend on the driving time between the crimes, especially if there is a barrier between the crime scenes, such as a river (the Euclidean distance is short, but the travel time is long).
  3. Example: AIDS in San Francisco.
  4. Example: New locations for pet stores