Geospatial Analytics using SAS and
Publicly Available Information
Rob Hall
WASUP – October 2012
Overview
Visualisation of geographic data is essential to
understanding the distribution of attributes across
Australia.
Retail Credit Risk has developed software to enable easy
generation of maps for any desired statistic through SAS
Enterprise Guide.
Once the data is prepared based on postcode, a fully
automated process will generate a desired map in under a
minute.
The program can be easily extended, enabling analysts to
view data from many angles.
It is an essential tool which can be easily added to existing
MI packs.
SUCCESS STORY:
One of our reports rarely received feedback from Executives. Once we
placed a map visualising the data, we immediately received requests for
more information!
Visual inspection of data is essential to understanding regional trends and hotspots
Visualising Data
Large result tables are difficult to place into perspective,
and don’t allow the reader to see the big picture. The
ability to visualise data adds meaning to reports and
facilitates the communication of a message. Comparing
the map above with the table, we can easily see where
areas are experiencing stress. This information is lost in
the table. It is difficult to place all the results in one table
and communicate the intended message.
Columns: Area | Number of Accounts | Delinquent Accounts | Percentage Delinquent | Ave Delinquent LVR | Ave Delinquent Total Limits | Ave Delinquent EOM Balance | Ave Delinquent Amount
TUGUN - QLD 35 6 17.10% 66 262,803 -284,825 -22,022
ADELAIDE - SA 85 7 8.24% 81 501,052 -519,162 -18,110
AUSTINVILLE - QLD 143 11 7.69% 84 259,973 -232,103 27,870
BLACKSTONE HEIGHTS - TAS 94 7 7.45% 84 174,336 -180,224 -5,888
BERKSHIRE PARK - NSW 74 5 6.76% 56 257,738 -275,449 -17,711
ASCOT - QLD 85 5 5.88% 83 261,538 -268,410 -6,872
BARANGAROO - NSW 139 8 5.76% 69 570,543 -893,595 -65,000
BADGIN - WA 98 5 5.10% 66 175,961 -183,021 -7,061
BENOWA - QLD 139 7 5.04% 88 273,636 -281,545 -7,909
ABERCROMBIE - NSW 127 6 4.72% 82 221,895 -227,025 -5,131
BALLIMORE - NSW 147 6 4.08% 86 206,946 -215,589 -8,643
AROONA - QLD 202 8 3.96% 90 315,982 -326,191 -10,209
BAMARANG - NSW 102 4 3.92% 55 224,548 -239,354 -14,806
COLLINGWOOD PARK - QLD 104 4 3.85% 68 288,195 -302,357 -14,162
BRANDITT - VIC 131 5 3.82% 84 375,463 -389,270 -13,808
CLEAR ISLAND WATERS - QLD 131 5 3.82% 83 422,059 -437,013 -14,954
MACGREGOR - QLD 162 6 3.70% 85 261,405 -265,832 -4,427
BASIN POCKET - QLD 191 7 3.66% 71 212,036 -220,254 -8,219
BELGRAVIA - NSW 221 8 3.62% 73 244,405 -249,963 -5,558
ASHCROFT - NSW 196 7 3.57% 80 316,575 -323,112 -6,536
ARUNDEL - QLD 198 7 3.54% 71 246,028 -254,343 -8,315
BROADBEACH - QLD 143 5 3.50% 79 319,676 -334,795 -15,119
CONISTON - NSW 116 4 3.45% 58 222,348 -227,042 -4,693
AMBERLEY - QLD 119 4 3.36% 65 234,315 -247,949 -13,634
CRESTMEAD - QLD 125 4 3.20% 51 173,490 -177,129 -3,639
FRENCHS FOREST - NSW 161 5 3.11% 78 247,681 -253,731 -6,050
NORMANHURST - NSW 198 6 3.03% 80 323,569 -342,623 -19,054
COLLAROY - NSW 133 4 3.01% 85 283,678 -291,542 -7,864
HMAS PLATYPUS - NSW 100 3 3.00% 80 929,211 -965,459 -36,248
GILBERTON - QLD 100 3 3.00% 92 420,057 -445,315 -25,258
DINGLEY VILLAGE - VIC 137 4 2.92% 45 373,832 -380,187 -6,355
BURLEIGH DC - QLD 138 4 2.90% 84 384,508 -889,992 -70,000
PADSTOW - NSW 138 4 2.90% 58 148,621 -155,076 -6,455
COOLAROO - VIC 138 4 2.90% 80 243,244 -249,687 -6,443
KINGSTON - QLD 141 4 2.84% 85 345,028 -359,125 -14,097
PARALOWIE - SA 211 6 2.84% 82 178,641 -192,291 -13,649
ALBURY - NSW 142 4 2.82% 69 260,287 -267,349 -7,062
ALBION - VIC 287 8 2.79% 81 200,088 -207,630 -7,542
KUNYUNG - VIC 109 3 2.75% 0 90,000 -120,629 -30,629
ABBOTSFORD - QLD 291 8 2.75% 56 198,639 -203,405 -4,766
CANNON HILL - QLD 146 4 2.74% 79 444,927 -471,691 -26,764
GARDEN CITY - VIC 147 4 2.72% 73 706,434 -758,754 -52,320
ADVANCETOWN - QLD 331 9 2.72% 79 332,900 -341,796 -8,897
CALAMVALE - QLD 148 4 2.70% 71 381,881 -394,281 -12,400
AMAMOOR - QLD 112 3 2.68% 27 221,188 -229,098 -7,910
MAWSON LAKES - SA 112 3 2.68% 92 156,710 -159,847 -3,137
MILDURA - VIC 149 4 2.68% 85 210,203 -214,821 -4,618
ALFREDTOWN - NSW 150 4 2.67% 67 242,365 -253,479 -11,114
ALFRED COVE - WA 374 1 0.27% 60 20,000 -20,278 -278
CLOVERDALE - WA 480 1 0.21% 70 267,700 -270,841 -3,141
KINGSLEY - WA 1206 3 0.21% 54 229,127 -233,828 -4,701
/// Another 500 Lines ///
Development of SAS Geospatial software allows easy mapping of any data containing
Australian postcodes
SAS Geospatial Software
The geospatial software was developed in-house. Based on
mapping information freely available from the ABS and
Australia Post, we have developed a user-friendly approach
to very quickly produce an Australian map of many
different statistics.
The analyst only needs to produce a table containing
postcode level data, including the information required for
charting. Once this data is linked into the software, it takes
less than 1 minute to produce a map.
Maps can be displayed at State, Fitch Region or Postcode
level. Different maps can also be displayed for each state.
Sales Targets By Postcode
Sales Targets By Fitch Region
Example: Distribution by Fitch Region
Maps can be easily generated
To create a map, all that is required is two columns of
data - the Postcode, and the attribute you wish to map.
The attribute need not be numerical; it can also
hold categorical information.
Examples of Maps Include:
• Product sales or application volumes
• Credit Card Fraud
• Default Rates
• Scorecard Results
• Cross Sale Volume
• Loan Utilization
Australian Delinquency Rates By Fitch Region
Data can be summarised at various levels, and the analyst can easily add
options to produce the different maps desired.
Customisation Through SAS Enterprise Guide
An Enterprise Guide project can easily be set up
to handle any segmentation or statistics.
In this case, the project has been created for
easy re-use.
The analyst can select the map attributes,
including the statistic (mean, count, min, max),
the display level (Postcode, State or Fitch Region)
and a few other options to create the map
desired.
This gives the analyst a chance to consider
results from many perspectives.
The process of producing maps can then be
performed without changing any code, which keeps
the learning curve short for new starters.
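The choice of statistic and display level described above amounts to a group-by over the postcode-level table. The following is a minimal sketch in Python rather than the Enterprise Guide project itself; the record layout, the `summarise` function and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical menu of statistics, mirroring mean, count, min and max.
STATISTICS = {"mean": mean, "count": len, "min": min, "max": max}

def summarise(rows, level, attribute, statistic):
    """Group rows (dicts) by the chosen display level and apply one statistic."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[level]].append(row[attribute])
    func = STATISTICS[statistic]
    return {key: func(values) for key, values in groups.items()}

# Made-up postcode-level records, for illustration only.
rows = [
    {"postcode": "6023", "state": "WA", "balance": 100.0},
    {"postcode": "6025", "state": "WA", "balance": 300.0},
    {"postcode": "2000", "state": "NSW", "balance": 50.0},
]
by_state = summarise(rows, level="state", attribute="balance", statistic="mean")
```

Switching `level` to `"postcode"` or `statistic` to `"count"` gives a different view of the same table without touching the function, which is the idea behind the reusable project.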
Maps can be zoomed dynamically to show greater detail
Zooming In
Dynamic maps can be produced that enable the analyst to zoom in on regions for finer detail. This is done
through a web browser using the Graph Toolbar.
Map panels: Australia | Zoom to Central East NSW | Zoom Further to Sydney Area
Auto RAG Vs User Defined RAG Status
RAG (Red Amber Green, or Traffic Lights) Status
RAG status can be set to predefined values in order to line
up with existing triggers. However, often it is necessary to
assign a meaningful RAG status for data we have not
visualised before.
The software can automatically generate RAG status based
on percentiles.
White: 0 observations
Green: Less than the 75th percentile
Amber: At or above the 75th percentile but less than the 90th percentile
Red: Higher than the 90th percentile
Once the analyst has reviewed the outputs, the RAG status
can be manually set at the levels desired with little
programming.
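The percentile-based colouring can be sketched as follows. This is a Python illustration, not the SAS code behind the maps: the nearest-rank percentile definition, the use of `None` for regions with no observations, and the treatment of values exactly equal to the 90th percentile as Red are all assumptions made for the example.

```python
import math

def nearest_rank_percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (one of several definitions)."""
    rank = max(1, math.ceil(len(sorted_values) * pct / 100))
    return sorted_values[rank - 1]

def auto_rag(values):
    """Map {region: value or None} to {region: colour}."""
    observed = sorted(v for v in values.values() if v is not None)
    if not observed:
        return {region: "White" for region in values}
    p75 = nearest_rank_percentile(observed, 75)
    p90 = nearest_rank_percentile(observed, 90)
    status = {}
    for region, value in values.items():
        if value is None:
            status[region] = "White"   # no observations
        elif value < p75:
            status[region] = "Green"
        elif value < p90:
            status[region] = "Amber"
        else:
            status[region] = "Red"     # assumption: ties with the 90th percentile are Red
    return status

# Hypothetical regions R1..R10 with values 1..10, plus one empty region.
rates = {f"R{i}": float(i) for i in range(1, 11)}
rates["R0"] = None
rag = auto_rag(rates)
```

A user-defined RAG would replace `p75` and `p90` with fixed trigger values chosen by the analyst.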
Auto RAG
User-Defined RAG
Information Sources:
ABS: http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1270.0.55.003July%202011?OpenDocument
Google Maps API: http://maps.googleapis.com/maps/api/geocode...
The MapInfo Interchange File (MIF): Format Specification, October 1999, (Mapinfo_Mif.pdf), MapInfo Corporation
Australia Post for Postcode Mappings: http://auspost.com.au
What we need to do:
• Decode the MapInfo file
• Link in the Australia Post postal zones
• Create a valid GMAP dataset
• Create a DENSITY variable
• Link in the postcode data we wish to plot
• Run PROC GMAP
WHY NOT USE PROC MAPIMPORT TO PROCESS THE MAPINFO FILE?
PROC MAPIMPORT will import native Shapefiles. However, there are limits on the size of the input data it will accept.
The ABS map files are too big!
Therefore we must reverse engineer the files and create our own data.
SAS GMap Data Format
We use PROC GMAP to create our maps. SAS expects the mapping data in a particular format.
• Boundaries are represented by polygons
• The polygon coordinates must be in the order in which they are drawn. I created an index variable, which allows me
to sort the polygon data regardless of the subset chosen.
• Boundaries between joining polygons must align
• Polygon boundaries must end with X and Y = missing
• See SAS Help – The GMAP procedure for further specifications.
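As an illustration of the layout described in these bullets, the sketch below flattens polygons into ordered rows with a running index and a closing row whose X and Y are missing. It is written in Python, with `None` standing in for a SAS missing value; the function name, the row fields and the sample square are hypothetical, and the SAS Help remains the authority on what PROC GMAP actually requires.

```python
def to_map_rows(regions):
    """Flatten {region_id: [polygon, ...]} into ordered rows.

    Each polygon is a list of (x, y) points. A row with x = y = None
    is written after every polygon, following the description above.
    """
    rows = []
    index = 0  # running draw order across the whole data set
    for region_id, polygons in regions.items():
        for polygon_no, polygon in enumerate(polygons, start=1):
            for x, y in list(polygon) + [(None, None)]:
                index += 1
                rows.append({"region": region_id, "polygon": polygon_no,
                             "index": index, "x": x, "y": y})
    return rows

# Made-up unit square used as a single-polygon region.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rows = to_map_rows({"A": [square]})

# Sorting on the index restores draw order for whatever subset is selected.
subset = sorted((r for r in rows if r["region"] == "A"),
                key=lambda r: (r["region"], r["index"]))
```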
GREDUCE is used to determine common boundaries and create a DENSITY variable,
which enables you to specify the level of detail of your map:
1 = very coarse, 6 = high detail.
• The input map data set must be a traditional map data set and contain these variables:
• a numeric variable named X that contains the horizontal coordinates of the map boundary points.
• a numeric variable named Y that contains the vertical coordinates of the map boundary points.
• X and Y correspond to the longitude and latitude of the map, respectively
• one or more identification variables that uniquely identify the unit areas in the map. These variables are listed in the ID
statement.
It also can contain:
• one or more variables that identify groups of unit areas (for BY-group processing)
• the variable SEGMENT, which distinguishes noncoterminous segments of the unit areas.
• Any other variables in the input map data set do not affect the GREDUCE procedure.
Source: SAS Help – GREDUCE Procedure
Perfect
alignment here
Mapinfo Data Format
VERSION 450
DELIMITER ","
CoordSys Earth Projection 1, 116 Bounds (96,-45) (160,-8)
COLUMNS 4
SSC_CODE_2011 char(5)
SSC_NAME_2011 char(45)
CONF_VALUE char(12)
AREA_ALBERS_SQKM float
DATA
BRUSH(1,0)
REGION 1
6
149.849264992 -36.6253990025
149.848946016 -36.6258159925
149.848335008 -36.626610993
149.84918 -36.6242170005
149.849251008 -36.624721995
149.849264992 -36.6253990025
BRUSH(1,0)
REGION 2
6
148.934595296 -36.2314241865
148.934515232 -36.231766048
148.935468832 -36.233815737
153.342782016 -29.480945996
153.342276 -29.481940001
153.342455008 -29.4828790055
BRUSH(1,0)
REGION 3
5
152.113704064 -32.7755481515
152.114273696 -32.775713449
152.114248096 -32.774133364
152.11393104 -32.7749443485
152.113704064 -32.7755481515
5
152.080612992 -32.789413994
152.080636 -32.7895320055
152.080721984 -32.7893500025
152.080646016 -32.789302994
152.080612992 -32.789413994
3
152.079690016 -32.789695009
152.07966 -32.7895769975
152.079690016 -32.789695009
BRUSH(1,0)
NONE
NONE
There are 2 MapInfo files: .mid and .mif. We are only
interested in the .mif. This is a text file which consists
of:
• Header
• Various pen commands (controlling colour etc.)
• Commands to designate each region
• The polygon definitions for each region; each
region can have several polygons (e.g. there may
be a lake inside the main polygon).
• A number on a line of its own, designating the
number of points in the polygon.
• Ends with a NONE command
To process the file we:
• Ignore the header and BRUSH commands
• SCAN each line
• If we find a REGION label, increment our REGION
counter
• If we find an integer on its own, increment a
POLYGON counter, but keep it associated with the
current region
• In any other case, extract the LONG and LAT
coordinates (longitude comes first on each line)
• Stop processing when we find the NONE label.
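The processing steps above can be sketched as a small line-by-line parser. This is a Python illustration of the logic, not the author's SAS data step; the `parse_mif` name, the skipping of PEN lines and the shortened sample file with rounded coordinates are assumptions made for the example.

```python
def parse_mif(lines):
    """Return {region_number: [polygon, ...]}; each polygon is a list of (long, lat)."""
    regions = {}
    region = 0
    in_data = False
    for raw in lines:
        line = raw.strip()
        upper = line.upper()
        if not in_data:
            # Everything up to and including the DATA line is header
            in_data = (upper == "DATA")
            continue
        if not line or upper.startswith("BRUSH") or upper.startswith("PEN"):
            continue
        if upper == "NONE":
            break  # stop at the NONE label
        if upper.startswith("REGION"):
            region += 1
            regions[region] = []
        elif line.isdigit():
            # A bare integer opens a new polygon within the current region
            regions[region].append([])
        else:
            lon, lat = (float(part) for part in line.split())
            regions[region][-1].append((lon, lat))
    return regions

# Hypothetical, shortened .mif content in the same shape as the sample.
sample = """VERSION 450
DELIMITER ","
COLUMNS 1
  SSC_CODE_2011 char(5)
DATA
BRUSH(1,0)
REGION 1
3
149.0 -36.0
149.1 -36.1
149.0 -36.0
BRUSH(1,0)
REGION 2
3
152.0 -32.0
152.1 -32.1
152.0 -32.0
3
152.2 -32.2
152.3 -32.3
152.2 -32.2
BRUSH(1,0)
NONE
""".splitlines()

regions = parse_mif(sample)
```

Keeping polygons nested under their region is what lets a later step number the points in draw order per region.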
Callouts on the sample: REGION 1 and REGION 2 each have only 1 polygon; REGION 3 consists of 3 polygons.
Creating Your Map Using the GMAP procedure
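/* Sort so that polygon points are in drawing order within each region */
PROC SORT DATA = WORK.MAP_DATA
          OUT  = WORK.MAP_DATA_SORTED;
    BY STATE REGION INDEX;
RUN;

/* One choropleth map per state; MAP_DATA_FORMATED is the response variable. */
/* DATA= is stated explicitly; it previously defaulted to the last data set. */
PROC GMAP MAP  = WORK.MAP_DATA_SORTED
          DATA = WORK.MAP_DATA_SORTED;
    BY STATE;
    ID REGION;
    CHORO MAP_DATA_FORMATED /
        MISSING
        COUTLINE = GREY;
RUN;
QUIT;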
Standard Map
With a little extra coding you
can create animated maps
Melbourne
Using Google Maps API to plot street addresses
If you don’t have the coordinate data for customer addresses, you can submit the addresses to the Google Maps API. You can then overlay the returned points on a
map.
Advantages: Free; the API will generally interpret misspellings correctly.
Disadvantages: Can only download 1,200 addresses per day, subject to a misuse policy.
Sample address to submit to Google API:
http://maps.googleapis.com/maps/api/geocode/xml?address=5 jessel place,duncraig,wa,AUSTRALIA&sensor=true
Returns XML data containing the Lat and Long of the address
<?xml version="1.0" encoding="UTF-8" ?>
<GeocodeResponse>
  <status>OK</status>
  <result>
    <type>street_address</type>
    <formatted_address>5 Jessel Pl, Duncraig WA 6023, Australia</formatted_address>
    <address_component>
      <long_name>5</long_name>
      <short_name>5</short_name>
      <type>street_number</type>
    </address_component>
    ... (remaining elements omitted) ...
        <southwest>
          <lat>-31.8431920</lat>
          <lng>115.7773300</lng>
        </southwest>
        <northeast>
          <lat>-31.8404940</lat>
          <lng>115.7800280</lng>
        </northeast>
      </viewport>
    </geometry>
  </result>
</GeocodeResponse>
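Building the request and reading the coordinates back can be sketched as below. This is a Python illustration only: the endpoint and the `sensor` parameter are copied from the sample request above, the function names are hypothetical, and the `result/geometry/location` path and the trimmed response with made-up coordinates are assumptions, since that part of the response is omitted from the excerpt. No network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://maps.googleapis.com/maps/api/geocode/xml"

def geocode_url(address):
    """Build the request URL with the address percent-encoded."""
    return BASE_URL + "?" + urlencode({"address": address, "sensor": "true"})

def extract_location(xml_text):
    """Return (lat, lng) from a geocode response, or None if unavailable."""
    root = ET.fromstring(xml_text)
    if root.findtext("status") != "OK":
        return None
    lat = root.findtext("result/geometry/location/lat")
    lng = root.findtext("result/geometry/location/lng")
    if lat is None or lng is None:
        return None
    return float(lat), float(lng)

url = geocode_url("5 jessel place,duncraig,wa,AUSTRALIA")

# Hypothetical trimmed response; the coordinates are made up for illustration.
sample_response = """<GeocodeResponse>
  <status>OK</status>
  <result>
    <geometry>
      <location><lat>-31.0</lat><lng>115.0</lng></location>
    </geometry>
  </result>
</GeocodeResponse>"""
location = extract_location(sample_response)
```

Encoding the address avoids sending raw spaces and commas in the query string, and checking `status` first means responses such as `ZERO_RESULTS` or `OVER_QUERY_LIMIT` are not mistaken for coordinates.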
Using Google Maps API
Using Excel and VBA to download addresses:
Using VBA to automate the download of addresses is straightforward using the QueryTables collection object. The results can be fed into a SAS
table and overlaid on a map using the ANNOTATE option to GMAP.
Sub getGeoData()
    ' Data, Google and Results are worksheet code names in this workbook.
    Dim q As QueryTable
    Dim r As Long, startRow As Long, resultsStart As Long
    Dim Address As String, conn As String
    Const BASE_URL As String = "URL;http://maps.googleapis.com/maps/api/geocode/xml?address="

    ' Remove any query tables left over from a previous run
    For Each q In Sheets("google").QueryTables
        Debug.Print q.Name
        q.Delete
    Next q

    startRow = 26510
    resultsStart = 27763

    ' Create one query table, then reuse it for every address
    Address = Data.Range("H" & startRow)
    conn = BASE_URL & Address & "&sensor=true"
    With Google.QueryTables.Add(Connection:=conn, Destination:=Google.Range("B1"))
        .Refresh BackgroundQuery:=False
    End With

    For r = startRow To 30000
        ' Only query when the address differs from the previous row
        If Data.Range("H" & r) <> Data.Range("H" & r - 1) Then
            Address = Data.Range("H" & r)
            conn = BASE_URL & Address & "&sensor=true"
            With Google.QueryTables(1)
                .Connection = conn
                .Refresh BackgroundQuery:=False
            End With
            ' Stop once the query limit has been reached
            If Google.Cells(3, 2).Value = "OVER_QUERY_LIMIT" Then Exit Sub
            If Google.Cells(3, 2).Value <> "ZERO_RESULTS" Then
                ' Copy the record key and the two result cells read from the response
                Results.Cells(resultsStart, 1) = Data.Cells(r, 1)
                Results.Cells(resultsStart, 2) = Google.Cells(50, 1)
                Results.Cells(resultsStart, 3) = Google.Cells(51, 1)
                resultsStart = resultsStart + 1
            End If
            ' Record progress: last row processed and rows done so far
            Google.Cells(52, 1) = r
            Google.Cells(53, 1) = r - startRow
        End If
        DoEvents
    Next r
End Sub
%MACRO dot( x1, y1, rad, colin, fill );
/*--------------------------------------------------------------------------*/
/* Draw a circle with center at ( X1,Y1 ) of radius RAD. */
/*--------------------------------------------------------------------------*/
X = &x1;
Y = &y1;
LINE = 0;
ANGLE = 0.00;
ROTATE = 360.00;
SIZE = &rad;
STYLE = "&fill";
IF "&colin" =: '*' THEN ; ELSE color = "&colin";
FUNCTION = "PIE"; output;
%MEND dot;
%let size_cur = 0.05;
%let size_30 = 0.09;
%let size_60 = 0.11;
%let size_90 = 0.15;
data ANNO;
length function style color $ 8 position $ 1 ;
retain xsys ysys "2" hsys "3" when "a" ;
set WORK.CORD_MARKER;
position = "E";
segment = 1;
if days_buck = -1 then do;
%dot( X, Y, &size_cur , green, solid );
end;
else if days_buck = 1 then do;
%dot( X, Y, &size_30 , orange, solid );
end;
else if days_buck = 30 then do;
%dot( X, Y, &size_30 , blue, solid );
end;
else if Days_Buck = 60 then do;
%dot( X, Y, &size_60 , purple, solid );
end;
else do;
%dot( X, Y, &size_90 , red, solid );
end;
run;
Issues:
- Gaps in map where there is a national park (e.g. 1/3 of Tasmania missing!)
- Some SSD to Postcode mappings are incorrect
- Some postcodes overlap states
- No feature maps
How much work was involved
While the build took place over 2 years (not full time!), development is quite straightforward. The hardest part was finding
the best map file on ABS, then linking that data to the Australia Post postcode file. There are inconsistencies in the ABS data
which had to be rectified (e.g. postcodes in the wrong position or overlapping).
Once the base geospatial files were finalised, it was a relatively short process to create a semi-automated solution in Enterprise
Guide.

Contenu connexe

En vedette (6)

PhD_10_2011_Abhijeet_Paul
PhD_10_2011_Abhijeet_PaulPhD_10_2011_Abhijeet_Paul
PhD_10_2011_Abhijeet_Paul
 
IEDM-2006_Abhijeet_Paul_new
IEDM-2006_Abhijeet_Paul_newIEDM-2006_Abhijeet_Paul_new
IEDM-2006_Abhijeet_Paul_new
 
Depression’s Impact on Relationships and Relationships’ Impact on Depression
Depression’s Impact on Relationships and Relationships’ Impact on DepressionDepression’s Impact on Relationships and Relationships’ Impact on Depression
Depression’s Impact on Relationships and Relationships’ Impact on Depression
 
Вікторина «Любов до пізнання»
Вікторина «Любов до пізнання» Вікторина «Любов до пізнання»
Вікторина «Любов до пізнання»
 
Organizing Music Events - How to Plan & Make Your Plan Works
Organizing Music Events - How to Plan & Make Your Plan WorksOrganizing Music Events - How to Plan & Make Your Plan Works
Organizing Music Events - How to Plan & Make Your Plan Works
 
Genética general | Gatos
Genética general | GatosGenética general | Gatos
Genética general | Gatos
 

Similaire à GeospatialPresentationWASUP_RobHall

Activity-Based Costing
Activity-Based CostingActivity-Based Costing
Activity-Based Costing
rexcris
 
Recommending a Strategy
Recommending a StrategyRecommending a Strategy
Recommending a Strategy
Bill Sims
 
Using OGC Standards To Link BI and Spatial
Using OGC Standards To Link BI and SpatialUsing OGC Standards To Link BI and Spatial
Using OGC Standards To Link BI and Spatial
MISNet - Integeo SE Asia
 
Final Presentation from Chester Group Rev 0
Final Presentation from Chester Group Rev 0Final Presentation from Chester Group Rev 0
Final Presentation from Chester Group Rev 0
Steven Quenzel
 

Similaire à GeospatialPresentationWASUP_RobHall (20)

Activity-Based Costing
Activity-Based CostingActivity-Based Costing
Activity-Based Costing
 
Presentation.pptx (2)
Presentation.pptx (2)Presentation.pptx (2)
Presentation.pptx (2)
 
Case sharing budget allocation
Case sharing budget allocationCase sharing budget allocation
Case sharing budget allocation
 
cellivery 268600 Algorithm Investment Report
cellivery 268600 Algorithm Investment Reportcellivery 268600 Algorithm Investment Report
cellivery 268600 Algorithm Investment Report
 
isuabxis 086890 Algorithm Investment Report
isuabxis 086890 Algorithm Investment Reportisuabxis 086890 Algorithm Investment Report
isuabxis 086890 Algorithm Investment Report
 
Recommending a Strategy
Recommending a StrategyRecommending a Strategy
Recommending a Strategy
 
datasolution 263800 Algorithm Investment Report
datasolution 263800 Algorithm Investment Reportdatasolution 263800 Algorithm Investment Report
datasolution 263800 Algorithm Investment Report
 
kakao 035720 Algorithm Investment Report
kakao 035720 Algorithm Investment Reportkakao 035720 Algorithm Investment Report
kakao 035720 Algorithm Investment Report
 
t-robotics 117730 Algorithm Investment Report
t-robotics 117730 Algorithm Investment Reportt-robotics 117730 Algorithm Investment Report
t-robotics 117730 Algorithm Investment Report
 
Using OGC Standards To Link BI and Spatial
Using OGC Standards To Link BI and SpatialUsing OGC Standards To Link BI and Spatial
Using OGC Standards To Link BI and Spatial
 
Metric Guide 2.0
Metric Guide 2.0Metric Guide 2.0
Metric Guide 2.0
 
Metric Guide 2.0
Metric Guide 2.0Metric Guide 2.0
Metric Guide 2.0
 
Annual Results and Impact Evaluation Workshop for RBF - Day Five - Adept-RBF ...
Annual Results and Impact Evaluation Workshop for RBF - Day Five - Adept-RBF ...Annual Results and Impact Evaluation Workshop for RBF - Day Five - Adept-RBF ...
Annual Results and Impact Evaluation Workshop for RBF - Day Five - Adept-RBF ...
 
Visualize Your Data
Visualize Your DataVisualize Your Data
Visualize Your Data
 
2015 Broadband Tech Summit - Todd Westberg UPS Presentation
2015 Broadband Tech Summit - Todd Westberg UPS Presentation2015 Broadband Tech Summit - Todd Westberg UPS Presentation
2015 Broadband Tech Summit - Todd Westberg UPS Presentation
 
14 - IDNOG03 - George Michaelson (APNIC) - IPV6-in-2016-IDNOG
14 - IDNOG03 - George Michaelson (APNIC) - IPV6-in-2016-IDNOG14 - IDNOG03 - George Michaelson (APNIC) - IPV6-in-2016-IDNOG
14 - IDNOG03 - George Michaelson (APNIC) - IPV6-in-2016-IDNOG
 
Final Presentation from Chester Group Rev 0
Final Presentation from Chester Group Rev 0Final Presentation from Chester Group Rev 0
Final Presentation from Chester Group Rev 0
 
NC soft 036570 Algorithm Investment Report
NC soft 036570 Algorithm Investment ReportNC soft 036570 Algorithm Investment Report
NC soft 036570 Algorithm Investment Report
 
Oil and Gas data management with GDSWARE - Oil and Gas Software
Oil and Gas data management with GDSWARE - Oil and Gas SoftwareOil and Gas data management with GDSWARE - Oil and Gas Software
Oil and Gas data management with GDSWARE - Oil and Gas Software
 
kakaogames 293490 Algorithm Investment Report
kakaogames 293490 Algorithm Investment Reportkakaogames 293490 Algorithm Investment Report
kakaogames 293490 Algorithm Investment Report
 

GeospatialPresentationWASUP_RobHall

  • 1. Geospatial Analytics using SAS and Publically Available Information Rob Hall WASUP – October 2012
  • 2. Overview Visualisation of geographic data is essential to understanding the distribution of attributes across Australia. Retail Credit Risk has developed software to enable easy generation of maps for any desired statistic through SAS Enterprise Guide. Once the data is prepared based on postcode, a fully automated process will generate a desired map in under a minute. The program can be easily extended enabling analysts to view data from many angles. It is an essential tool which can be easily added to existing MI packs. SUCCESS STORY: One of our reports rarely received feedback from Executives. Once we placed a map visualising the data, we immediately received requests for more information!
  • 3. Visual inspection of data is essential to understanding regional trends and hotspots Visualising Data Large result tables are difficult to place into perspective, and don’t allow the reader to see the big picture. The ability to visualise data adds meaning to reports and facilitates the communication of a message. Comparing the map above with the table, we can easily see where areas are experiencing stress. This information is lost in the table. It is difficult to place all the results in one table and communicate the intended message. Area Number of Accounts Delinquent Accounts Percentage Delinquent Ave Delinquent LVR Ave Delinquent Total Limits Ave Delinquent EOM Balance Ave Delinquent Amount TUGUN - QLD 35 6 17.10% 66 262,803 -284,825 -22,022 ADELAIDE - SA 85 7 8.24% 81 501,052 -519,162 -18,110 AUSTINVILLE - QLD 143 11 7.69% 84 259,973 -232,103 27,870 BLACKSTONE HEIGHTS - TAS 94 7 7.45% 84 174,336 -180,224 -5,888 BERKSHIRE PARK - NSW 74 5 6.76% 56 257,738 -275,449 -17,711 ASCOT - QLD 85 5 5.88% 83 261,538 -268,410 -6,872 BARANGAROO - NSW 139 8 5.76% 69 570,543 -893,595 -65,000 BADGIN - WA 98 5 5.10% 66 175,961 -183,021 -7,061 BENOWA - QLD 139 7 5.04% 88 273,636 -281,545 -7,909 ABERCROMBIE - NSW 127 6 4.72% 82 221,895 -227,025 -5,131 BALLIMORE - NSW 147 6 4.08% 86 206,946 -215,589 -8,643 AROONA - QLD 202 8 3.96% 90 315,982 -326,191 -10,209 BAMARANG - NSW 102 4 3.92% 55 224,548 -239,354 -14,806 COLLINGWOOD PARK - QLD 104 4 3.85% 68 288,195 -302,357 -14,162 BRANDITT - VIC 131 5 3.82% 84 375,463 -389,270 -13,808 CLEAR ISLAND WATERS - QLD 131 5 3.82% 83 422,059 -437,013 -14,954 MACGREGOR - QLD 162 6 3.70% 85 261,405 -265,832 -4,427 BASIN POCKET - QLD 191 7 3.66% 71 212,036 -220,254 -8,219 BELGRAVIA - NSW 221 8 3.62% 73 244,405 -249,963 -5,558 ASHCROFT - NSW 196 7 3.57% 80 316,575 -323,112 -6,536 ARUNDEL - QLD 198 7 3.54% 71 246,028 -254,343 -8,315 BROADBEACH - QLD 143 5 3.50% 79 319,676 -334,795 -15,119 CONISTON - NSW 116 4 3.45% 58 222,348 -227,042 -4,693 
AMBERLEY - QLD 119 4 3.36% 65 234,315 -247,949 -13,634 CRESTMEAD - QLD 125 4 3.20% 51 173,490 -177,129 -3,639 FRENCHS FOREST - NSW 161 5 3.11% 78 247,681 -253,731 -6,050 NORMANHURST - NSW 198 6 3.03% 80 323,569 -342,623 -19,054 COLLAROY - NSW 133 4 3.01% 85 283,678 -291,542 -7,864 HMAS PLATYPUS - NSW 100 3 3.00% 80 929,211 -965,459 -36,248 GILBERTON - QLD 100 3 3.00% 92 420,057 -445,315 -25,258 DINGLEY VILLAGE - VIC 137 4 2.92% 45 373,832 -380,187 -6,355 BURLEIGH DC - QLD 138 4 2.90% 84 384,508 -889,992 -70,000 PADSTOW - NSW 138 4 2.90% 58 148,621 -155,076 -6,455 COOLAROO - VIC 138 4 2.90% 80 243,244 -249,687 -6,443 KINGSTON - QLD 141 4 2.84% 85 345,028 -359,125 -14,097 PARALOWIE - SA 211 6 2.84% 82 178,641 -192,291 -13,649 ALBURY - NSW 142 4 2.82% 69 260,287 -267,349 -7,062 ALBION - VIC 287 8 2.79% 81 200,088 -207,630 -7,542 KUNYUNG - VIC 109 3 2.75% 0 90,000 -120,629 -30,629 ABBOTSFORD - QLD 291 8 2.75% 56 198,639 -203,405 -4,766 CANNON HILL - QLD 146 4 2.74% 79 444,927 -471,691 -26,764 GARDEN CITY - VIC 147 4 2.72% 73 706,434 -758,754 -52,320 ADVANCETOWN - QLD 331 9 2.72% 79 332,900 -341,796 -8,897 CALAMVALE - QLD 148 4 2.70% 71 381,881 -394,281 -12,400 AMAMOOR - QLD 112 3 2.68% 27 221,188 -229,098 -7,910 MAWSON LAKES - SA 112 3 2.68% 92 156,710 -159,847 -3,137 MILDURA - VIC 149 4 2.68% 85 210,203 -214,821 -4,618 ALFREDTOWN - NSW 150 4 2.67% 67 242,365 -253,479 -11,114 ALFRED COVE - WA 374 1 0.27% 60 20,000 -20,278 -278 CLOVERDALE - WA 480 1 0.21% 70 267,700 -270,841 -3,141 KINGSLEY - WA 1206 3 0.21% 54 229,127 -233,828 -4,701 /// Another 500 Lines ///
  • 4. Development of SAS Geospatial software allows easy mapping any data containing Australian Postcodes SAS Geospatial Software The development of geospatial software was created in- house. Based on mapping information freely available from the ABS and Australia Post, we have developed a user friendly approach to very quickly produce an Australian map many different statistics. The analyst only needs to produce a table containing postcode level data, including the information required for charting. Once this data is linked into the software, it takes less that 1 minute to produce a map. Maps can be displayed at State, Fitch Region or Postcode level. Different maps can also be displayed for each state. Sales Targets By Postcode Sales Targets By Fitch Region
  • 5. Example: Distribution by Fitch Region Maps can be easy generated To create a map, all that is required is two columns of data - the Postcode, and the attribute you wish to map. The attribute need not represent numerical data, but can also include categorical information. Examples of Maps Include: • Product sales or application volumes • Credit Card Fraud • Default Rates • Scorecard Results • Cross Sale Volume • Loan Utilization Australian Delinquency Rates By Fitch Region
  • 6. Data can be summarised at various levels and the analyst can easily create options to create different desired maps. Customisation Through SAS Enterprise Guide An Enterprise Guide project can easily be set up to handle any segmentation or statistics. In this case, the project has been created for easy re-use. The analyst can select the map attributes, including the statistic (mean, count, min, max), the display level (Postcode, State or Fitch Region) and a few other options to create the map desired. This gives the analyst a chance to consider results from many perspectives. The process of producing maps can then be performed without changing any code, making a short learning curve for new starters.
  • 7. Allows zooming in of maps dynamically to see greater detail Zooming In Dynamic maps can be produced that enable the analyst to zoom in on regions for finer detail. This is done through a web browser using the Graph Toolbar. Australia Zoom to Central East NSW Zoom Further to Sydney Area
  • 8. Auto RAG Vs User Defined RAG Status RAG (Red Amber Green, or Traffic Lights) Status RAG status can be set to predefined values in order to line up with existing triggers. However, often it is necessary to assign a meaningful RAG status for data we have not visualised before. The software can automatically generate RAG status based on percentiles. White: 0 observations Green: Less that 75th percentile Amber: Less than the 90th percentile Red: Higher than the 90th percentile Once the analyst has reviewed the outputs, the RAG status can be manually set at the levels desired with little programming. Auto RAG User-Defined RAG
  • 9. Information Sources: ABS: http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1270.0.55.003July%202011?OpenDocument Google Maps API: http://maps.googleapis.com/maps/api/geocode... The MapInfo Interchange File (MIF): Format Specification, October 1999, (Mapinfo_Mif.pdf), MapInfo Corporation Australia Post for Postcode Mappings: http://auspost.com.au What we need to do: • Decode the mapinfo file • Link in the Australia Post postal zones • Create a valid GMAP dataset • Create a DENSITY variable • Link in our postcode data we wish to plot • Run proc gmap WHY NOT USE PROC MAPIMPORT TO PROCESS THE MAPINFO FILE? PROC MAPIMPORT will import native Shapefiles. However, there are limits on the size of the input data it will accept. The ABS map files are too big! Therefore we must reverse engineer the files and create our own data.
  • 10. SAS GMap Data Format We use PROC GMAP to create our maps. SAS Expects the mapping data in a particular format. • Boundaries are represented by polygons • The polygon coordinates must be in the order in which they are drawn. I created an index variable, which allows me to sort the polygon data regardless of the subset chosen. • Boundaries between joining polygons must align • Polygon boundaries must end with X and Y = missing • See SAS Help – The GMAP procedure for further specifications. GREDUCE is used to determine common boundaries and create a density variable which enables you to specify the level of detail of your map. 1= very granular, 6 = high detail. • The input map data set must be a traditional map data set and contain these variables: • a numeric variable named X that contains the horizontal coordinates of the map boundary points. • a numeric variable named Y that contains the vertical coordinates of the map boundary points. • X and Y correspond to the Latitude and longitude of the map • one or more identification variables that uniquely identify the unit areas in the map. These variables are listed in the ID statement. It also can contain: • one or more variables that identify groups of unit areas (for BY-group processing) • the variable SEGMENT, which distinguishes noncoterminous segments of the unit areas. • Any other variables in the input map data set do not affect the GREDUCE procedure. Source: SAS Help – GREDUCE Procedure Perfect alignment here
  • 11. Mapinfo Data Format VERSION 450 DELIMITER "," CoordSys Earth Projection 1, 116 Bounds (96,-45) (160,-8) COLUMNS 4 SSC_CODE_2011 char(5) SSC_NAME_2011 char(45) CONF_VALUE char(12) AREA_ALBERS_SQKM float DATA BRUSH(1,0) REGION 1 6 149.849264992 -36.6253990025 149.848946016 -36.6258159925 149.848335008 -36.626610993 149.84918 -36.6242170005 149.849251008 -36.624721995 149.849264992 -36.6253990025 BRUSH(1,0) REGION 2 6 148.934595296 -36.2314241865 148.934515232 -36.231766048 148.935468832 -36.233815737 153.342782016 -29.480945996 153.342276 -29.481940001 153.342455008 -29.4828790055 BRUSH(1,0) REGION 3 5 152.113704064 -32.7755481515 152.114273696 -32.775713449 152.114248096 -32.774133364 152.11393104 -32.7749443485 152.113704064 -32.7755481515 5 152.080612992 -32.789413994 152.080636 -32.7895320055 152.080721984 -32.7893500025 152.080646016 -32.789302994 152.080612992 -32.789413994 3 152.079690016 -32.789695009 152.07966 -32.7895769975 152.079690016 -32.789695009 BRUSH(1,0) NONE NONE There are 2 Mapinfo files - .mid and .mif. We are only interested in the .mif. This is a text file which consists of: • Header • Various pen commands (control colour etc) • Commands to designate each region • The polygon definitions for each region – each region can have several polygons (e.g. there may be a lake inside the main polygon). • A number on a line by its own, designating the number of points in the polygon. • Ends with a NONE command To process the file we: • Ignore the header and BRUSH commands • SCAN each line • If we find a REGION label, increment out REGION counter • If we find an integer on its own, increment a POLYGON counter, but keep it associated with the current region • Any other case, extract the LAT and LONG coordinates • Stop processing when we find the NONE label. These regions only have 1 polygon This region consists of 3 polygons
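  • 12. Creating Your Map Using the GMAP procedure
        /* Sort so that each region's points are in drawing order within each state */
        PROC SORT DATA = WORK.MAP_DATA
                  OUT  = WORK.MAP_DATA_SORTED;
            BY STATE REGION INDEX;
        RUN;

        /* DATA= names the response data set explicitly; when omitted, GMAP    */
        /* uses the most recently created data set, i.e. the sorted map above. */
        PROC GMAP MAP  = WORK.MAP_DATA_SORTED
                  DATA = WORK.MAP_DATA_SORTED;
            ID REGION;
            CHORO MAP_DATA_FORMATED / MISSING COUTLINE = GREY;
            BY STATE;
        RUN;
        QUIT;
    Figures: Standard Map; Melbourne.
    With a little extra coding you can create animated maps.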
  • 13. Using Google Maps API to plot street addresses
    If you don’t have the coordinate data for customer addresses, you can submit the addresses to the Google Maps geocoding API. You can then overlay the returned points on a map.
    Advantages: Free; the API will generally interpret misspellings correctly.
    Disadvantages: Can only download 1,200 addresses per day, subject to a misuse policy.
    Sample address to submit to the Google API:
        http://maps.googleapis.com/maps/api/geocode/xml?address=5 jessel place,duncraig,wa,AUSTRALIA&sensor=true
    Returns XML data containing the latitude and longitude of the address:
        <?xml version="1.0" encoding="UTF-8" ?>
        <GeocodeResponse>
          <status>OK</status>
          <result>
            <type>street_address</type>
            <formatted_address>5 Jessel Pl, Duncraig WA 6023, Australia</formatted_address>
            <address_component>
              <long_name>5</long_name>
              <short_name>5</short_name>
              <type>street_number</type>
            </address_component>
            ... (remaining elements omitted on the slide) ...
                <southwest>
                  <lat>-31.8431920</lat>
                  <lng>115.7773300</lng>
                </southwest>
                <northeast>
                  <lat>-31.8404940</lat>
                  <lng>115.7800280</lng>
                </northeast>
              </viewport>
            </geometry>
          </result>
        </GeocodeResponse>
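    Two details of this step are easy to get wrong: the address must be URL-encoded before it is sent, and the coordinates have to be picked out of nested XML. The following is a minimal Python sketch, not the author's method. The function names `build_url` and `extract_point` are hypothetical, the endpoint and `sensor` parameter are copied from the slide (the current service may require different parameters, such as an API key), and no request is actually sent. The sample XML contains only the elements visible on the slide, so the sketch falls back to the centre of the viewport when no `<location>` element is present.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint as shown on the slide; treat it as historical.
ENDPOINT = "http://maps.googleapis.com/maps/api/geocode/xml"


def build_url(address):
    """Return the request URL with the address safely URL-encoded."""
    return ENDPOINT + "?" + urlencode({"address": address, "sensor": "true"})


def extract_point(xml_text):
    """Return (status, lat, lng); lat and lng are None if nothing usable."""
    root = ET.fromstring(xml_text)
    status = root.findtext("status")
    location = root.find("result/geometry/location")
    if location is not None:
        return status, float(location.findtext("lat")), float(location.findtext("lng"))
    viewport = root.find("result/geometry/viewport")
    if viewport is None:
        return status, None, None
    # Fallback (an assumption of this sketch): use the viewport centre.
    sw, ne = viewport.find("southwest"), viewport.find("northeast")
    lat = (float(sw.findtext("lat")) + float(ne.findtext("lat"))) / 2
    lng = (float(sw.findtext("lng")) + float(ne.findtext("lng"))) / 2
    return status, lat, lng


# Only the elements visible on the slide are reproduced here.
SAMPLE_XML = """<GeocodeResponse>
  <status>OK</status>
  <result>
    <formatted_address>5 Jessel Pl, Duncraig WA 6023, Australia</formatted_address>
    <geometry>
      <viewport>
        <southwest><lat>-31.8431920</lat><lng>115.7773300</lng></southwest>
        <northeast><lat>-31.8404940</lat><lng>115.7800280</lng></northeast>
      </viewport>
    </geometry>
  </result>
</GeocodeResponse>"""

url = build_url("5 jessel place,duncraig,wa,AUSTRALIA")
status, lat, lng = extract_point(SAMPLE_XML)
```

    Parsing by element path rather than by fixed row position keeps the extraction working if the response gains or loses elements.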
  • 14. Using Google Maps API
    Using Excel and VBA to download addresses:
    Using VBA to automate the download of addresses is straightforward with the QueryTables collection object. The results can be fed into a SAS table and overlaid on a map using the ANNOTATE option of GMAP.
        Sub getGeoData()
            ' Remove any query tables left over from a previous run
            For Each q In Sheets("google").QueryTables
                Debug.Print q.Name
                q.Delete
            Next q

            startRow = 26510
            resultsStart = 27763

            ' Create the query table using the first address
            Address = Data.Range("H" & startRow)
            conn = "URL;http://maps.googleapis.com/maps/api/geocode/xml?address=" & Address & "&sensor=true"
            With Google.QueryTables.Add(Connection:=conn, Destination:=Google.Range("B1"))
                .Refresh (False)
            End With

            For r = startRow To 30000
                ' Only query when the address differs from the previous row
                If Data.Range("H" & r) <> Data.Range("H" & r - 1) Then
                    Address = Data.Range("H" & r)
                    conn = "URL;http://maps.googleapis.com/maps/api/geocode/xml?address=" & Address & "&sensor=true"
                    With Google.QueryTables(1)
                        .Connection = conn
                        .Refresh (False)
                    End With

                    ' Stop once the daily quota has been reached
                    If Google.Cells(3, 2).Value = "OVER_QUERY_LIMIT" Then Exit Sub

                    ' Store the record key and the two coordinate cells
                    If Google.Cells(3, 2).Value <> "ZERO_RESULTS" Then
                        Results.Cells(resultsStart, 1) = Data.Cells(r, 1)
                        Results.Cells(resultsStart, 2) = Google.Cells(50, 1)
                        Results.Cells(resultsStart, 3) = Google.Cells(51, 1)
                        resultsStart = resultsStart + 1
                    End If

                    ' Progress indicators
                    Google.Cells(52, 1) = r
                    Google.Cells(53, 1) = r - startRow
                End If
                DoEvents
            Next r
        End Sub

    SAS code to build the ANNOTATE data set:
        %MACRO dot( x1, y1, rad, colin, fill );
        /*--------------------------------------------------------------------------*/
        /* Draw a circle with center at ( X1,Y1 ) of radius RAD.                    */
        /*--------------------------------------------------------------------------*/
            X = &x1; Y = &y1;
            LINE = 0;
            ANGLE = 0.00; ROTATE = 360.00;
            SIZE = &rad;
            STYLE = "&fill";
            IF "&colin" =: '*' THEN ; ELSE color = "&colin";
            FUNCTION = "PIE";
            output;
        %MEND dot;

        /* Marker radius for each delinquency bucket */
        %let size_cur = 0.05;
        %let size_30 = 0.09;
        %let size_60 = 0.11;
        %let size_90 = 0.15;

        data ANNO;
            length function style color $ 8 position $ 1;
            retain xsys ysys "2" hsys "3" when "a";
            set WORK.CORD_MARKER;
            position = "E";
            segment = 1;
            if days_buck = -1 then do;
                %dot( X, Y, &size_cur, green, solid );
            end;
            else if days_buck = 1 then do;
                %dot( X, Y, &size_30, orange, solid );
            end;
            else if days_buck = 30 then do;
                %dot( X, Y, &size_30, blue, solid );
            end;
            else if Days_Buck = 60 then do;
                %dot( X, Y, &size_60, purple, solid );
            end;
            else do;
                %dot( X, Y, &size_90, red, solid );
            end;
        run;
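    The control flow of the VBA download loop on this slide (skip an address that repeats the previous row, stop when the quota is exhausted, drop addresses with no match) can be separated from Excel. The following is a minimal Python sketch under stated assumptions: `download_coordinates` and `fake_geocode` are hypothetical names, the geocoder is a stub that never touches the network, and the record IDs and the two unmatched addresses are made up for illustration. The single coordinate pair is the south-west viewport corner from the previous slide.

```python
def download_coordinates(addresses, geocode):
    """Collect (record_id, lat, lng) for each new address.

    addresses: list of (record_id, address), ordered so that repeated
               addresses sit on adjacent rows, as in the spreadsheet.
    geocode:   callable returning (status, lat, lng) for an address.
    """
    results = []
    previous = None
    for record_id, address in addresses:
        if address == previous:
            continue  # same address as the previous row: do not spend quota
        previous = address
        status, lat, lng = geocode(address)
        if status == "OVER_QUERY_LIMIT":
            break  # daily quota reached: stop and resume another day
        if status != "ZERO_RESULTS":
            results.append((record_id, lat, lng))
    return results


calls = []


def fake_geocode(address):
    """Stub geocoder standing in for the web request (hypothetical data)."""
    calls.append(address)
    table = {
        "5 jessel place,duncraig,wa,AUSTRALIA": ("OK", -31.8431920, 115.7773300),
        "no such street": ("ZERO_RESULTS", None, None),
    }
    return table.get(address, ("OVER_QUERY_LIMIT", None, None))


addresses = [
    (1, "5 jessel place,duncraig,wa,AUSTRALIA"),
    (2, "5 jessel place,duncraig,wa,AUSTRALIA"),  # duplicate of the row above
    (3, "no such street"),
    (4, "address after quota"),
    (5, "5 jessel place,duncraig,wa,AUSTRALIA"),  # never reached
]
results = download_coordinates(addresses, fake_geocode)
```

    Passing the geocoder in as a function makes the loop testable without consuming any of the daily allowance.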
  • 15. Issues:
    - Gaps in the map where there is a national park (e.g. 1/3 of Tasmania missing!)
    - Some SSD to Postcode mappings are incorrect
    - Some postcodes overlap states
    - No feature maps
    How much work was involved?
    While the build took place over 2 years (not full time!), development is quite straightforward. The hardest part was finding the best map file on ABS, then linking that data to the Australia Post postcode file. There are inconsistencies in the ABS data which had to be rectified (e.g. postcodes in the wrong position or overlapping). Once the base geospatial files were finalised, it was a relatively short process to create a semi-automated solution in Enterprise Guide.