2. Overview
Visualisation of geographic data is essential to
understanding the distribution of attributes across
Australia.
Retail Credit Risk has developed software to enable easy
generation of maps for any desired statistic through SAS
Enterprise Guide.
Once the data is prepared based on postcode, a fully
automated process will generate a desired map in under a
minute.
The program can be easily extended, enabling analysts to
view data from many angles.
It is an essential tool which can be easily added to existing
MI packs.
SUCCESS STORY:
One of our reports rarely received feedback from Executives. Once we
placed a map visualising the data, we immediately received requests for
more information!
3. Visual inspection of data is essential to understanding regional trends and hotspots
Visualising Data
Large result tables are difficult to place into perspective,
and don’t allow the reader to see the big picture. The
ability to visualise data adds meaning to reports and
facilitates the communication of a message. Comparing
the map above with the table, we can easily see where
areas are experiencing stress. This information is lost in
the table. It is difficult to place all the results in one table
and communicate the intended message.
Columns (in order): Area, Number of Accounts, Delinquent Accounts, Percentage Delinquent, Ave Delinquent LVR, Ave Delinquent Total Limits, Ave Delinquent EOM Balance, Ave Delinquent Amount
TUGUN - QLD 35 6 17.10% 66 262,803 -284,825 -22,022
ADELAIDE - SA 85 7 8.24% 81 501,052 -519,162 -18,110
AUSTINVILLE - QLD 143 11 7.69% 84 259,973 -232,103 27,870
BLACKSTONE HEIGHTS - TAS 94 7 7.45% 84 174,336 -180,224 -5,888
BERKSHIRE PARK - NSW 74 5 6.76% 56 257,738 -275,449 -17,711
ASCOT - QLD 85 5 5.88% 83 261,538 -268,410 -6,872
BARANGAROO - NSW 139 8 5.76% 69 570,543 -893,595 -65,000
BADGIN - WA 98 5 5.10% 66 175,961 -183,021 -7,061
BENOWA - QLD 139 7 5.04% 88 273,636 -281,545 -7,909
ABERCROMBIE - NSW 127 6 4.72% 82 221,895 -227,025 -5,131
BALLIMORE - NSW 147 6 4.08% 86 206,946 -215,589 -8,643
AROONA - QLD 202 8 3.96% 90 315,982 -326,191 -10,209
BAMARANG - NSW 102 4 3.92% 55 224,548 -239,354 -14,806
COLLINGWOOD PARK - QLD 104 4 3.85% 68 288,195 -302,357 -14,162
BRANDITT - VIC 131 5 3.82% 84 375,463 -389,270 -13,808
CLEAR ISLAND WATERS - QLD 131 5 3.82% 83 422,059 -437,013 -14,954
MACGREGOR - QLD 162 6 3.70% 85 261,405 -265,832 -4,427
BASIN POCKET - QLD 191 7 3.66% 71 212,036 -220,254 -8,219
BELGRAVIA - NSW 221 8 3.62% 73 244,405 -249,963 -5,558
ASHCROFT - NSW 196 7 3.57% 80 316,575 -323,112 -6,536
ARUNDEL - QLD 198 7 3.54% 71 246,028 -254,343 -8,315
BROADBEACH - QLD 143 5 3.50% 79 319,676 -334,795 -15,119
CONISTON - NSW 116 4 3.45% 58 222,348 -227,042 -4,693
AMBERLEY - QLD 119 4 3.36% 65 234,315 -247,949 -13,634
CRESTMEAD - QLD 125 4 3.20% 51 173,490 -177,129 -3,639
FRENCHS FOREST - NSW 161 5 3.11% 78 247,681 -253,731 -6,050
NORMANHURST - NSW 198 6 3.03% 80 323,569 -342,623 -19,054
COLLAROY - NSW 133 4 3.01% 85 283,678 -291,542 -7,864
HMAS PLATYPUS - NSW 100 3 3.00% 80 929,211 -965,459 -36,248
GILBERTON - QLD 100 3 3.00% 92 420,057 -445,315 -25,258
DINGLEY VILLAGE - VIC 137 4 2.92% 45 373,832 -380,187 -6,355
BURLEIGH DC - QLD 138 4 2.90% 84 384,508 -889,992 -70,000
PADSTOW - NSW 138 4 2.90% 58 148,621 -155,076 -6,455
COOLAROO - VIC 138 4 2.90% 80 243,244 -249,687 -6,443
KINGSTON - QLD 141 4 2.84% 85 345,028 -359,125 -14,097
PARALOWIE - SA 211 6 2.84% 82 178,641 -192,291 -13,649
ALBURY - NSW 142 4 2.82% 69 260,287 -267,349 -7,062
ALBION - VIC 287 8 2.79% 81 200,088 -207,630 -7,542
KUNYUNG - VIC 109 3 2.75% 0 90,000 -120,629 -30,629
ABBOTSFORD - QLD 291 8 2.75% 56 198,639 -203,405 -4,766
CANNON HILL - QLD 146 4 2.74% 79 444,927 -471,691 -26,764
GARDEN CITY - VIC 147 4 2.72% 73 706,434 -758,754 -52,320
ADVANCETOWN - QLD 331 9 2.72% 79 332,900 -341,796 -8,897
CALAMVALE - QLD 148 4 2.70% 71 381,881 -394,281 -12,400
AMAMOOR - QLD 112 3 2.68% 27 221,188 -229,098 -7,910
MAWSON LAKES - SA 112 3 2.68% 92 156,710 -159,847 -3,137
MILDURA - VIC 149 4 2.68% 85 210,203 -214,821 -4,618
ALFREDTOWN - NSW 150 4 2.67% 67 242,365 -253,479 -11,114
ALFRED COVE - WA 374 1 0.27% 60 20,000 -20,278 -278
CLOVERDALE - WA 480 1 0.21% 70 267,700 -270,841 -3,141
KINGSLEY - WA 1206 3 0.21% 54 229,127 -233,828 -4,701
/// Another 500 Lines ///
4. Development of SAS Geospatial software allows easy mapping of any data containing
Australian Postcodes
SAS Geospatial Software
The geospatial software was developed in-house. Based on
mapping information freely available from the ABS and
Australia Post, we have developed a user-friendly
approach to very quickly produce an Australian map of
many different statistics.
The analyst only needs to produce a table containing
postcode level data, including the information required for
charting. Once this data is linked into the software, it takes
less than 1 minute to produce a map.
Maps can be displayed at State, Fitch Region or Postcode
level. Different maps can also be displayed for each state.
Sales Targets By Postcode
Sales Targets By Fitch Region
5. Example: Distribution by Fitch Region
Maps can be easily generated
To create a map, all that is required is two columns of
data - the Postcode, and the attribute you wish to map.
The attribute need not represent numerical data, but
can also include categorical information.
Examples of Maps Include:
• Product sales or application volumes
• Credit Card Fraud
• Default Rates
• Scorecard Results
• Cross Sale Volume
• Loan Utilization
Australian Delinquency Rates By Fitch Region
6. Data can be summarised at various levels, and the analyst can easily set
options to create different maps.
Customisation Through SAS Enterprise Guide
An Enterprise Guide project can easily be set up
to handle any segmentation or statistics.
In this case, the project has been created for
easy re-use.
The analyst can select the map attributes,
including the statistic (mean, count, min, max),
the display level (Postcode, State or Fitch Region)
and a few other options to create the map
desired.
This gives the analyst a chance to consider
results from many perspectives.
The process of producing maps can then be
performed without changing any code, which
shortens the learning curve for new starters.
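The option-driven summarisation described above can be illustrated outside SAS. The following is a minimal Python sketch, not the Enterprise Guide project itself: the lookup table, the level name "region" (standing in for Fitch Region) and the function name summarise are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical lookup from postcode to the coarser display levels.
# "region" stands in for Fitch Region; the names are illustrative only.
POSTCODE_LOOKUP = {
    "2000": {"state": "NSW", "region": "Sydney"},
    "2010": {"state": "NSW", "region": "Sydney"},
    "4000": {"state": "QLD", "region": "Brisbane"},
}

# The statistics the analyst can choose from, as listed on the slide.
STATISTICS = {"mean": mean, "count": len, "min": min, "max": max}


def summarise(rows, level="postcode", statistic="mean"):
    """Group (postcode, value) rows by display level and apply one statistic."""
    groups = {}
    for postcode, value in rows:
        key = postcode if level == "postcode" else POSTCODE_LOOKUP[postcode][level]
        groups.setdefault(key, []).append(value)
    stat = STATISTICS[statistic]
    return {key: stat(values) for key, values in groups.items()}


# Illustrative postcode-level input: (postcode, attribute to map).
rows = [("2000", 0.02), ("2010", 0.04), ("4000", 0.05)]
by_state_max = summarise(rows, level="state", statistic="max")
by_region_count = summarise(rows, level="region", statistic="count")
```

Changing only the level and statistic arguments produces a different view of the same postcode table, which is the idea behind selecting options rather than editing code.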
7. Maps can be zoomed dynamically to see greater detail
Zooming In
Dynamic maps can be produced that enable the analyst to zoom in on regions for finer detail. This is done
through a web browser using the Graph Toolbar.
Figure panels: Australia; Zoom to Central East NSW; Zoom Further to Sydney Area
8. Auto RAG Vs User Defined RAG Status
RAG (Red Amber Green, or Traffic Lights) Status
RAG status can be set to predefined values in order to line
up with existing triggers. However, often it is necessary to
assign a meaningful RAG status for data we have not
visualised before.
The software can automatically generate RAG status based
on percentiles.
White: 0 observations
Green: Below the 75th percentile
Amber: Between the 75th and 90th percentiles
Red: Above the 90th percentile
Once the analyst has reviewed the outputs, the RAG status
can be manually set at the levels desired with little
programming.
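The percentile rule above can be sketched in a few lines. This is a Python illustration under stated assumptions, not the SAS implementation: the function name auto_rag is hypothetical, regions with no observations are represented by None, the percentiles are computed over regions that do have observations, and a value exactly equal to the 90th percentile is treated as Amber (the slide does not say which side of the boundary it belongs to).

```python
from statistics import quantiles


def auto_rag(values_by_region):
    """Assign White/Green/Amber/Red using the 75th and 90th percentiles."""
    observed = [v for v in values_by_region.values() if v is not None]
    cuts = quantiles(observed, n=100)   # 99 cut points: cuts[k-1] is the k-th percentile
    p75, p90 = cuts[74], cuts[89]
    status = {}
    for region, value in values_by_region.items():
        if value is None:
            status[region] = "White"    # 0 observations
        elif value < p75:
            status[region] = "Green"
        elif value <= p90:              # assumption: ties at the 90th percentile are Amber
            status[region] = "Amber"
        else:
            status[region] = "Red"
    return status


# Illustrative data: twenty regions with values 1..20 and one empty region.
example = {f"R{i}": float(i) for i in range(1, 21)}
example["R0"] = None
rag = auto_rag(example)
```

A user-defined RAG would replace p75 and p90 with fixed trigger values chosen by the analyst after reviewing the automatic output.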
Auto RAG
User-Defined RAG
9. Information Sources:
ABS: http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1270.0.55.003July%202011?OpenDocument
Google Maps API: http://maps.googleapis.com/maps/api/geocode...
The MapInfo Interchange File (MIF): Format Specification, October 1999, (Mapinfo_Mif.pdf), MapInfo Corporation
Australia Post for Postcode Mappings: http://auspost.com.au
What we need to do:
• Decode the MapInfo file
• Link in the Australia Post postal zones
• Create a valid GMAP dataset
• Create a DENSITY variable
• Link in our postcode data we wish to plot
• Run proc gmap
WHY NOT USE PROC MAPIMPORT TO PROCESS THE MAPINFO FILE?
PROC MAPIMPORT will import native Shapefiles. However, there are limits on the size of the input data it will accept.
The ABS map files are too big!
Therefore we must reverse engineer the files and create our own data.
10. SAS GMap Data Format
We use PROC GMAP to create our maps. SAS expects the mapping data in a particular format.
• Boundaries are represented by polygons
• The polygon coordinates must be in the order in which they are drawn. I created an index variable, which allows me
to sort the polygon data regardless of the subset chosen.
• Boundaries between joining polygons must align
• Polygon boundaries must end with X and Y = missing
• See SAS Help – The GMAP procedure for further specifications.
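To make the layout concrete, the sketch below builds rows in the shape the bullets describe: one row per boundary point, a running index that preserves drawing order, and a terminating row with missing X and Y after each polygon. It is written in Python for illustration only; the function name to_gmap_rows and the column names REGION and INDEX are assumptions modelled on the variables used elsewhere in this deck, and None stands in for a SAS missing value.

```python
def to_gmap_rows(polygons):
    """Flatten (region_id, [(x, y), ...]) polygons into GMAP-style rows."""
    rows = []
    index = 0
    for region_id, points in polygons:
        for x, y in points:
            index += 1
            rows.append({"REGION": region_id, "INDEX": index, "X": x, "Y": y})
        # Terminate the polygon with a missing coordinate pair, as the slide states.
        index += 1
        rows.append({"REGION": region_id, "INDEX": index, "X": None, "Y": None})
    return rows


# Two illustrative triangles (first point repeated to close each boundary).
polygons = [
    ("A", [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]),
    ("B", [(5.0, 5.0), (6.0, 5.0), (5.0, 5.0)]),
]
gmap_rows = to_gmap_rows(polygons)

# Even if a subset arrives in a different order, sorting by INDEX restores drawing order.
subset_b = sorted(
    (row for row in reversed(gmap_rows) if row["REGION"] == "B"),
    key=lambda row: row["INDEX"],
)
```

Because the index runs across the whole data set rather than restarting per region, any subset (one state, one region) can be re-sorted without losing the order in which the points must be drawn.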
GREDUCE is used to determine common boundaries and create a DENSITY variable
which enables you to specify the level of detail of your map.
1 = very coarse, 6 = high detail.
• The input map data set must be a traditional map data set and contain these variables:
• a numeric variable named X that contains the horizontal coordinates of the map boundary points.
• a numeric variable named Y that contains the vertical coordinates of the map boundary points.
• X and Y correspond to the longitude and latitude of the map
• one or more identification variables that uniquely identify the unit areas in the map. These variables are listed in the ID
statement.
It also can contain:
• one or more variables that identify groups of unit areas (for BY-group processing)
• the variable SEGMENT, which distinguishes noncoterminous segments of the unit areas.
• Any other variables in the input map data set do not affect the GREDUCE procedure.
Source: SAS Help – GREDUCE Procedure
Perfect
alignment here
11. MapInfo Data Format
VERSION 450
DELIMITER ","
CoordSys Earth Projection 1, 116 Bounds (96,-45) (160,-8)
COLUMNS 4
SSC_CODE_2011 char(5)
SSC_NAME_2011 char(45)
CONF_VALUE char(12)
AREA_ALBERS_SQKM float
DATA
BRUSH(1,0)
REGION 1
6
149.849264992 -36.6253990025
149.848946016 -36.6258159925
149.848335008 -36.626610993
149.84918 -36.6242170005
149.849251008 -36.624721995
149.849264992 -36.6253990025
BRUSH(1,0)
REGION 2
6
148.934595296 -36.2314241865
148.934515232 -36.231766048
148.935468832 -36.233815737
153.342782016 -29.480945996
153.342276 -29.481940001
153.342455008 -29.4828790055
BRUSH(1,0)
REGION 3
5
152.113704064 -32.7755481515
152.114273696 -32.775713449
152.114248096 -32.774133364
152.11393104 -32.7749443485
152.113704064 -32.7755481515
5
152.080612992 -32.789413994
152.080636 -32.7895320055
152.080721984 -32.7893500025
152.080646016 -32.789302994
152.080612992 -32.789413994
3
152.079690016 -32.789695009
152.07966 -32.7895769975
152.079690016 -32.789695009
BRUSH(1,0)
NONE
NONE
There are 2 MapInfo files - .mid and .mif. We are only
interested in the .mif. This is a text file which consists
of:
• Header
• Various pen commands (control colour etc)
• Commands to designate each region
• The polygon definitions for each region – each
region can have several polygons (e.g. there may
be a lake inside the main polygon).
• A number on a line of its own, designating the
number of points in the polygon.
• Ends with a NONE command
To process the file we:
• Ignore the header and BRUSH commands
• SCAN each line
• If we find a REGION label, increment our REGION
counter
• If we find an integer on its own, increment a
POLYGON counter, but keep it associated with the
current region
• In any other case, extract the LONG and LAT
coordinates (longitude comes first on each line)
• Stop processing when we find the NONE label.
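The processing steps above can be sketched as a small line scanner. This is a Python illustration of the same logic, not the SAS data step used in the software: the function name parse_mif is hypothetical, the polygon counter is assumed to restart within each region, PEN lines are skipped alongside BRUSH, and the sample input is an abbreviated version of the extract shown on this slide.

```python
def parse_mif(lines):
    """Return (region, polygon, index, longitude, latitude) tuples from .mif text lines."""
    rows = []
    region = polygon = index = 0
    in_data = False
    for raw in lines:
        line = raw.strip()
        upper = line.upper()
        if not in_data:
            # Ignore the header: everything up to and including the DATA line.
            in_data = upper == "DATA"
            continue
        if not line or upper.startswith(("BRUSH", "PEN")):
            continue                      # pen and brush commands carry no geometry
        if upper == "NONE":
            break                         # stop at the NONE label
        if upper.startswith("REGION"):
            region += 1
            polygon = 0                   # assumption: polygons are numbered within a region
            continue
        parts = line.split()
        if len(parts) == 1 and parts[0].isdigit():
            polygon += 1                  # a lone integer announces a new polygon
            continue
        if len(parts) == 2:
            index += 1                    # running index preserves drawing order
            rows.append((region, polygon, index, float(parts[0]), float(parts[1])))
    return rows


SAMPLE_MIF = """VERSION 450
COLUMNS 1
SSC_CODE_2011 char(5)
DATA
BRUSH(1,0)
REGION 1
3
149.849264992 -36.6253990025
149.848946016 -36.6258159925
149.849264992 -36.6253990025
BRUSH(1,0)
REGION 2
3
152.113704064 -32.7755481515
152.114273696 -32.775713449
152.113704064 -32.7755481515
3
152.079690016 -32.789695009
152.07966 -32.7895769975
152.079690016 -32.789695009
BRUSH(1,0)
NONE
"""

mif_rows = parse_mif(SAMPLE_MIF.splitlines())
```

Each output row carries the region and polygon it belongs to, so the second region in the sample yields two polygons that stay associated with the same region counter.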
These regions
only have 1
polygon
This region
consists of 3
polygons
12. Creating Your Map Using the GMAP procedure
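/* Sort so that polygon points are in drawing order within each region */
PROC SORT DATA = WORK.MAP_DATA
          OUT  = WORK.MAP_DATA_SORTED;
    BY STATE REGION INDEX;
RUN;

/* Draw one choropleth map per state, coloured by the formatted response variable */
PROC GMAP MAP = WORK.MAP_DATA_SORTED;
    BY STATE;
    ID REGION;
    CHORO MAP_DATA_FORMATED /
        MISSING
        COUTLINE = GREY;
RUN;
QUIT;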
Standard Map
With a little extra coding you
can create animated maps
Melbourne
13. Using Google Maps API to plot street addresses
If you don’t have the coordinate data for customer addresses, you can submit the addresses to the Google Maps API and then overlay the
returned points on a map.
Advantages: Free; the API will generally interpret misspellings correctly.
Disadvantages: Can only download 1,200 addresses per day, subject to a misuse policy.
Sample address to submit to Google API:
http://maps.googleapis.com/maps/api/geocode/xml?address=5 jessel place,duncraig,wa,AUSTRALIA&sensor=true
Returns XML data containing the Lat and Long of the address
<?xml version="1.0" encoding="UTF-8" ?>
<GeocodeResponse>
  <status>OK</status>
  <result>
    <type>street_address</type>
    <formatted_address>5 Jessel Pl, Duncraig WA 6023, Australia</formatted_address>
    <address_component>
      <long_name>5</long_name>
      <short_name>5</short_name>
      <type>street_number</type>
    </address_component>
    ... (intervening elements omitted) ...
        <southwest>
          <lat>-31.8431920</lat>
          <lng>115.7773300</lng>
        </southwest>
        <northeast>
          <lat>-31.8404940</lat>
          <lng>115.7800280</lng>
        </northeast>
      </viewport>
    </geometry>
  </result>
</GeocodeResponse>
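As an illustration of the request and response handling, the sketch below builds a geocode URL with the address properly encoded and pulls the latitude and longitude out of a response. It is a Python sketch under assumptions, not part of the original solution: the function names are hypothetical, the URL format is copied from the slide and may not match what the service currently requires, no network call is made, and the sample response is a cut-down document whose location values are made up for illustration (chosen to lie inside the viewport shown above).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint as given on the slide; current service requirements may differ.
BASE_URL = "http://maps.googleapis.com/maps/api/geocode/xml"


def build_url(address):
    """Encode spaces and commas so the address is safe to place in a URL."""
    return BASE_URL + "?" + urlencode({"address": address, "sensor": "true"})


def extract_lat_lng(xml_text):
    """Return (lat, lng) from a geocode response, or None if the status is not OK."""
    root = ET.fromstring(xml_text)
    if root.findtext("status") != "OK":
        return None
    location = root.find("result/geometry/location")
    if location is None:
        return None
    return float(location.findtext("lat")), float(location.findtext("lng"))


# Cut-down response with illustrative (not real) location values.
SAMPLE_RESPONSE = """<GeocodeResponse>
  <status>OK</status>
  <result>
    <type>street_address</type>
    <geometry>
      <location>
        <lat>-31.8418430</lat>
        <lng>115.7786790</lng>
      </location>
    </geometry>
  </result>
</GeocodeResponse>"""

url = build_url("5 jessel place,duncraig,wa,AUSTRALIA")
lat_lng = extract_lat_lng(SAMPLE_RESPONSE)
no_result = extract_lat_lng(
    "<GeocodeResponse><status>ZERO_RESULTS</status></GeocodeResponse>"
)
```

Reading the coordinates by element path rather than by fixed row position avoids depending on how many address components a particular response contains.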
14. Using Google Maps API
Using Excel and VBA to download addresses:
Using VBA to automate the download of addresses is straightforward using the QueryTables collection object. The results can be fed into a SAS
table and overlaid on a map using the ANNOTATE option to GMAP.
Sub getGeoData()
    ' Data, Google and Results are assumed to be worksheet code names in this workbook.
    Dim q As QueryTable
    Dim r As Long, startRow As Long, resultsStart As Long
    Dim Address As String, conn As String
    Const BASE_URL As String = "URL;http://maps.googleapis.com/maps/api/geocode/xml?address="

    ' Remove any query tables left over from a previous run
    For Each q In Sheets("google").QueryTables
        Debug.Print q.Name
        q.Delete
    Next q

    startRow = 26510
    resultsStart = 27763

    ' Create the query table once, using the first address
    Address = Data.Range("H" & startRow)
    conn = BASE_URL & Address & "&sensor=true"
    With Google.QueryTables.Add(Connection:=conn, Destination:=Google.Range("B1"))
        .Refresh (False)
    End With

    For r = startRow To 30000
        ' Only query when the address differs from the previous row
        If Data.Range("H" & r) <> Data.Range("H" & r - 1) Then
            Address = Data.Range("H" & r)
            conn = BASE_URL & Address & "&sensor=true"
            With Google.QueryTables(1)
                .Connection = conn
                .Refresh (False)
            End With

            ' Stop when the daily query limit has been reached
            If Google.Cells(3, 2).Value = "OVER_QUERY_LIMIT" Then Exit Sub

            ' Copy the key and the two result cells unless nothing was found
            If Google.Cells(3, 2).Value <> "ZERO_RESULTS" Then
                Results.Cells(resultsStart, 1) = Data.Cells(r, 1)
                Results.Cells(resultsStart, 2) = Google.Cells(50, 1)
                Results.Cells(resultsStart, 3) = Google.Cells(51, 1)
                resultsStart = resultsStart + 1
            End If

            ' Record progress so the run can be resumed
            Google.Cells(52, 1) = r
            Google.Cells(53, 1) = r - startRow
        End If
        DoEvents
    Next r
End Sub
/* Build an ANNOTATE data set that draws a coloured dot for each plotted address */
%MACRO dot( x1, y1, rad, colin, fill );
/*--------------------------------------------------------------------------*/
/* Draw a circle with center at ( X1,Y1 ) of radius RAD.                    */
/*--------------------------------------------------------------------------*/
    X = &x1;
    Y = &y1;
    LINE = 0;
    ANGLE = 0.00;
    ROTATE = 360.00;
    SIZE = &rad;
    STYLE = "&fill";
    IF "&colin" =: '*' THEN ; ELSE color = "&colin";
    FUNCTION = "PIE"; output;
%MEND dot;

/* Dot sizes for each delinquency bucket */
%let size_cur = 0.05;
%let size_30 = 0.09;
%let size_60 = 0.11;
%let size_90 = 0.15;

data ANNO;
    length function style color $ 8 position $ 1 ;
    retain xsys ysys "2" hsys "3" when "a" ;
    set WORK.CORD_MARKER;
    position = "E";
    segment = 1;
    /* Colour and size depend on the days-past-due bucket */
    if days_buck = -1 then do;
        %dot( X, Y, &size_cur , green, solid );
    end;
    else if days_buck = 1 then do;
        %dot( X, Y, &size_30 , orange, solid );
    end;
    else if days_buck = 30 then do;
        %dot( X, Y, &size_30 , blue, solid );
    end;
    else if days_buck = 60 then do;
        %dot( X, Y, &size_60 , purple, solid );
    end;
    else do;
        %dot( X, Y, &size_90 , red, solid );
    end;
run;
15. Issues:
- Gaps in map where there is a national park (e.g. 1/3 of Tasmania missing!)
- Some SSD to Postcode mappings are incorrect
- Some postcodes overlap states
- No feature maps
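One of these issues, postcodes that overlap states, can be detected with a simple check before mapping. The following Python sketch is illustrative only: the function name is hypothetical and the input pairs are a made-up sample rather than an extract from the Australia Post file.

```python
from collections import defaultdict


def postcodes_in_multiple_states(pairs):
    """Return {postcode: [states]} for postcodes linked to more than one state."""
    states_by_postcode = defaultdict(set)
    for postcode, state in pairs:
        states_by_postcode[postcode].add(state)
    return {
        postcode: sorted(states)
        for postcode, states in states_by_postcode.items()
        if len(states) > 1
    }


# Illustrative (postcode, state) pairs; the values are for demonstration only.
sample_pairs = [
    ("2620", "NSW"),
    ("2620", "ACT"),
    ("4000", "QLD"),
    ("6023", "WA"),
]
overlaps = postcodes_in_multiple_states(sample_pairs)
```

Postcodes flagged this way need a rule for which state they are drawn under, otherwise a BY STATE map can show the same area twice or not at all.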
How much work was involved
While the build took place over 2 years (not full time!), development is quite straightforward. The hardest part was finding
the best map file on ABS, then linking that data to the Australia Post postcode file. There are inconsistencies in the ABS data
which had to be rectified (e.g. postcodes in the wrong position or overlapping).
Once the base geospatial files were finalised, it was a relatively short process to create a semi-automated solution in Enterprise
Guide.