Integrating SAS® and Geographic Information Systems for Regional Land Use Planning
                              Bill Bass, Houston-Galveston Area Council, Houston, Tx

ABSTRACT

The Houston-Galveston Area Council (H-GAC) provides regional socio-economic and land-use forecasting analysis for the
13 counties surrounding the Houston metropolitan area. Forecasting efforts require the integration of geographic data and
large amounts of tabular data from various sources, such as parcel boundary datasets and county appraisal records. H-GAC
uses SAS® in conjunction with ESRI® ArcGIS® geographic information systems (GIS) software to produce a comprehensive
land-use database for the 13-county region. The integrated process involves millions of appraisal data records as well as
large volumes of geographic data. Through the combined use of SAS® and GIS, H-GAC is able to streamline the data
development process compared with using other SQL and desktop database technologies.

INTRODUCTION

H-GAC is the region-wide voluntary association of local governments in the 13-county Gulf Coast Planning region of Texas. It
is one of several Council of Government organizations (COGs) in the State of Texas, and serves 12,500 square miles with
more than 5.7 million people. H-GAC is governed by a Board of Directors composed of local elected officials who serve on
the governing bodies of member local governments. There are 35 members on the H-GAC Board. H-GAC provides many
tools, information, region-wide plans, and services to support municipalities, districts, and non-profit organizations. H-GAC's
mission is to serve as the instrument of local government cooperation, promoting the region's orderly development and the
safety and welfare of its citizens (H-GAC 2008). One of H-GAC’s programs is regional socio-economic modeling.

The Socio-Economic Modeling group is an information and research hub in the Community and Environmental Planning
department that gathers, processes, generates, analyzes, and disseminates information on the past, present, and future
land use, economy, and population of our region in order to support comprehensive regional operations and planning
(H-GAC 2008). The primary purpose of forecasting efforts within the socio-economic group is to support Travel Demand
Modeling, which is used in Regional Transportation Planning (RTP). However, H-GAC also uses socio-economic products for
other long-range planning purposes that involve environmental conservation, water quality, and urban planning.

Due to the large amount and complexity of the data obtained for use in socio-economic modeling, SAS® is used for a variety
of functions including: data development, data organization, statistical analysis, and integration of data across multiple
databases. This paper will explain how H-GAC’s Socio-Economic Modeling group uses SAS® in conjunction with GIS to
develop regional land use data, which is one component of the overall regional modeling framework employed at H-GAC.

PROCESSING OF COUNTY PARCEL BOUNDARY AND APPRAISAL DISTRICT DATA

H-GAC obtains appraisal data from each of the 13 County Appraisal District (CAD) offices where data is electronically
available. Appraisal data are typically very large datasets that cover a wide variety of attributes regarding parcels (real
property) within each county. Some of the data attributes included in appraisal roll datasets are:

         Valuation of land and improvements (e.g. buildings)
         Land usage through the State Classification Coding framework
         Ownership and legal descriptions of property
         Taxing entities and exemptions
         Square footage and structural amenities



In addition, H-GAC obtains parcel boundary datasets from the counties that complement the appraisal roll data. Parcel
boundaries are typically provided in industry-standard shapefile format that can be viewed in GIS software, such as ESRI®
ArcView® or ArcInfo® products. In many cases, the parcel boundary data and appraisal roll data are not related in a manner
that allows them to be used as a relational database system, although the two datasets do share common fields, such as
Account Number. Furthermore, data schemas are not standardized across county appraisal systems, and thus
yield a variety of source data layouts and structures with a variety of field naming conventions.

ISSUES AND CHALLENGES IN WORKING WITH APPRAISAL DATA

Through H-GAC’s efforts in working with appraisal roll data, a number of challenges have been identified and overcome in
order to develop a comprehensive regional appraisal database. These challenges exist both in the appraisal roll datasets that
contain the property attribute data and in the GIS parcel boundary datasets. Challenges for working with appraisal
roll data include:

         Multiple datasets stored within a single text file, each with their own unique data schema
         The need to convert data imported as character format to numeric, and numeric data to character
         Cleanup of data entry errors such as leading and trailing spaces for primary key fields
         Replacing zero values with NULL values to prevent errors when analyzing data

There are also challenges in working with the appraisal parcel boundary data due to the way data is stored
within the county GIS systems. For instance, it is typical for a parcel to have more than one account number affiliated with
it, either because the parcel has multiple owners or because it contains many separately owned units, as with a high-rise
condominium complex. Instances such as these are typically stored by “stacking” identical parcels on top of one another
within the GIS and giving each parcel feature a different Account Number. Although this may provide an effective end
product for viewing ownership at the parcel level using a single table format, it does not support the establishment of a
topologically integrated geographic database, where a single parcel of land can have one or more owners, which is typically
represented through a more relational database structure.

In the following section, these issues and challenges will be explained in detail, as well as how GIS and SAS® are used
together to develop standardized data for the region.

APPRAISAL ROLL DATA DEVELOPMENT

Writing the SAS® code that imports a data file can be lengthy, and appraisal data is no exception. It is common for an
appraisal roll dataset to contain more than 100 fields, each of which needs to be listed in the LENGTH and INPUT statements
of the DATA step that reads the file. Therefore an Excel® spreadsheet is used to help generate SAS® code that can be copied
into the SAS® editor. A list of field names is loaded into the spreadsheet, and Excel® formulas and hard-coded text strings
are then used to generate the INFILE, LENGTH, and INPUT portions of the DATA step. This method saves time and reduces
data entry errors, as field names are copied, not typed.
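
The same code-generation idea can also be sketched inside SAS® itself. The example below is only an illustration and not the Excel® workbook described above: the Field_List table, its column names, and the sample fields are hypothetical. It reads a short list of field definitions and writes an INPUT statement to the SAS® log, one line per field, ready to be copied into a DATA step.

```sas
/* Hypothetical field list: field name, type (C = character, N = numeric),
   and the start and end columns of the field in the flat file */
data Field_List;
   length Field_Name $32 Field_Type $1;
   input Field_Name $ Field_Type $ Start_Col End_Col;
   datalines;
Account_Number C 1 10
Owner_Name C 11 60
Land_Value N 61 70
;
run;

/* Writes an INPUT statement to the log, one line per field */
data _null_;
   set Field_List end=Last_Row;
   length Code_Line $100;
   if _n_ = 1 then put 'Input';
   /* Character fields need the $ modifier, numeric fields do not */
   if Field_Type = 'C' then
      Code_Line = catx(' ', Field_Name, '$', cats(Start_Col, '-', End_Col));
   else
      Code_Line = catx(' ', Field_Name, cats(Start_Col, '-', End_Col));
   put '   ' Code_Line;
   if Last_Row then put ';';
run;
```

Because the field names come from a table rather than being typed into the program, the benefit is the same as with the spreadsheet approach: names are copied, not retyped.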

Once data is imported into SAS® dataset format, additional SAS® code is written to clean up and standardize the datasets
into a common data structure for all counties in the region. Standardized dataset names, field names, and formats
greatly simplify data development and make later analysis of the appraisal data more efficient. The following are some
examples of how SAS® is used to clean up and standardize the appraisal attribute data.

Attribute data is typically provided in one or several flat-file layouts. These are typically delimited text files using either
comma or tab delimiters. In some cases multiple types of datasets are stored in a single text file, and SAS® is used to
determine which records to read. For example, a single file commonly holds not only the appraisal data that includes
ownership, valuation, and land use, but also summary data that aggregates valuations by subdivision. A DATA step such as
the one illustrated below can process the file and import only the records that represent the appraisal roll data. In many
cases, multiple import steps are used, so that each type of data can be loaded into a separate SAS® table. The following is
an illustration of a conditional import that only loads records that have a Record Type of ‘4’ in the source file.

Data Appraisal_Data Other_Data;
      /* Name of flat file to load */
      Infile 'Input_file.txt' MISSOVER lrecl=5000;

      /* Following code specifies field attributes used in conditional processing */
      Length Record_Type $ 1;             /* Initializes record type field */
      /* Defines location of record type value in flat file. The trailing @ holds
         the record in the input buffer, so the next INPUT statement reads the
         same record instead of skipping to the next one */
      Input Record_Type $ 61-61 @;

      /* Following code is conditional processing to only load certain record types */
      If Record_Type = '4' Then Do;       /* Only loads record types with a value of '4' */
            /* Initializes and defines other fields in flat file to import.
               Notice that the Record_Type variable is not listed here, because
               its length was already set above */
            Length
                  First_Field $ 10
                  Second_Field $ 50;
            Input
                  First_Field $ 1-10
                  Second_Field $ 11-60
                  Record_Type $ 61-61;
            Output Appraisal_Data;        /* Name of dataset to write data */
      End;
      /* Puts all other records into a scratch dataset, not used */
      Else Output Other_Data;
Run;

Once data is loaded into SAS®, additional SAS® statements are used to assist in further cleaning the data. For instance, it is
common for some fields to be initially imported as text formats, when in fact they should be defined as numeric. The same
holds true for some attributes that are imported as numeric when they should be text (e.g. numbers that have leading
zeros). The following are two examples of code that are used in SAS® DATA STEP statements to handle these conversion
scenarios.

*Code for converting values from Numeric (N) to Character (C);
C = Strip(Put(N,10.));     *Where ‘10.’ is the desired character length;
*Code for converting values from Character (C) to Numeric (N);
N = Input(C,8.0);          *Where ‘8.0’ is a numeric informat;
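
As a small worked illustration of the leading-zero case mentioned above, the sketch below uses the SAS® Z format to write a number back out with its leading zeros. The variable names and sample values are hypothetical and are not taken from the H-GAC data.

```sas
data Convert_Demo;
   /* Hypothetical values for illustration only */
   Code_Num  = 7027;        /* numeric value that has lost its leading zero */
   Value_Chr = '125000';    /* numeric value that was imported as text      */

   length Code_Chr $5;
   /* The Z5. format pads the number with leading zeros to a width of 5 */
   Code_Chr  = put(Code_Num, z5.);
   /* The 8. informat reads the text as a number */
   Value_Num = input(Value_Chr, 8.);
run;
```

Here Code_Chr holds the text '07027' and Value_Num holds the number 125000. Note that the converted value is written to a new variable in each case, since a SAS® variable cannot change type once it has been created.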

Another example of data cleanup that is performed on SAS® datasets is replacing zero values with NULL values (missing
values, which SAS® represents as a period for numeric variables). For appraisal data, it is often incorrect to record a value
as zero. Consider the value of land and improvements. These items, if they exist, have a value associated with them. If the
value is not known, then it should not be zero, but rather NULL so as to not skew statistical analysis. Land values of ‘0’ in
the appraisal roll data are changed to NULL, as all land has a value. The same holds true for improvement values: if an
improvement exists, it should have a value greater than zero, so any values of zero are changed to NULL. These changes are
performed using a simple IF-THEN statement in SAS® that looks for a zero value and changes it to NULL.

*Replaces zero values with NULL values;
If Land_Value = 0 Then Land_Value = .;
If Improvement_Value = 0 Then Improvement_Value = .;

In some instances, data such as zip codes are provided either as aggregated values (e.g. 77027-1234) or as separated values
in their own fields (e.g. 77027, 1234). H-GAC chooses to store zip code data as two separate fields, so for counties
where the data is only provided in an aggregated format, the SAS® SUBSTR function is used. The following is an example
of how two separate zip code fields are created from a single aggregate zip code field.

*Code separates 5-digit zip prefix from 4-digit suffix;
Zip_Code = Substr(Orig_Zip,1,5);          *Reads and stores values of positions 1
                                          thru 5;
Zip_Code_Plus4 = Substr(Orig_Zip,7,4);    *Reads and stores values of positions 7
                                          thru 10;

Finally, in some instances primary key fields and fields with formatted codes are preceded or followed by stray spaces.
This can cause issues when trying to join data in multiple tables, as SQL typically views spaces as valid characters. Thus a
value of ‘R1234’ in one table is not the same as a value preceded by a space such as ‘ R1234’ in another table, with the
latter value being a data entry error. To resolve these issues, a DATA step statement is used to remove the spaces from
such fields. The following is an example of such a statement.

*Removes leading and trailing spaces from account number field;
Acct_Num = Strip(Acct_Num);
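
To show why the cleanup matters, the following minimal sketch compares a value with a leading space against the clean key before and after STRIP is applied. The variable names and the sample value are hypothetical.

```sas
data Strip_Demo;
   length Acct_Raw Acct_Clean $10;
   Acct_Raw   = ' R1234';           /* hypothetical value with a leading space */
   Acct_Clean = strip(Acct_Raw);    /* leading space removed                   */

   /* A comparison returns 1 when true and 0 when false */
   Match_Before = (Acct_Raw   = 'R1234');   /* 0: the leading space breaks the match */
   Match_After  = (Acct_Clean = 'R1234');   /* 1: the values now compare as equal    */
run;
```

In this sketch Match_Before is 0 and Match_After is 1, which is the difference between a join that silently drops the account and one that finds it.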

The end result of using SAS® to process appraisal roll data is a standardized set of SAS® datasets for each county that have
common fields and naming conventions for attributes such as owners, legal descriptions, land value, improvement value,
and state classification code. Although each set of appraisal data from the county includes far more than just the
standardized fields used by H-GAC, these additional fields are not dropped. Instead they are appended to the end of the
common variables. From this point, analysis can be run against the SAS® appraisal roll datasets and reports generated, and
if needed, exported to other formats such as Excel®, DBF, or delimited files.

GIS PARCEL BOUNDARY DATA DEVELOPMENT

In addition to performing quality review on attribute data, SAS® is also used to assist in the cleanup of geographic parcel
boundary data. Depending upon the type of parcel (residential, commercial, mixed use, etc.), parcel features in the GIS
dataset may involve multiple features ‘stacked’ on top of one another, with each feature having a corresponding account
number. For instance, if there were two owners of a single parcel of land, each with their own account number for a
single-family residential property, there may be two spatially and geometrically identical polygon features, each carrying the
account number of the owner it represents. Therefore H-GAC uses ESRI® ArcGIS® in conjunction
with SAS® to create a single polygon for these features, but retain the multiple account number assignments. In effect, the
flat file structure of the appraisal GIS dataset is transformed into a more extensive relational database system, capable of
supporting complex analysis.



First, GIS is used to calculate a centroid value for each polygon in the original parcel dataset, which is expressed as an X/Y
coordinate. Think of an X/Y value as being latitude and longitude values, and if two geometrically identical parcels are
stacked on top of one another in the same geographic space, both will have the same X/Y coordinate value, or centroid
location.

Next, the parcel dataset is processed using a method called Dissolving, where polygons are grouped and simplified
based on some common value, in this case the X/Y coordinate. The result of the dissolve process is a new dataset that
contains only one parcel boundary for a defined space, where before there may have been multiple parcels stacked on top
of one another. This new dataset also retains the X/Y coordinate value of the final aggregated polygons. If a parcel is not
stacked on top of another parcel to begin with, then the dissolve process merely takes the single parcel and places it into
the new dataset.

What exists at this point are two GIS datasets:

         The original parcel boundaries, which contain stacked and non-stacked parcels, each with their respective account
         numbers and X/Y coordinate; and,
         The dissolved parcel boundaries, which contain only one parcel to an area of land and an X/Y coordinate of each
         parcel

For the newly created dissolved parcel boundaries dataset, each parcel is given a unique parcel identification code, or
Parcel ID. The Parcel ID field serves as the primary key for this dataset. Then both parcel datasets are exported to a
shapefile format, which stores attribute data such as X/Y coordinate, Account Number, and Parcel ID in a DBF data table.

This is the point at which SAS® assists in the integration of the two datasets into a relational database structure. The
amount of data to be processed for each county is large, sometimes upwards of 1 million parcels, and SAS® is very efficient in
handling this volume of data. Using SAS® PROC IMPORT steps as illustrated below, both DBF tables are loaded into SAS®.

*Loads original parcels data table containing the Account Number and
X/Y coordinate of each parcel;
Proc Import Out=Original_Parcels Replace
Datafile= 'c:Original_Parcels.dbf';
Run;

*Loads dissolved parcels data table containing the Parcel ID and X/Y coordinate
of each parcel;
Proc Import Out=Dissolved_Parcels Replace
Datafile= 'c:Dissolved_Parcels.dbf';
Run;

Next, the two datasets are joined using a PROC SQL LEFT JOIN statement as illustrated below.

*Joins dissolved parcels dataset to original parcels dataset to obtain account
numbers affiliated with each dissolved parcel;
Proc SQL;
      Create Table Parcel_ID_to_Account_Number as
      SELECT X.Parcel_ID, X.XY_Coord, Y.Account_Number
      From    Dissolved_Parcels AS X
      Left Join
      Original_Parcels AS Y
      On X.XY_Coord = Y.XY_Coord;
Quit;

The above SAS® statement creates a dataset that contains all Parcel IDs from the dissolved parcels dataset, and their
affiliated Account Numbers from the original dataset. The Parcel ID to Account Number table becomes a critical link
between the parcel boundary GIS data and the Appraisal Roll property attribute data. Specifically, it allows a single parcel
of land to be related to one or more accounts affiliated with that parcel, and then each account to its corresponding
record of detail in the Appraisal Roll dataset. The following section will illustrate how having such a table allows H-GAC to
produce parcel-level land use data for the region.
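
To make the parcel-to-account link concrete, here is a minimal sketch of the same left join run on a few made-up records. The account numbers, Parcel IDs, and coordinate values are hypothetical, and storing the X/Y coordinate as a single text field is an assumption made for this example only.

```sas
/* Hypothetical stand-in for the original parcels table:
   the first two accounts are stacked on the same polygon */
data Original_Parcels;
   length Account_Number $8 XY_Coord $24;
   input Account_Number $ XY_Coord $;
   datalines;
R12345 3105250.5_13842110.2
R45678 3105250.5_13842110.2
R99999 3107400.0_13840020.7
;
run;

/* Hypothetical stand-in for the dissolved parcels table:
   one record per distinct polygon */
data Dissolved_Parcels;
   length Parcel_ID $8 XY_Coord $24;
   input Parcel_ID $ XY_Coord $;
   datalines;
HR890 3105250.5_13842110.2
HR891 3107400.0_13840020.7
;
run;

/* Same join as in the paper, with an ORDER BY added for readable output */
proc sql;
   create table Parcel_ID_to_Account_Number as
   select X.Parcel_ID, X.XY_Coord, Y.Account_Number
   from Dissolved_Parcels as X
   left join Original_Parcels as Y
      on X.XY_Coord = Y.XY_Coord
   order by X.Parcel_ID, Y.Account_Number;
quit;
```

The result has three rows: parcel HR890 appears twice, once for account R12345 and once for R45678, while parcel HR891 appears once with account R99999. The two stacked polygons have become one parcel with two accounts.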

Determination of Land Use from Appraisal Roll Databases

H-GAC uses appraisal data as a basis for determining land use in the 13-county region surrounding the Houston
Metropolitan area. To process large amounts of appraisal data, H-GAC organizes appraisal records by parcel, which can
number upwards of 1 million records for a county, and over 3 million for the region.

However, appraisal data is not the only input to the land use determination process. H-GAC also acquires a variety of other
data related to land use, such as locations of schools, government buildings, infrastructure, and environmental
conservation and park areas. This additional information is used in conjunction with the appraisal roll data to obtain a more
accurate land use determination, or to supply one where none exists.

The first step in the process is to assign each appraisal roll record a Parcel ID. As discussed in the prior section, SAS® was
used to process data from the H-GAC GIS to determine parcel assignments for each appraisal account. Using a PROC SQL
LEFT JOIN statement illustrated below, each appraisal roll record is assigned to a parcel.

*Joins appraisal roll to Parcel ID based on Account Number assigned to parcels;
Proc SQL;
      Create Table Appraisal_Roll_Parcel_ID as
      SELECT X.Account_Number, X.Owner_Name, X.Legal, X.State_Class_Code,
Y.Parcel_ID
      From    Harris_Appraisal_Roll AS X
      Left Join
      Parcel_ID_to_Account_Number AS Y
      On X.Account_Number = Y.Account_Number;
Quit;

The result of the query is a table that can be used as the basis for the land use model to determine land use and ownership
by parcel. Since the process is primarily focused on land use, only a few of the many fields available in the Appraisal Roll
dataset are retained for further processing. In order to determine land use, the State_Class_Code field will be the field of
focus, as this field contains two-character codes that denote the type of property (e.g. single-family residential, commercial,
industrial, etc.).

The next step in the process is to determine land use of each parcel based on the State Class Code attribute retained in the
prior query. The records in the Appraisal Roll dataset are aggregated by the combined values of the Parcel_ID and
State_Class_Code fields. This prevents the same combination of Parcel ID and State Class Code from being listed more than
once when it occurs on several accounts. For instance, if account ‘R12345’ had a State Class Code of ‘A1’, and account
‘R45678’ had a State Class Code of ‘A1’, and both were assigned to Parcel ID ‘HR890’, then all that is needed is a record that
lists parcel HR890 as having a State Class Code of ‘A1’. Alternatively, if one of the State Class Codes for the above two
accounts was different, say ‘A2’ for account R45678, then two records would be produced for parcel HR890, one with a
State Class Code value of ‘A1’, and another with a value of ‘A2’. The following is an illustration of the PROC SQL code used
for this step in the process.




*Keeps only unique Parcel ID and State Class Code combinations;
Proc SQL;
      Create Table Unique_Parcels_SC AS
      SELECT Parcel_ID AS Unique_Parcel_ID, State_Class_Code,
      Count(State_Class_Code) AS NumberOfDups   /* Number of accounts sharing the combination */
      From Appraisal_Roll_Parcel_ID
      GROUP BY Parcel_ID, State_Class_Code      /* One output row per combination */
      ORDER BY Unique_Parcel_ID, State_Class_Code; /* Sorted for the BY processing that follows */
Quit;

As the next step, a DATA step and the TRANSPOSE procedure are used to transpose the vertical records for each parcel,
whether it has a single State Class Code or multiple, into columns. Those columns are then merged to create a single State
Class Code field, or SSC.

*Creates counter to identify first Parcel ID record;
Data Unique_Parcels_SC_N (Rename =(Unique_Parcel_ID = Parcel_ID));
      Retain Counter;
      Set Unique_Parcels_SC (Drop = NumberOfDups);
      By Unique_Parcel_ID;
      If First.Unique_Parcel_ID Then Counter = 1;
            Else Counter = Counter +1;
Run;

The result of the above statement is a dataset that numbers each Parcel ID observation in order, starting with a value of ‘1’
for the first instance, and then ‘2’, ‘3’, etc. if there are additional observations for that Parcel ID. This dataset is then used as
input to the PROC TRANSPOSE statement below.

*Transposes based on Parcel ID for each State Class Code value;
Proc Transpose
      Data =Unique_Parcels_SC_N
      Out = Parcels_SC_Horiz (Drop = _Name_);
      By Parcel_ID;
      Var State_Class_Code;
      ID Counter;
Run;

The result of the above statement is a table that lists each Parcel ID as a record with one or more values in horizontal
attribute columns. Some parcels may have only one State Class Code value, whereas others may have several, and thus the
dataset may have anywhere from one to seven attribute fields, one for each transposed value. Those multiple values are then
merged into a single State Class Code field as illustrated below.

*Creates final transposed parcel to state class code dataset;
Data Parcel_SSC (Keep = Parcel_ID State_Class_Code);
      Set Parcels_SC_Horiz;
      Length State_Class_Code $10;        *Set field size to be sum of all variables
                                          being merged;
      State_Class_Code = Strip(Strip(_1)||' '||Strip(_2)); *Merges multiple values;
Run;




The above statement creates a two-column table that contains a field for Parcel ID and the merged State Class Code value
stored as SSC. The STRIP function is used to remove any leading or trailing spaces that result from merging fields that
may be empty.

Next, the Parcel_SSC table is joined with a Land Use to State Class Code lookup table to assign a land use code for each
parcel. H-GAC has defined approximately 70 land use types and has grouped them into 8 Land Use Categories. The Land Use
to State Class Code lookup table includes the following fields: Land Use Code, Land Use Category, and State Class Code.
Using a PROC SQL LEFT JOIN statement, the Parcel_SSC table is joined to the Land Use to State Class Code lookup table to
obtain the corresponding Land Use Code and Land Use Category information for that parcel based on its State Class Code
value.
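
The lookup join described above might look like the sketch below. The paper does not show this step, so the lookup table name (Land_Use_Lookup), the field names, and every land use code and category value here are hypothetical placeholders.

```sas
/* Hypothetical parcel table with merged State Class Code values */
data Parcel_SSC;
   length Parcel_ID $8 State_Class_Code $10;
   infile datalines dsd dlm=',';
   input Parcel_ID $ State_Class_Code $;
   datalines;
HR890,A1
HR891,F1
HR892,A1 A2
;
run;

/* Hypothetical lookup table: State Class Code to land use code and category */
data Land_Use_Lookup;
   length State_Class_Code $10 Land_Use_Code $4 Land_Use_Category $20;
   infile datalines dsd dlm=',';
   input State_Class_Code $ Land_Use_Code $ Land_Use_Category $;
   datalines;
A1,R1,Residential
F1,C1,Commercial
;
run;

/* The left join keeps every parcel, even when no lookup row matches */
proc sql;
   create table Parcel_Land_Use as
   select P.Parcel_ID, P.State_Class_Code,
          L.Land_Use_Code, L.Land_Use_Category
   from Parcel_SSC as P
   left join Land_Use_Lookup as L
      on P.State_Class_Code = L.State_Class_Code
   order by P.Parcel_ID;
quit;
```

In this sketch parcels HR890 and HR891 receive a land use code, while HR892, whose merged value is 'A1 A2', finds no match and is left with missing land use fields. A real lookup table would presumably need a row for each merged combination that occurs; how H-GAC's table handles these is not described here.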

At this point, a baseline land use determination is established for each parcel. However, as previously mentioned, H-GAC
has additional information that can supplement the appraisal data to determine a more accurate land use classification.
This supplemental information is helpful, as many appraisal roll records have Exempt status for their State Class Code
values. Exempt properties are typically schools, religious entities, government property, public infrastructure, and natural
areas that are not taxed in the same way as non-exempt properties. As a separate initiative, H-GAC uses GIS to overlay
source data representing these types of properties on top of the parcel boundary framework, in order to obtain Parcel IDs
for each of these entities. That information for each geographic dataset is then aggregated and placed into a single Land Use
Overrides table that contains fields for the Parcel ID and the Land Use Code determined by the nature of the source
geographic data (e.g. school, religious, government owned, park, etc.).

As a final step to creating a regional land use dataset, the baseline land use data developed in SAS® is joined with the
Land Use Overrides table using a series of SAS® statements. This series of statements first compares each parcel’s override
value with the parcel’s baseline value. If the two are the same, the override value is ignored and the existing land use value
determined from the appraisal roll data is retained. This allows for more accurate tracking of how land use was determined,
and helps to gauge the accuracy of appraisal data over time. Furthermore, if there are any conflicting values in the override
table for a parcel, such as a parcel being listed as both a commercial facility and an industrial facility, those override records
are ignored as well, and an error report table is produced so that the override values can be investigated further and
corrected. What remains following the override audit steps is a final list of land use codes that should replace the existing
baseline land use determination values. The override values are then joined to the baseline land use table, and a final land
use code is determined for each parcel: the override value where a valid one exists, and the baseline value for parcels that
have no match in the override table.
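
One possible way to express the override audit in PROC SQL is sketched below. This is an assumption-laden illustration rather than H-GAC's actual code: the table names, field names, and land use codes are hypothetical, and the three steps simply follow the rules described above.

```sas
/* Hypothetical baseline land use codes from the appraisal roll */
data Baseline_Land_Use;
   length Parcel_ID $8 Land_Use_Code $4;
   input Parcel_ID $ Land_Use_Code $;
   datalines;
HR001 RES
HR002 EXM
HR003 EXM
HR004 COM
;
run;

/* Hypothetical override records: HR003 has two conflicting codes */
data Land_Use_Overrides;
   length Parcel_ID $8 Land_Use_Code $4;
   input Parcel_ID $ Land_Use_Code $;
   datalines;
HR002 SCH
HR003 COM
HR003 IND
HR004 COM
;
run;

proc sql;
   /* Error report: parcels with more than one distinct override code */
   create table Override_Conflicts as
   select Parcel_ID, count(distinct Land_Use_Code) as Num_Codes
   from Land_Use_Overrides
   group by Parcel_ID
   having count(distinct Land_Use_Code) > 1;

   /* Valid overrides: not in conflict and different from the baseline */
   create table Valid_Overrides as
   select distinct O.Parcel_ID, O.Land_Use_Code
   from Land_Use_Overrides as O
   inner join Baseline_Land_Use as B
      on O.Parcel_ID = B.Parcel_ID
   where O.Land_Use_Code ne B.Land_Use_Code
     and O.Parcel_ID not in (select Parcel_ID from Override_Conflicts);

   /* Final code: the override where a valid one exists, else the baseline */
   create table Final_Land_Use as
   select B.Parcel_ID,
          coalesce(V.Land_Use_Code, B.Land_Use_Code) as Land_Use_Code,
          case when V.Parcel_ID is null then 'Appraisal'
               else 'Override' end as Source
   from Baseline_Land_Use as B
   left join Valid_Overrides as V
      on B.Parcel_ID = V.Parcel_ID
   order by B.Parcel_ID;
quit;
```

With this sample data, only HR002 takes its override (SCH). HR003 keeps its baseline code because its two overrides conflict and it is listed in Override_Conflicts, HR004 keeps its baseline because the override repeats it, and HR001 has no override at all. The Source column is an added convenience for tracking how each code was determined.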

As a final output of the land use model, SAS® is used to create land use datasets that can be joined with GIS datasets using
the Parcel ID value. This simplifies the production of regional land use maps. Furthermore, SAS® is
used to summarize the land use table by land use type to determine the amount of acreage in the region for each land use
type. This is accomplished by joining the land use table to a table that lists each parcel and its acreage.
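
The acreage summary can be sketched as a join followed by a grouped sum. The table names, field names, and values below are hypothetical and serve only to illustrate the step.

```sas
/* Hypothetical final land use table */
data Final_Land_Use;
   length Parcel_ID $8 Land_Use_Code $4;
   input Parcel_ID $ Land_Use_Code $;
   datalines;
HR001 RES
HR002 SCH
HR003 RES
;
run;

/* Hypothetical table listing each parcel and its acreage */
data Parcel_Acreage;
   length Parcel_ID $8;
   input Parcel_ID $ Acres;
   datalines;
HR001 0.25
HR002 12.5
HR003 0.5
;
run;

/* Joins acreage to land use and totals it by land use type */
proc sql;
   create table Acres_By_Land_Use as
   select L.Land_Use_Code,
          count(*)     as Num_Parcels,
          sum(A.Acres) as Total_Acres
   from Final_Land_Use as L
   left join Parcel_Acreage as A
      on L.Parcel_ID = A.Parcel_ID
   group by L.Land_Use_Code
   order by L.Land_Use_Code;
quit;
```

For the sample rows this yields two records: RES with 2 parcels and 0.75 acres, and SCH with 1 parcel and 12.5 acres.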

Conclusion

As discussed in this paper, H-GAC uses SAS® as a critical component in determining land use for the region. The regional
land use effort cannot be accomplished through the use of a single technology or software platform,
but rather requires integrating two separate software products. By using the best capabilities of two different systems, ESRI®
ArcGIS® and SAS®, an integrated process has been developed. This process assists in overcoming challenges such as large
volume datasets, quality review/control of variables, and relating multiple datasets from different sources together to
create a comprehensive regional database. Furthermore, it allows H-GAC to conduct regional analysis by standardizing data
across all county geographies.


References

H-GAC (Houston-Galveston Area Council). 2008. www.h-gac.com.

Contact Information

Bill Bass, GISP
Houston-Galveston Area Council
Socio-Economic Modeling
3555 Timmons Lane
Suite 120
Houston, Texas 77027
(713) 499-6687
William.Bass@h-gac.com




 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 

Dernier (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 

En vedette

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

En vedette (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

The Socioeconomic Modeling group is an information and research hub in the Community and Environmental Planning department that gathers, processes, generates, analyzes, and disseminates information on the past, present, and future land use, economy, and population of our region in order to support comprehensive regional operations and planning (H-GAC 2008). The primary purpose of forecasting efforts within the socio-economic group is to support Travel Demand Modeling, which is used in Regional Transportation Planning (RTP). However, H-GAC also uses socio-economic products for other long-range planning purposes that involve environmental conservation, water quality, and urban planning.

Due to the large amount and complexity of the data obtained for use in socio-economic modeling, SAS® is used for a variety of functions including data development, data organization, statistical analysis, and integration of data across multiple databases. This paper will explain how H-GAC’s Socio-Economic Modeling group uses SAS® in conjunction with GIS to develop regional land use data, which is one component of the overall regional modeling framework employed at H-GAC.

PROCESSING OF COUNTY PARCEL BOUNDARY AND APPRAISAL DISTRICT DATA

H-GAC obtains appraisal data from each of the 13 County Appraisal District (CAD) offices where data is electronically available. Appraisal data are typically very large datasets that cover a wide variety of attributes regarding parcels (real property) within each county. Some of the data attributes included in appraisal roll datasets are:

  • Valuation of land and improvements (e.g. buildings)
  • Land usage through the State Classification Coding framework
  • Ownership and legal descriptions of property
  • Taxing entities and exemptions
  • Square footage and structural amenities
In addition, H-GAC obtains parcel boundary datasets from the counties that complement the appraisal roll data. Parcel boundaries are typically provided in the industry-standard shapefile format, which can be viewed in GIS software such as ESRI® ArcView® or ArcInfo® products. In many cases, the parcel boundary data and appraisal roll data are not related in a manner that allows for usage as a relational database system, although they do have common fields in both datasets, such as Account Number. Furthermore, data schemas are not standardized across county appraisal systems, and thus yield a variety of source data layouts and structures with a variety of field naming conventions.

ISSUES AND CHALLENGES IN WORKING WITH APPRAISAL DATA

Through H-GAC’s efforts in working with appraisal roll data, a number of challenges have been identified and overcome in order to develop a comprehensive regional appraisal database. These challenges exist both in the appraisal roll datasets that contain the property attribute data and in the GIS parcel boundary datasets. Challenges for working with appraisal roll data include:

  • Multiple datasets stored within a single text file, each with its own unique data schema
  • The need to convert data imported as character format to numeric, and numeric data to character
  • Cleanup of data entry errors such as leading and trailing spaces in primary key fields
  • Replacing zero values with NULL values to prevent errors when analyzing data

There are also challenges in working with the appraisal parcel boundary data due to the manner in which data is stored within the county GIS systems. For instance, it is typical for a parcel to have more than one account number affiliated with it, either because of multiple owners or because of multiple accounts on a parcel, such as with a high-rise condominium complex. Instances such as these are typically stored by “stacking” identical parcels on top of one another within the GIS, but giving each parcel feature a different Account Number. Although this may provide an effective end product for viewing ownership at the parcel level using a single table format, it does not support the establishment of a topologically integrated geographic database, where a single parcel of land can have one or more owners, which is typically represented through a more relational database structure.

In the following section, these issues and challenges will be explained in detail, as well as how GIS and SAS® are used together to develop standardized data for the region.

APPRAISAL ROLL DATA DEVELOPMENT

Writing the SAS® DATA STEP code that imports a data file (the INFILE, LENGTH, and INPUT statements) can be lengthy, and appraisal data is no exception. It is common for an appraisal roll dataset to contain more than 100 fields that each need to be listed in the LENGTH and INPUT statements. Therefore an Excel® spreadsheet is used to help generate SAS® code that can be pasted into the SAS® editor. Through the use of Excel® formulas and hard-coded text strings, a list of field names can be loaded into an Excel® spreadsheet, and from there used to generate the INFILE, LENGTH, and INPUT portions of the DATA STEP statement. This method saves time and reduces data entry errors, as field names are copied, not typed.

Once data is imported into SAS® dataset format, additional SAS® code is written to clean up and standardize the datasets into a common data structure for all counties in the region. Standardized dataset names, field names, and formats greatly simplify data development and make analysis of the appraisal data more efficient. The following are some examples of how SAS® is used to clean up and standardize the appraisal attribute data.
Attribute data is typically provided in either one or several flat-file layouts. These are typically delimited text files using either comma or tab delimiters. In some cases multiple types of dataset are stored in a single text file, and thus SAS® is used to determine which records to read. For example, it is common for not only the appraisal data that includes ownership, valuation, and land use to exist in one file, but also for summary data that aggregates valuations by subdivision to be in the same file. Through the use of SAS®, a statement such as the one illustrated below can process the file, only importing the records that represent the appraisal roll data. In many cases, multiple import statements are used, so that each type of data can be loaded into a separate SAS® table. The following is an illustration of a conditional import statement that only loads records that have a Record Type of ‘4’ in the source file.

    Data Appraisal_Data Other_Data;
        *Name of flat file to load;
        Infile 'Input_file.txt' MISSOVER lrecl=5000;

        *Following code specifies field attributes used in conditional processing;
        *Initializes record type field;
        Length Record_Type $ 1;
        *Defines location of record type value in flat file. The trailing @ holds
         the record in the input buffer so the condition can be evaluated and
         prevents skipped records;
        Input Record_Type $ 61-61 @;

        *Following code is conditional processing to only load certain record types;
        *Only loads record types with a value of 4;
        If Record_Type = '4' Then Do;
            *Initializes and defines other fields in flat file to import.
             Record_Type is not listed here because it is already defined above;
            Length First_Field $ 10 Second_Field $ 50;
            Input First_Field $ 1-10 Second_Field $ 11-60 Record_Type $ 61-61;
            *Name of dataset to write data;
            Output Appraisal_Data;
        End;
        *Puts all other records into a scratch dataset, not used;
        Else Output Other_Data;
    Run;

Once data is loaded into SAS®, additional SAS® statements are used to assist in further cleaning the data. For instance, it is common for some fields to be initially imported as text, when in fact they should be defined as numeric. The same holds true for some attributes that are imported as numeric when they should be text (e.g. numbers that have leading zeros).
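For the leading-zero case in particular, a zero-padded format is one possible way to rebuild the text value. The sketch below is only an illustration: the variable names Acct_Num and Acct_Chr, the value 1234, and the width of 8 are assumptions, not fields from the H-GAC datasets.

```sas
*Hypothetical example where an account code was read as numeric and lost its leading zeros;
Data _Null_;
    Acct_Num = 1234;
    *Z8. writes the value in 8 positions and pads the left side with zeros;
    Acct_Chr = Put(Acct_Num, Z8.);
    *Writes Acct_Chr=00001234 to the SAS log;
    Put Acct_Chr=;
Run;
```

The width of the Z format has to match the fixed length of the original code, which would need to be confirmed against the source county's file layout.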
The following are two examples of code that are used in SAS® DATA STEP statements to handle these conversion scenarios.

    *Code for converting values from Numeric (N) to Character (C);
    *Where 10. is the format width, which sets the maximum character length;
    C = Strip(Put(N,10.));

    *Code for converting values from Character (C) to Numeric (N);
    *Where 8.0 is a numeric informat;
    N = Input(C,8.0);

Another example of data cleanup that is performed on SAS® datasets is that of replacing zero values with NULL values (SAS® missing values, written as a period). For appraisal data, it is typically not sufficient to note some values as being zero. Consider the value of land and improvements. These items, if they exist, have a value associated with them. If the value is not known, then it should not be zero, but rather NULL so as to not skew statistical analysis. Land values of ‘0’ in the appraisal roll data are changed to NULL, as all land has a value. The same holds true for improvement values: if an improvement exists, it should have a value greater than zero, so any values of zero are changed to NULL. These changes are performed using a simple IF THEN statement in SAS® that looks for a zero value and changes it to NULL.

    *Replaces zero values with NULL values;
    If Land_Value = 0 Then Land_Value = .;
    If Improvement_Value = 0 Then Improvement_Value = .;

In some instances, data such as zip codes are provided either as aggregated values (e.g. 77027-1234) or as separated values in their own fields (e.g. 77027, 1234). H-GAC chooses to store zip code data as two separate fields, so for counties where the data is only provided in an aggregated format, the SAS® SUBSTR function is used. The following is an example of how two separate zip code fields are created from a single aggregate zip code field.

    *Code separates 5-digit zip prefix from 4-digit suffix;
    *Reads and stores values of positions 1 thru 5;
    Zip_Code = Substr(Orig_Zip,1,5);
    *Reads and stores values of positions 7 thru 10;
    Zip_Code_Plus4 = Substr(Orig_Zip,7,4);

Finally, in some instances primary key fields and fields with formatted codes are missing characters or are preceded by spaces. This can cause issues when trying to join data in multiple tables, as SQL typically views spaces as valid characters. Thus a value of ‘R1234’ in one table is not the same as a value preceded by a space such as ‘ R1234’ in another table, with the latter value being a data entry error. To resolve these issues, a DATA STEP statement is used to remove spaces from fields. The following is an example of such a statement.

    *Removes leading and trailing spaces from account number field;
    Acct_Num = Strip(Acct_Num);

The end result of using SAS® to process appraisal roll data is a standardized set of SAS® datasets for each county that have common fields and naming conventions for attributes such as owners, legal descriptions, land value, improvement value, and state classification code. Although each set of appraisal data from the county includes far more than just the standardized fields used by H-GAC, these additional fields are not dropped. Instead they are appended to the end of the common variables. From this point, analysis can be run against the SAS® appraisal roll datasets and reports generated, and if needed, exported to other formats such as Excel®, DBF, or delimited files.

GIS PARCEL BOUNDARY DATA DEVELOPMENT

In addition to performing quality review on attribute data, SAS® is also used to assist in the cleanup of geographic parcel boundary data. Depending upon the type of parcel (residential, commercial, mixed use, etc.), parcel features in the GIS dataset may involve multiple features ‘stacked’ on top of one another, with each feature having a corresponding account number. For instance, if there were two owners of a single parcel of land, each with their own account number for a single-family residential property, there may be two spatially and geometrically identical polygon features, each carrying the account number of the owner it represents. Therefore H-GAC uses ESRI® ArcGIS® in conjunction with SAS® to create a single polygon for these features, but retain the multiple account number assignments. In effect, the flat file structure of the appraisal GIS dataset is transformed into a more extensive relational database system, capable of supporting complex analysis.
First, GIS is used to calculate a centroid value for each polygon in the original parcel dataset, which is expressed as an X/Y coordinate. Think of an X/Y value as being like latitude and longitude values: if two geometrically identical parcels are stacked on top of one another in the same geographic space, both will have the same X/Y coordinate value, or centroid location.

Next, the parcel dataset is processed using a method called Dissolving, where polygons are grouped and simplified based on some common value, in this case the X/Y coordinate. The result of the dissolve process is a new dataset that contains only one parcel boundary to a defined space, where before there may have been multiple parcels stacked on top of one another. This new dataset also retains the X/Y coordinate value of the final aggregated polygons. If a parcel is not stacked on top of another parcel to begin with, then the dissolve process merely takes the single parcel and places it into the new dataset. What exists at this point are two GIS datasets:

  • The original parcel boundaries, which contain stacked and non-stacked parcels, each with their respective account number and X/Y coordinate; and
  • The dissolved parcel boundaries, which contain only one parcel to an area of land and the X/Y coordinate of each parcel

For the newly created dissolved parcel boundaries dataset, each parcel is given a unique parcel identification code, or Parcel ID. The Parcel ID field serves as the primary key for this dataset. Then both parcel datasets are exported to a shapefile format, which stores attribute data such as X/Y coordinate, Account Number, and Parcel ID in a DBF data table. This is the point where SAS® assists in the integration of the two datasets into a relational database structure. The amount of data to be processed for each county is large, sometimes upwards of 1 million parcels, and SAS® is very efficient in handling this volume of data.
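The tabular effect of the dissolve step can be sketched in SAS® on attribute data alone. This is only an illustration of the grouping logic and not the method described in this paper, where the dissolve is performed in the GIS on the polygon geometry. The dataset names Stacked_Demo and Dissolved_Demo, the sample values, and the use of a single character XY_Coord key are all assumptions.

```sas
*Hypothetical attribute table in which two stacked parcels share one centroid;
Data Stacked_Demo;
    Length Account_Number $ 10 XY_Coord $ 30;
    Input Account_Number $ XY_Coord $;
    Datalines;
R1001 3120500.1_13840200.7
R1002 3120500.1_13840200.7
R2001 3121000.4_13841050.2
;
Run;

*Keeps one record per centroid, mimicking one parcel per location;
Proc Sort Data=Stacked_Demo(Keep=XY_Coord) Out=Dissolved_Demo Nodupkey;
    By XY_Coord;
Run;

*Assigns a sequential Parcel_ID to each remaining record;
Data Dissolved_Demo;
    Set Dissolved_Demo;
    Parcel_ID = _N_;
Run;
```

With the three sample records, Dissolved_Demo holds two records, one per distinct centroid. Matching on an exact coordinate value assumes that stacked features share byte-identical centroid strings, which depends on how the GIS writes the coordinates.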
Using PROC IMPORT steps as illustrated below, both DBF tables are loaded into SAS®.

    *Loads original parcels data table containing the Account Number and X/Y coordinate of each parcel;
    Proc Import Out=Original_Parcels Replace
        Datafile= 'c:Original_Parcels.dbf';
    Run;

    *Loads dissolved parcels data table containing the Parcel ID and X/Y coordinate of each parcel;
    Proc Import Out=Dissolved_Parcels Replace
        Datafile= 'c:Dissolved_Parcels.dbf';
    Run;

Next, the two datasets are joined using a PROC SQL LEFT JOIN statement as illustrated below.

    *Joins dissolved parcels dataset to original parcels dataset to obtain account numbers affiliated with each dissolved parcel;
    Proc SQL;
        Create Table Parcel_ID_to_Account_Number as
        SELECT X.Parcel_ID, X.XY_Coord, Y.Account_Number
        From Dissolved_Parcels AS X
        Left Join Original_Parcels AS Y
            On X.XY_Coord = Y.XY_Coord;
    Quit;
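To make the one-to-many result of this join concrete, the following sketch runs the same left join against a few in-line rows instead of the DBF files. The rows are illustrative assumptions: the coordinate values are invented, and the account numbers and Parcel ID simply reuse the sample identifiers from the paper's later example. With these rows, parcel HR890 comes back twice, once for account R12345 and once for R45678, while HR891 comes back once with account R77777.

```sas
*Hypothetical stand-ins for the two imported DBF tables;
Data Original_Parcels;
    Length Account_Number $8 XY_Coord $24;
    Input Account_Number $ XY_Coord $;
    Datalines;
R12345 3100250.5_13840100.2
R45678 3100250.5_13840100.2
R77777 3101900.0_13841250.7
;
Run;

Data Dissolved_Parcels;
    Length Parcel_ID $8 XY_Coord $24;
    Input Parcel_ID $ XY_Coord $;
    Datalines;
HR890 3100250.5_13840100.2
HR891 3101900.0_13841250.7
;
Run;

*Left join on the centroid key, run against the sample rows;
Proc SQL;
    Create Table Parcel_ID_to_Account_Number as
    SELECT X.Parcel_ID, X.XY_Coord, Y.Account_Number
    From Dissolved_Parcels AS X
    Left Join Original_Parcels AS Y
        On X.XY_Coord = Y.XY_Coord
    Order By X.Parcel_ID, Y.Account_Number;
Quit;
```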
The PROC SQL join creates a dataset, Parcel_ID_to_Account_Number, that contains all Parcel IDs from the dissolved parcels dataset and their affiliated Account Numbers from the original dataset. The Parcel ID to Account Number table becomes a critical link between the parcel boundary GIS data and the Appraisal Roll property attribute data. Specifically, it allows a single parcel of land to be related to one or more accounts affiliated with that parcel, and each account to its corresponding record of detail in the Appraisal Roll dataset. The following section illustrates how such a table allows H-GAC to produce parcel level land use data for the region.

Determination of Land Use from Appraisal Roll Databases

H-GAC uses appraisal data as a basis for determining land use in the 13-county region surrounding the Houston Metropolitan area. To process large amounts of appraisal data, H-GAC organizes appraisal records by parcel, which can number upwards of 1 million records for a county, and over 3 million for the region. Appraisal data is not the only input to the land use determination process, however. H-GAC also acquires a variety of other data related to land use, such as locations of schools, government buildings, infrastructure, and environmental conservation and park areas. This additional information is used in conjunction with the appraisal roll data to obtain a more accurate land use determination, or to supply one where the appraisal data provides none.

The first step in the process is to assign each appraisal roll record a Parcel ID. As discussed in the prior section, SAS® was used to process data from the H-GAC GIS to determine parcel assignments for each appraisal account. Using a PROC SQL LEFT JOIN statement illustrated below, each appraisal roll record is assigned to a parcel.
    *Joins appraisal roll to Parcel ID based on Account Number assigned to parcels;
    Proc SQL;
        Create Table Appraisal_Roll_Parcel_ID as
        SELECT X.Account_Number, X.Owner_Name, X.Legal, X.State_Class_Code, Y.Parcel_ID
        From Harris_Appraisal_Roll AS X
        Left Join Parcel_ID_to_Account_Number AS Y
            On X.Account_Number = Y.Account_Number;
    Quit;

The result of the query is a table that can be used as the basis for the land use model to determine land use and ownership by parcel. Since the process is primarily focused on land use, only a few of the many fields available in the Appraisal Roll dataset are retained for further processing. To determine land use, the State_Class_Code field is the field of focus, as it contains two-character codes that denote the type of property (e.g. single-family residential, commercial, industrial, etc.).

The next step in the process is to determine the land use of each parcel based on the State Class Code attribute retained in the prior query. Each record in the Appraisal Roll dataset is aggregated by the combined values of the Parcel_ID and State_Class_Code fields. This prevents two different accounts with the same Parcel ID and State Class Code from being listed more than once. For instance, if account ‘R12345’ has a State Class Code of ‘A1’, account ‘R45678’ also has a State Class Code of ‘A1’, and both are assigned to Parcel ID ‘HR890’, then all that is needed is one record that lists parcel HR890 as having a State Class Code of ‘A1’. Alternatively, if one of the State Class Codes for the two accounts were different, say ‘A2’ for account R45678, then two records would be produced for parcel HR890, one with a State Class Code value of ‘A1’ and another with a value of ‘A2’. The following is an illustration of the PROC SQL code used for this step in the process.
    *Keeps only unique Parcel ID and State Class Code combinations;
    Proc SQL;
        Create Table Unique_Parcels_SC AS
        SELECT Distinct(Parcel_ID) AS Unique_Parcel_ID,
               State_Class_Code,
               Count(State_Class_Code) AS NumberOfDups
        From Appraisal_Roll_Parcel_ID
        GROUP BY Parcel_ID, State_Class_Code
        Having NumberOfDups >= 0;
    Quit;

As the next step, a DATA step and PROC TRANSPOSE are used to transpose the vertical records for each parcel, whether it has a single State Class Code or several, into columns. Those columns are then merged to create a single State Class Code field, or SSC.

    *Creates counter to identify first Parcel ID record;
    Data Unique_Parcels_SC_N (Rename =(Unique_Parcel_ID = Parcel_ID));
        Retain Counter;
        Set Unique_Parcels_SC (Drop = NumberOfDups);
        By Unique_Parcel_ID;
        If First.Unique_Parcel_ID Then Counter = 1;
        Else Counter = Counter +1;
    Run;

The result of the above statement is a dataset that numbers each Parcel ID observation in order, starting with a value of ‘1’ for the first instance and continuing with ‘2’, ‘3’, etc. if there are additional observations for that Parcel ID. This dataset is then used as input to the PROC TRANSPOSE statement below.

    *Transposes based on Parcel ID for each State Class Code value;
    Proc Transpose Data =Unique_Parcels_SC_N
                   Out = Parcels_SC_Horiz (Drop = _Name_);
        By Parcel_ID;
        Var State_Class_Code;
        ID Counter;
    Run;

The result of the above statement is a table that lists each Parcel ID as a record with one or more values in horizontal attribute columns. Some parcels may have only one State Class Code value, whereas others may have several, so the dataset may have anywhere from one to seven attribute fields holding the transposed values. Those multiple values are then merged into a single State Class Code field as illustrated below.
    *Creates final transposed parcel to state class code dataset;
    Data Parcel_SSC (Keep = Parcel_ID State_Class_Code);
        Set Parcels_SC_Horiz;
        Length State_Class_Code $10; *Set field size to be sum of all variables being merged;
        State_Class_Code = Strip(Strip(_1)||' '||Strip(_2)); *Merges multiple values;
    Run;
The DATA step above creates a two-column table, Parcel_SSC, that contains a field for Parcel ID and the merged State Class Code (SSC) value. The Strip function is used to remove any leading or trailing spaces that result from merging fields that may be empty. Next, the Parcel_SSC table is joined with a Land Use to State Class Code lookup table to assign a land use code to each parcel. H-GAC has defined approximately 70 land use types and has grouped them into 8 Land Use Categories. The Land Use to State Class Code lookup table includes the following fields: Land Use Code, Land Use Category, and State Class Code. Using a PROC SQL LEFT JOIN statement, the Parcel_SSC table is joined to the lookup table to obtain the corresponding Land Use Code and Land Use Category for each parcel based on its State Class Code value.

At this point, a baseline land use determination is established for each parcel. However, as previously mentioned, H-GAC has additional information that can supplement the appraisal data to determine a more accurate land use classification. This supplemental information is helpful, as many appraisal roll records have Exempt status for their State Class Code values. Exempt properties are typically schools, religious entities, government property, public infrastructure, and natural areas that are not taxed in the same way as non-exempt properties. As a separate initiative, H-GAC uses GIS to overlay source data representing these types of properties on top of the parcel boundary framework, in order to obtain Parcel IDs for each of these entities. The information from each geographic dataset is then aggregated and placed into a single Land Use Overrides table that contains fields for the Parcel ID and the Land Use Code determined by the nature of the source geographic data (e.g. school, religious, government owned, park, etc.).
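The merge and lookup steps described above can be sketched end to end on a few sample rows. This is an illustration under stated assumptions, not H-GAC's production code: the table names LU_Lookup and Parcel_Land_Use, the land use codes, and the category labels are placeholders, and the lookup table is assumed to carry an entry for each merged code string such as 'A1 A2'. The sketch also swaps in the CATX function for the nested Strip calls, since CATX skips missing values and trims each piece, which may be convenient when a parcel has up to seven transposed columns.

```sas
*Hypothetical transposed output with up to three code columns;
Data Parcels_SC_Horiz;
    Infile Datalines Missover;
    Length Parcel_ID $8 _1 _2 _3 $2;
    Input Parcel_ID $ _1 $ _2 $ _3 $;
    Datalines;
HR890 A1 A2
HR891 F1
HR892 A1
;
Run;

*Merges the code columns into one field, with CATX dropping empty columns;
Data Parcel_SSC (Keep = Parcel_ID State_Class_Code);
    Set Parcels_SC_Horiz;
    Length State_Class_Code $10;
    State_Class_Code = Catx(' ', _1, _2, _3);
Run;

*Hypothetical lookup rows with placeholder codes and categories;
Data LU_Lookup;
    Infile Datalines Dlm=',';
    Length State_Class_Code $10 Land_Use_Code $4 Land_Use_Category $20;
    Input State_Class_Code $ Land_Use_Code $ Land_Use_Category $;
    Datalines;
A1,1100,Residential
A1 A2,1100,Residential
F1,2100,Commercial
;
Run;

*Assigns a baseline land use to each parcel from its merged code;
Proc SQL;
    Create Table Parcel_Land_Use as
    SELECT X.Parcel_ID, X.State_Class_Code,
           Y.Land_Use_Code, Y.Land_Use_Category
    From Parcel_SSC AS X
    Left Join LU_Lookup AS Y
        On X.State_Class_Code = Y.State_Class_Code
    Order By X.Parcel_ID;
Quit;
```

Because the join is a left join, a parcel whose merged code has no lookup entry would still appear in Parcel_Land_Use, with missing land use fields that can be reviewed separately.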
As a final step in creating a regional land use dataset, the baseline land use data developed in SAS® is joined with the Land Use Overrides table using a series of SAS® statements. These statements compare each parcel’s override value with its baseline value. If the two are the same, the override value is ignored and the existing land use value determined from the appraisal roll data is retained. This allows more accurate tracking of how land use was determined, and helps to gauge the accuracy of appraisal data over time. Furthermore, if the override table holds conflicting values for a parcel, such as a parcel listed as both a commercial facility and an industrial facility, those override records are ignored as well, and an error report table is produced so that the override values can be investigated further and corrected.

What remains following the override audit steps is a final list of land use codes that should replace the existing baseline land use determination values. The override values are then joined to the baseline land use table and a final land use code is determined for each parcel: parcels with a valid override take the override value, and parcels without a match in the override table retain their baseline value.

As a final output of the land use model, SAS® is used to create land use datasets that can be joined with GIS datasets using the Parcel ID value. This provides a simple way to produce regional land use maps. Furthermore, SAS® is used to summarize the land use table by land use type to determine the acreage in the region for each land use type. This is accomplished by joining the land use table to a table that lists each parcel and its acreage.

Conclusion

As discussed in this paper, H-GAC uses SAS® as a critical component in determining land use for the region.
The regional land use effort cannot be accomplished through a single technology or software platform; it requires integrating two separate software products. By using the best capabilities of two different systems, ESRI® ArcGIS® and SAS®, an integrated process has been developed. This process assists in overcoming challenges such as large volume datasets, quality review and control of variables, and relating multiple datasets from different sources to create a comprehensive regional database. Furthermore, it allows H-GAC to conduct regional analysis by standardizing data across all county geographies.

References

H-GAC (Houston-Galveston Area Council). 2008. www.h-gac.com.

Contact Information

Bill Bass, GISP
Houston-Galveston Area Council
Socio-Economic Modeling
3555 Timmons Lane, Suite 120
Houston, Texas 77027
(713) 499-6687
William.Bass@h-gac.com