Chicago Area Housing Analysis
A Hedonic Price Study
By
Eric P. Morel
An MMSS Senior Thesis
Advisors
Michael Dacey
Edwin Mills
June 2000
ACKNOWLEDGMENTS
First and foremost, I would like to thank Ed Mills for his interest, support and
guidance. He gave me much needed direction and encouragement, as well as access to
his extensive knowledge of the subject. I would also like to thank Professor Dacey and
the rest of the MMSS faculty for facilitating such a wonderful program. Finally, I would
like to thank my family and friends for their love and support.
ABSTRACT
This study utilizes data provided by the Chicago Tribune Homes web site to
analyze the relationships between community home attributes and median home sale
prices in Chicago and surrounding counties for 1999. Based on the hedonic pricing
model that partitions complex commodities into variable quantities of uniform attributes,
the study shows that certain home-related community attributes significantly and
predictably contribute to median home prices. The study examines attributes in four
categories including physical, community, inhabitant and location characteristics.
Regression analysis reveals that attributes in each of these categories affect home prices.
However, the feasibility of using least squares estimation to analyze this data is carefully
scrutinized. Beyond attribute pricing, this study also offers other interesting relational
findings among the community data.
INTRODUCTION
For years the housing market has captured the attention of professionals in both
academic and business arenas. The home as a commodity stands out for several reasons.
First, a home is the biggest purchase and investment many families will make in a
lifetime. Second, outside of extraordinary cases, whether renting or purchasing, every family
must devote some of its income toward a living environment. This makes supply and
demand analysis particularly interesting. Finally, a home is one of the best examples of
a truly heterogeneous good. Homes can vary by any number of factors including size,
material, age, design and location. As one of the most expensive and complex goods
available, homes and the housing market draw a great amount of analysis and speculation
from both commercial and private investors hoping to find trends or other insight to give
them a competitive edge. Study of housing data has also sparked as well as helped to
answer many sociological questions that have been the concern of many academics and
politicians as well as society as a whole.
Formalized by Rosen in the 1970’s, the hedonic pricing model has been regularly
applied to the housing market. The model asserts that any heterogeneous good is really a
bundle of fairly uniform, homogeneous goods or attributes that vary in quantity.
Therefore, the price or value of a complex commodity can be represented by a vector
containing the quantities of the underlying attributes. When applied to a utility function,
this attribute vector generates a value function. The addition of a composite good vector
and a budget constraint allow for the estimation of a housing demand function. Much of
the analysis of housing data is an attempt to identify and quantify these attribute goods
and estimate their value function within the context of the entire home. Other studies
focus on price indexing, studying national data in order to discover price discrepancies
among homes in different regions and at different time periods. There are many time
varying and regionally specific factors that affect the supply and demand for homes,
causing price variation. However, this broad longitudinal and regional analysis will not
be the focus of this work. This study will take the former approach in an attempt to
define and quantify the hedonic values of home attributes in a localized, cross sectional
study. According to the hedonic pricing model, the value of a home can be attributed to
the value of the bundle of homogeneous qualities that it contains. All other things equal,
a home containing more of one of these positively valued attributes will be worth more.
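The hedonic framework described above can be written compactly. The following is a standard textbook statement of Rosen's setup rather than notation taken from this study; the symbols z, x, y, U and P are introduced here purely for illustration.

```latex
% z = (z_1, ..., z_n): quantities of the n attributes bundled in a home
% x: composite good with price normalized to 1;  y: household income
\begin{align*}
  P &= P(z_1, z_2, \dots, z_n) && \text{(hedonic price function)} \\
  \max_{x,\,z}\; & U(x, z_1, \dots, z_n) \quad \text{s.t.} \quad y = x + P(z) && \text{(household problem)} \\
  \frac{\partial P}{\partial z_i} &= \frac{U_{z_i}}{U_x}, \quad i = 1, \dots, n && \text{(implicit price of attribute } i\text{)}
\end{align*}
```

Under these assumptions, the partial derivative of P with respect to an attribute is that attribute's implicit (marginal) price, which is the kind of quantity a hedonic regression attempts to approximate.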
This study intends to show that these valued attributes are not only physical
characteristics of a home, but community characteristics as well. A home provides
membership in a community with benefits, obligations and sometimes problems. Some
of these factors, whether legally enforced or simply effects of proximity, can be measured
and compared across communities as commodities in the pricing model. This study
utilizes recent housing data from the city of Chicago and communities from seven
surrounding counties to analyze these commodities, both structural and neighborhood
attributes, to estimate a model for home values. The study also uncovers many
interesting sociological relationships worth mentioning.
DATA
The data for this study comes from the Homes section of the Chicago Tribune’s
web site, http://www.chicagotribune.com/homes. The site references sources including
the U.S. Census Bureau, Claritas Inc., MLS of Northern Illinois, Northeastern Planning
Commission, Illinois State Police, Chicago Police Department and the Illinois State
Board of Education. The site provides fairly uniform information for Chicago’s 77
community areas as well as an additional 296 suburban communities within Cook,
DuPage, Lake, McHenry and Will counties in Illinois and Lake and Porter counties in
Northwest Indiana. Most of the 77 Chicago communities, adopted by the Chicago
Association of Realtors, correspond with traditional Chicago neighborhoods such as
Lakeview or Hyde Park, while others are aggregates of neighborhoods that are seldom
referred to outside of the real estate community. The Chicago Tribune page for each
community includes a profile by a contracted writer, with access to a facts and figures
page as well as archived information. A wealth of information exists from the many
sources that are compiled into the facts page. The site includes location data, including
county designation, a measure of distance from the Loop and area in square miles.
Community information reported from the 1990 census survey consists of percentages of
single-family units, number of housing units, population, percentages for the number of
people in a housing unit, the number of rooms in a housing unit, homes built in grouped
cohort years, as well as age, race, sex, marital status, education, employment and
occupation distributions. Important data also exists concerning standardized 1998 crime
data, 1998 educational data including average ACT scores by school district, and finally
1999 data on quarterly median home sales prices coupled with the number of homes sold
in that period and current figures for population and number of housing units. Archived
information for 1998 median home value also exists for some communities. Although
the available information is fairly consistent among the different communities, there is
some small variation in reporting depending on the given county. Missing information
also exists for a small number of communities. While most of the missing data exists for
the smallest and most outlying communities, there also appear to be some random holes
in the data that seem to be simple mistakes in data compilation. Treatment of missing
values will be discussed later.
When organized correctly and thoughtfully, this data contains valuable
information concerning the relationships between median community home sale prices
and community attributes. The 300 plus fairly uniform observations allow for adequate
statistical analysis. There have been a multitude of studies utilizing home sales data from
various sources. Home sale transactions and prices are traditionally well documented.
Unfortunately, the quality and quantity of existing information on the attributes of those
homes is much lower. The AHS (American Housing Survey) is one heavily studied
source, along with NAR (National Association of Realtors) data. These data sets differ
from the Tribune data, however, because the observations deal with individual homes and
sales. Although these provide an accurate source of pricing information and physical
home properties, they can be inadequate for deciphering community attributes. The
American Housing Survey polls for the adequacy of some community amenities, and
comparisons between communities are the product of home owner opinion. The response
is based on the individual home owner’s values rather than any standardized measure of
performance. This data may become biased when compared across communities. For
instance, a suburban family might be overly critical of a great school that might not quite
measure up to the one in the next neighborhood, although they are both in the top few
percent of schools. Also, a homeowner might respond with incomplete information. A
single professional’s response to the adequacy of education might be completely arbitrary
due to lack of concern. Ultimately, these survey questions do provide valuable insight
towards qualities of attributes in a neighborhood, but may not be as accurate as
standardized statistical measurements. This is a great feature of the Tribune data.
Measurements for many community commodities are uniform and standardized. Mean
ACT scores of high school seniors are available for each community, as well as crime
rates per 1000 residents. While these are still only proxies for the quality of education
and safety in a community, they at least ensure a standardized comparison across
communities. Another main motivation for using the Chicago Tribune Homes data to
study attribute contributions toward housing prices involves the assumption that there is a
great deal more variance in home values between communities than within communities.
Homes within a community share many of the same resources that affect value. Also,
homes within a community should be expected to share more physical characteristics than homes in a larger
sample. The Tribune data is conveniently aggregated at this level, facilitating the
exploration of variance causation between communities. The aggregated observations
are also interesting from a modeling standpoint. In this study, a model will be formulated
using home attribute variation to account for variance in median home values. The
residual errors of this model will be particularly valuable because they will show which
communities are undervalued and overvalued by the regression analysis. The largest
outlying communities can be studied to look for any explanation of the deviance between
the actual and expected home prices. This may lead to the identification of significant
home and community attributes that might have been overlooked by the model. This
residual analysis would be much more difficult from an individual home perspective, as it
would be very difficult to uncover additional information for each home.
The benefits of the Chicago Tribune data are coupled with some weak points.
First, the data is a collection of information from a variety of sources. Although all the
sources seem reputable and the data appears to be accurate, the actual conditions and
integrity of the data collection will never be known. The population data from the
different years are most likely sophisticated estimates of actual population. While the
Census Bureau provides the 1990 data, Claritas provides the 1999 population
information. It is possible that these two sources have different estimation methods that
might cause a bias in the information.
Another problem that may lead to the misspecification of median home price
variance in the model stems from levels of data aggregation for certain variables that
differ from the community levels. For instance, the 1999 sales data provided by the MLS
couples some of the smaller outlying suburbs with a larger neighboring suburb. Since the
observations are represented per community, any sales data that bundles several
communities will result in a replication of the dependent variable (the median sale price) across those communities. The model loses
some freedom, as the medians might have differed had the data been partitioned by
community. If a difference does exist, some of the explanatory power of the independent
variables for those communities will be lost through the overly aggregated data.
An interesting problem of aggregation is present in the descriptive properties of
the Tribune data. Some community variables are displayed as averages of individual
statistics while others are median values. One relevant example includes quarterly home
sale prices and number of rooms which are represented as median values and gross
percentages respectively. For modeling purposes, these percentages were used as
weights to derive the average number of rooms. When data is not distributed
symmetrically, means can differ significantly from medians. It is possible to
misrepresent the data when regressing means against medians. Using the median home
price and average number of rooms as an example, imagine that the true value of a home
is exactly $50,000 times the number of rooms. Community A has 10 home sales; all four
room homes for $200,000. Community B also has 10 home sales; six two room homes
for $100,000 and four seven room homes for $350,000. For each community, there is an
average of four rooms per home and the average home price is $200,000. However, the
median home price for community B is only $100,000. The comparison of the mean of
one variable to the median of another fails to uncover the true relationship between the
number of rooms and home price. This phenomenon is particularly threatening to any
future model because the median home prices are certainly not normally or symmetrically
distributed among communities (see Appendix 1.1), although individual home prices
within a community may more closely subscribe to these distributions.
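The two-community example above can be checked numerically. The sketch below simply encodes the sales described in the text; the variable and function names are illustrative and not part of the study.

```python
from statistics import mean, median

PRICE_PER_ROOM = 50_000  # true value per room assumed in the text's example

# Each sale is recorded as the number of rooms in the home sold.
community_a = [4] * 10            # ten four-room homes
community_b = [2] * 6 + [7] * 4   # six two-room and four seven-room homes

def summarize(rooms_sold):
    """Return mean rooms, mean price and median price for a list of sales."""
    prices = [rooms * PRICE_PER_ROOM for rooms in rooms_sold]
    return {
        "mean_rooms": mean(rooms_sold),
        "mean_price": mean(prices),
        "median_price": median(prices),
    }

summary_a = summarize(community_a)
summary_b = summarize(community_b)
print(summary_a)
print(summary_b)
```

Both communities share the same mean number of rooms and the same mean price, yet the median price in community B is half that of community A, which is the distortion the paragraph warns about.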
While the previous problems are more subtle, the time lag between the 1990
census attribute variables and the 1999 median price data may be the most recognizable
problem with the data. Crime and educational data are lagged by one year. This is ideal
under the assumption that these are the most current statistics that would be realized by a
potential buyer and are good proxies for the level of these attributes. However, many of
the attribute variables that will be used in the hedonic model were calculated from census
information that is nine years removed from the home sales data. Luckily, most of these
attributes should not vary greatly over a decade for communities within reasonable
growth and construction limitations. These include the average number of rooms per
home, average number of people per housing unit and the percent of single family
structures. Two other community attribute variables raise concern, however. One
attribute that could change significantly over a decade is the racial composition of a
community. Particularly in Chicago, many traditionally ethnic and minority
neighborhoods have been experiencing gentrification and an influx of white urban
professionals within recent years. The census data cannot account for any of these recent
changes and may cause a bias for some neighborhoods. Another disturbing shortcoming
of the census data involves average home age. All average community home ages have
been calculated as of the year 1990. This data is right censored and does not factor in any
new homes built after 1990. With the Tribune data, there is a lack of information
concerning the percentage of the sales of new homes as opposed to old structures. Many
outlying suburbs are rapidly growing under new construction. Even some of the most run
down neighborhoods in Chicago are experiencing home restoration and construction as
investors attempt to take advantage of low property values within minutes of the Loop. It
seems logical that the median home price within a community would increase as the
percentage of new home sales increases. Unfortunately, the Tribune data offers little to
account for this median price variance. For communities with a stable population, the
lack of accountability for the last ten years of average home age should make little
difference in median price. However, for rapidly growing communities, particularly
those outside of Cook County, the average home age as of 1990 could greatly
misrepresent the true average age of homes as reflected in the median home prices for
1999.
As entered directly from the Tribune Homes site, the data contained 374
observations of 63 variables. Certain cases were missing data. Some of this was
systematic by county; some appeared to be more frequent in smaller outlying
communities, while other missing values seemed completely random. It became
immediately apparent that too much information was missing from the two Indiana
counties. These observations could not be included in any model of median home prices
due to missing values. Therefore the 32 Indiana observations were discarded from all
further data analysis for consistency, leaving the total number of observations at 342.
In order to effectively describe, interpret and model relationships in the data, new
variables were derived from the original input, and for some important variables, missing
values were estimated using available data from surrounding communities. Any
respecifications or extrapolations of data were conducted in a uniform manner. Much of
the data from the 1990 Census contained percentages from categorized survey responses.
In all cases, these percentages were used as weights to compute average values. For the
case of home age, dummy variables indicating the decade of median home age were also
derived. Other variables appeared as a single percentage, such as the racial and
occupational data. Dummy variables were also created for these figures, because
the use of percentages in OLS regression creates a bounding problem. In general,
percentage bounds for the dummy variables were assigned close to one standard
deviation from the mean. Several variables required considerably more attention.
Excluding the Indiana data, 27 missing values for crime existed, all for communities
outside of Cook and DuPage Counties. Three variables were created to deal with the
missing values. The first variable leaves the missing values as missing, the second
replaces the missing values with reported county averages, while the third estimates the
crime statistic by averaging crime rates from surrounding communities. Equal care was
given to the education variable of ACT scores. The mean ACT scores were supplied by
school district, not by community. Fortunately, in the suburbs, one school district
typically corresponds to one community. However, when a suburban community was
served by more than one school district, the mean ACT scores of the relevant districts
were averaged to arrive at the community statistic. The situation becomes more complex
for the 77 Chicago communities. One school district, including over 60 high schools,
encompasses the entire city. Unlike the suburbs, children are not forced to attend the
closest public school. In fact, there exist a number of magnet schools like Young that
encourage the enrollment of promising students from all over the city. Complicating
matters further, many parents who can afford private schools avoid the Chicago Public
School system. As a result, schools are filled with a much higher percentage of
underprivileged minorities than the surrounding population. Despite the overall
complexity and diminished significance of public education in Chicago, ACT statistics
were retrieved and recorded for each individual school. Under the loose assumption that
families might locate themselves closest to the school in which they intend to enroll their
children, ACT scores were matched with Chicago communities by high school. ACT
values were estimated for communities without a high school by averaging the scores
from nearby communities. Therefore, two versions of the ACT proxy variable for quality
of education exist. The first records the school district mean ACT score of 17.3 for all of
the Chicago communities, while the second lists individual high school results with
extrapolation for surrounding community values. For the second variable, ACT statistics
were estimated for 28 out of the 77 communities.
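The three treatments of missing crime values described above can be sketched as follows. This is a minimal illustration, not the procedure actually run for the study: the community names, county averages, rates and neighbor lists are all invented, and the choice of which communities count as "surrounding" is an assumption.

```python
from statistics import mean

# Hypothetical records; names, counties, rates and neighbors are invented.
communities = {
    "A": {"county": "Lake", "crime": 40.0, "neighbors": ["B", "C"]},
    "B": {"county": "Lake", "crime": None, "neighbors": ["A", "C"]},
    "C": {"county": "Lake", "crime": 60.0, "neighbors": ["A", "B"]},
    "D": {"county": "Will", "crime": 30.0, "neighbors": ["C"]},
}
# Reported county averages (also invented for this sketch).
county_average = {"Lake": 55.0, "Will": 30.0}

def crime_variants(name):
    """Return (left missing, county-average fill, neighbor-average fill)."""
    rec = communities[name]
    raw = rec["crime"]  # variant 1: a missing value stays missing (None)
    if raw is not None:
        return raw, raw, raw
    by_county = county_average[rec["county"]]  # variant 2
    nearby = [communities[n]["crime"] for n in rec["neighbors"]
              if communities[n]["crime"] is not None]
    by_neighbors = mean(nearby) if nearby else None  # variant 3
    return raw, by_county, by_neighbors

variants_b = crime_variants("B")
print(variants_b)
```

Keeping all three versions side by side allows a regression to be rerun with each one, so the sensitivity of the results to the imputation choice can be inspected.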
The lack of distance measures for the Chicago communities also merits a final
procedural mention. Within the Tribune data for the suburbs surrounding Chicago,
community measures of area in square miles and distance to the Loop in miles were
listed. These statistics were not included for the Chicago communities. Anticipating
their importance for descriptive and modeling purposes, these values were measured
manually from the Census Tract Reference Maps distributed by the Chicago Association
of Realtors. Other transformations of existing data occurred in order to arrive at
noteworthy statistics or regression friendly data. Any procedures of importance not
previously covered will be mentioned in later sections of this study.
After a good amount of data analysis, any observations containing fewer than ten
1999 home sales and/or a population below 2,500 were eliminated to create a new subset
of the data, hereafter called the adjusted data. This subset was created for several reasons.
First, by eliminating the smallest communities with few home sales, the chance of the
median house data being unrepresentative of true median community home value is
reduced. As mentioned earlier, some of the more sparsely populated outlying suburbs
were combined with bigger suburbs to aggregate sales data. By placing a minimum
constraint on population, and eliminating some of these outlying suburbs, the presence of
duplicated dependent-variable values under individually specified community attributes can be
reduced for more accurate explanation of variance between communities. Another
rationale involves the reduction of outlying values in both the dependent and explanatory variables.
Some community attribute variables are population sensitive, such as the number of
crimes per 1000 residents or the estimate for average lot size, which varies with the
number of housing units. With small populations figured in the denominator of a
statistic, values can become unreasonably large. For instance, the community of Bedford
Park has a population of 535 and a large industrial park. In 1998, the town recorded
approximately 600 crimes, mostly nonviolent property crimes occurring in the park. The
computed statistic of 1127, several standard deviations above the mean, presumably
overestimates the danger placed on the average home owner in this neighborhood. These
problems arise from the misspecification of variables, and can be magnified in small
communities. In that example, the data did not distinguish between violent crime and
industrial crime outside of residential areas. A similar phenomenon can occur for median
home prices in a small residential area. The smaller the community, the more likely the
majority of homes sold can contain a common attribute not contained in or explained by
the data. For analytical and regression purposes, both the complete and adjusted data sets
will be utilized, ensuring two perspectives on the Tribune data.
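The inclusion rule for the adjusted data and the per-1000 crime statistic can be sketched as follows. The function names are illustrative, and the number of home sales used for Bedford Park is a hypothetical placeholder, since that figure is not reported here.

```python
def crimes_per_thousand(crimes, population):
    """Crime rate per 1000 residents."""
    return 1000 * crimes / population

def in_adjusted_data(home_sales_1999, population):
    """Keep a community only if it has at least ten 1999 home sales
    and a population of at least 2,500."""
    return home_sales_1999 >= 10 and population >= 2500

# Bedford Park, using the rounded figure of roughly 600 crimes.
bedford_rate = crimes_per_thousand(600, 535)
# The sales count (50) is a hypothetical placeholder, not a reported value.
bedford_kept = in_adjusted_data(home_sales_1999=50, population=535)
print(round(bedford_rate), bedford_kept)
```

With the rounded count of 600 crimes the rate comes out near 1121 per 1000 residents, slightly below the 1127 quoted above, presumably because the exact count was a little over 600. Either way the community fails the population threshold and drops out of the adjusted data.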
DESCRIPTIVES
This section highlights and describes the variables collected from the Tribune
data, as well as studies relationships between variables prior to regression analysis.
Descriptive statistics are displayed for variables from both the complete and adjusted data
set. In order to avoid large ranges created by outlying data points, scatter plots will be
limited to the adjusted data set. In order to achieve consistency, any further graphs,
simple regressions or Pearson correlation statistics will also be derived from the adjusted
data set. Pearson correlation statistics with a significance of .05 or better will be
designated by one asterisk, while .01 or better will be given two asterisks.
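The asterisk convention described above can be sketched as follows. This is an illustration under a stated assumption: significance is judged with a large-sample normal approximation to the t statistic, which is reasonable for roughly 286 observations but is not necessarily the test used by the statistical package behind the reported figures.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def significance_stars(r, n):
    """Return '**' (.01), '*' (.05) or '' for a correlation r on n pairs.

    Assumption: the t statistic is treated as standard normal, so the
    two-sided cutoffs are 1.960 (.05) and 2.576 (.01).
    """
    if abs(r) >= 1:
        return "**"  # perfect correlation; avoids division by zero below
    t = abs(r) * math.sqrt((n - 2) / (1 - r * r))
    if t >= 2.576:
        return "**"
    if t >= 1.960:
        return "*"
    return ""

print(significance_stars(0.12, 286))  # hypothetical correlation
```

With a sample of this size, even fairly small correlations clear the .05 threshold, so statistical significance alone says little about the strength of a relationship.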
Dependent Variable
Median community home sales prices are the focus of this study. Actually, the
finalized statistics are the average and weighted average of the quarterly sales prices.
These statistics would differ from true yearly medians. The first computation, not
weighted by quarterly home sales, highlights the fact that the quarterly prices are not
averages, and that any quarterly median could be closest to the true median with equal
probability. This variable will be labeled Price1. The weighted average, labeled Price2, assumes that a
quarter with more home sales is more likely to represent the true yearly median statistic.
When the means of these two variables are compared, the weighted average registers
slightly higher. This is because the quarter with the highest mean number of home sales
also has the highest mean quarterly price figure. This third quarter phenomenon could be
the result of seasonality or supply and demand issues. Descriptive statistics are listed
below.
               Mean  Standard Deviation   Median  Minimum    Maximum  Valid N
Complete
  Price1    168,338             106,282  142,563   15,900  1,011,500      336
  Price2    169,006             106,278  143,484   15,900    963,945      336
Adjusted
  Price1    163,046              89,085  141,730   24,750    694,125      286
  Price2    163,512              89,464  142,178   24,654    700,754      286
Price1 = Average of Quarterly Median Home Sale Prices
Price2 = Average of Quarterly Median Home Sale Prices Weighted by Quarterly Number of Homes Sold
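The two price statistics defined above can be sketched as follows. The quarterly figures are hypothetical and were chosen so that the third quarter has both the most sales and the highest median, mirroring the pattern described in the text.

```python
def price1(quarterly_medians):
    """Unweighted average of the quarterly median sale prices."""
    return sum(quarterly_medians) / len(quarterly_medians)

def price2(quarterly_medians, quarterly_sales):
    """Average of quarterly medians weighted by homes sold each quarter."""
    total_sales = sum(quarterly_sales)
    weighted = sum(p * n for p, n in zip(quarterly_medians, quarterly_sales))
    return weighted / total_sales

# Hypothetical community (values invented for illustration).
medians = [140_000, 150_000, 160_000, 150_000]
sales = [10, 20, 40, 10]

p1 = price1(medians)
p2 = price2(medians, sales)
print(p1, p2)
```

Because the busiest quarter is also the most expensive one in this example, the weighted figure exceeds the unweighted one, which is the direction of the difference between the Price1 and Price2 means in the table.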
As the dependent variable, the price data is highly correlated with many variables
assumed to affect the price. Price is also correlated with other factors that direct
individual demand levels for home attributes. These include income, years of education,
age, and marital status. The distributions of the price variables are also important to
consider. Statistics for skewness and kurtosis are both well over two, suggesting that
there is little chance that the distribution is normal. A histogram of the non-weighted
average variable (Appendix 1.1) shows that the distribution is skewed to the right. This
identifies the presence of several elite communities where the average of quarterly
median home values is several times the mean.
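Skewness and kurtosis of the kind cited above can be computed from the central moments of a sample. The sketch below uses the simple moment-based definitions; this is an assumption, since statistical packages often apply small-sample corrections and the figures reported in this study may come from such a variant. The sample values are invented.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n  # variance (population form)
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # zero for a normal distribution
    return skew, excess_kurtosis

# Invented samples: one symmetric, one with a single very high value.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 1, 10]

skew_sym, kurt_sym = skewness_and_excess_kurtosis(symmetric)
skew_right, kurt_right = skewness_and_excess_kurtosis(right_skewed)
print(skew_sym, skew_right)
```

A long right tail, such as a handful of very expensive communities, pushes the skewness above zero, while a symmetric sample gives a skewness of zero.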
Explanatory Variables
For classification purposes, expected hedonic attributes have been placed in four
groups: home attributes, community attributes, inhabitant attributes and indicator
attributes. These groups will resurface in the choice of regression models. The first
group encompasses physical properties of the house. Using all available Tribune data,
four basic variables were created within this category, along with logical derivations.
They include average number of rooms per home, home age, percent of single family
homes and a calculation for number of square feet per housing unit.
The first variable of the group, average number of rooms per home, is an obvious
valued home attribute. Unfortunately, the data for this variable is lagged, as derived from
the 1990 census data. Fortunately, the average home structure of a community should not
change much over a decade. The variable still shares a strong correlation of (0.617)** with
Price1. Descriptive statistics are shown below.
                     Mean  Standard Deviation  Median  Minimum  Maximum  Valid N
Complete
  Rooms In Home      5.70                0.94    5.58     3.36     8.55      337
  Rooms Per Person   2.02                0.31    2.02     0.94     2.94      330
Adjusted
  Rooms In Home      5.64                0.91    5.53     3.36     8.55      287
  Rooms Per Person   2.01                0.30    2.00     0.94     2.84      280
Notice that the descriptive statistics for rooms per person are included in the table above.
Although this variable should not have a direct causal relationship with home prices, it is
highly correlated (0.760)** with median family income, since high income homeowners
demand more space per person. It would be interesting to compare the mean number of
rooms per person with means from different geographical areas.
The transformation of Tribune data for home age to a set of analyzable variables
was much more complicated than for average number of rooms. Again, the data
originated from the 1990 census. Already this presents a problem of censoring. Ideally,
if no homes were built in the ten years since the survey, average home age would just
increase by ten. However, new homes have been built. This provides the freedom for
average home age to increase by less than ten or even decrease in a rapidly expanding
community. The question for determining whether the lagged values for home age are
adequate is whether or not home construction has been fairly uniform across
communities. The answer is no. Another problem lies at the other end of the home age
spectrum. In order to display home age, the census survey lists period ranges, usually by
decade, along with the percent of homes built within that period. The earliest period is
listed as 1939 or earlier. This introduces a problem of left censoring. While the other
periods have a ten year range, the range for the earliest period is much greater. The
statistic for home age was calculated by the weighted average of the upper bounds of
these periods. Clearly this could underestimate the average home age in a community
with a number of homes built in the early 20th century, or even the 1800's. In order to
correct for this left censoring, a series of dummy variables indicating median period of
home construction as of 1990 was created and will be considered in regression models.
Overall the variables for home age leave much to be desired and contain a possible bias
for communities with large quantities of home construction in the past 10 years.
Considering the available data, however, they are the best approximations of true
community home age. Descriptive statistics for the average home age as of 1990, using
upper decade bounds of the census, are listed below.
                     Mean  Standard Deviation  Median  Minimum  Maximum  Valid N
Complete
  Average Home Age  25.78               10.08   25.60     6.02    48.05      337
Adjusted
  Average Home Age  25.64               10.09   25.60     6.02    48.05      287
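The weighted-average calculation behind the home age statistic can be sketched as follows. The construction periods, the ages assigned to them (1990 minus the upper bound of each period) and the shares are assumptions made for illustration; the exact census categories and age assignments used in the study are not reproduced here.

```python
def weighted_average(shares, values):
    """Average of values weighted by category shares."""
    total = sum(shares)
    return sum(s * v for s, v in zip(shares, values)) / total

# Hypothetical community: share of homes by construction period.
periods = ["1980-1989", "1970-1979", "1960-1969",
           "1950-1959", "1940-1949", "1939 or earlier"]
# Assumed age as of 1990, taken from the upper bound of each period.
assigned_age = [1, 11, 21, 31, 41, 51]
shares = [0.10, 0.20, 0.20, 0.20, 0.10, 0.20]

avg_age = weighted_average(shares, assigned_age)

# Same community if the oldest homes actually averaged 80 years
# (a hypothetical figure): the census-based statistic understates age.
true_age = assigned_age[:-1] + [80]
avg_age_uncensored = weighted_average(shares, true_age)
print(avg_age, avg_age_uncensored)
```

In this invented case the censored statistic is 27 years while the uncensored one is close to 33, which illustrates why the open-ended earliest period can understate average age in older communities.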
Average home age is slightly yet significantly negatively correlated (-0.172)** with
Price1. This supports the reasonable assumption that newer homes are worth more.
Hedonic analysis will determine whether the age of the home itself or other attributes that
are correlated with the age of the home contribute to the variance in home price. This
appears interesting because several variables, including education (-0.601)**, average
number of rooms (-0.43)** and percent minority (0.474)** are all more highly correlated
with home age than price is. Another correlation (-0.585)** suggests that average home
age decreases as communities move further from the Loop. This supports the claim that
new home construction is most likely not uniform across communities.
The percentage of single family units in a community is an interesting variable to
study because of its high correlation with many other variables. Logically, this variable
is positively correlated with Price1 (0.361)**. However, there is a much stronger
correlation between this variable and the average number of rooms in a housing unit
(0.789)**. A scatter plot of these two variables can be found in Appendix 1.2. Basically,
the average home size in a community closely approximates the percentage of single
family units. This could present problems if both variables are included in a model of
price estimation. Descriptive statistics are found below.
                                      Mean  Standard Deviation  Median  Minimum  Maximum  Valid N
Complete
  Percentage of Single Family Units   .692                .210    .739     .019     .993      337
Adjusted
  Percentage of Single Family Units   .686                .206    .730     .019     .990      287
The final physical attribute variable was computed from two of the Tribune
variables in an attempt to approximate lot size, because no lot size data were
directly available. The statistic was derived by dividing the area of the
community by the number of housing units in the community as of 1999. The area,
which had been provided in square miles, was converted to square feet. This statistic is
inferior to an actual measure of average lot size because the percentage of land devoted to
residential zoning is unknown for each community. Even if this percentage is fairly
constant among communities, the use of space not occupied by homes is unaccounted for.
Certainly a large space containing a park will have a different effect on home values
than a space devoted to a garbage dump or a chemical plant. Despite its
shortcomings, this variable does contain some explanatory power. As expected, the
number of square feet per housing unit is positively correlated with Price1 (0.236)**.
There is also a positive correlation with the distance to the Loop (0.387)**, suggesting
that housing density decreases as communities get farther from the city. Descriptive
statistics are displayed below.
                                                 Mean  Std. Dev.     Median    Minimum    Maximum    Valid N
Complete
  Number of Square Feet Per Housing Unit       87,507    470,485     22,026      1,754  8,131,200        330
Adjusted
  Number of Square Feet Per Housing Unit       34,131     62,238     20,063      1,754    612,419        280
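As a minimal sketch of the lot size proxy described above, the calculation might be written as follows. The function and argument names are hypothetical and do not come from the Tribune data; the only fixed quantity is the conversion of 5,280 feet per mile.

```python
# Sketch of the lot-size proxy: community area per housing unit, in square feet.
# Names are illustrative assumptions, not fields from the Tribune data.

SQ_FT_PER_SQ_MILE = 5280 ** 2  # 27,878,400 square feet in one square mile


def sq_ft_per_housing_unit(area_sq_miles: float, housing_units: int) -> float:
    """Convert community area to square feet and divide by the housing unit count."""
    if housing_units <= 0:
        raise ValueError("housing_units must be positive")
    return area_sq_miles * SQ_FT_PER_SQ_MILE / housing_units
```

Because the numerator is the whole community area, parks, industry and roads inflate the result, which is the weakness of the proxy noted in the text.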
The next group of attributes provided by the Tribune data, entitled community
attributes, differs from the previous set because the attributes are independent of the
physical characteristics of properties within the community. However, this study intends
to show that these attributes still contribute significantly to median home values among
communities. Each of the four major variables in this group represents a community
characteristic. ACT scores are studied to estimate the quality of public education within
a community, while crime rates should measure safety. Distance to the Loop represents
the ease of access to the benefits of downtown Chicago, including employment and
social opportunities. Finally, a measure of the number of places of worship per thousand
residents may provide a loose estimate of family values and family structure as well as
camaraderie within a community.
The derivation of the ACT statistics was explained in the previous section. For
this study, ACT scores are a useful measure of public education because they are
standardized across communities. However, it must be noted that public education is not
the only factor behind the level of ACT achievement. Parental influence is critical
for a child's success at school. The significant correlation between ACT scores and
median years of school completed within a community (0.606)** is consistent with this. Just as
public education cannot account for all of the results on ACT tests, the tests cannot reveal
the entire level of educational quality at any school. Mean ACT scores are negatively
correlated with the percentage of minorities in a community (-0.644)**, even though
educational spending per student is not dictated by race. Lower percentages of two
parent homes and lower parental education levels may contribute to this gap
as much as inadequate schools do. Despite this, the
ACT scores provide good insight into a parent's perception of the quality of education in
a community. This will ultimately affect median home values. The correlation between
ACT scores and Price1 shows a significant relationship between the two variables
(0.557)**. Descriptive statistics can be found below.
                               Mean  Std. Dev.     Median    Minimum    Maximum    Valid N
Complete
  ACT1                        20.87       2.57       21.6       16.7       26.2        338
  ACT2                        20.65       2.98       21.6       14.0       26.2        342
Adjusted
  ACT1                        20.69       2.60       21.3       16.7       26.2        287
  ACT2                        20.45       3.02       21.3       14.0       26.2        288
ACT1 = Mean ACT Composite Score (Chicago Communities Treated as One District)
ACT2 = Mean ACT Composite Score (Chicago Communities By Individual Schools - Extrapolated Data)
Just as consumers are concerned with education levels, they are also interested in
safety levels. As mentioned earlier, the Tribune crime data measure total crimes
committed in 1998 per 1000 residents. Although reported crime is clearly a measure of
safety, the data do not distinguish between violent crime and property crime. There is
also no distinction between crime in residential areas and crime in industrial or
commercial areas. These different types of crime might affect a resident's perception of
safety in different ways that cannot be accounted for in the data. Interestingly, crime is
not nearly as correlated with Price1 (-0.193)** as many of the other variables are.
There are much higher correlations between crime and percent minority
(0.513)** and between crime and average number of rooms per housing unit (-0.503)**.
The latter relationship is of particular interest and is shown in a scatter plot in
Appendix 1.3. There are several possible reasons why this relationship might exist.
Communities with large homes
may be more likely to have a higher percentage of residential zoning, limiting crime
against industrial and commercial properties. Also, larger homes tend to have bigger
yards and better security, inhibiting stealthy movement. The descriptive statistics for the
crime variables can be found below.
                               Mean  Std. Dev.     Median    Minimum    Maximum    Valid N
Complete
  Crime1                      54.88      79.82         39          0       1127        315
  Crime2                      53.05      76.88         36          0       1127        342
  Crime3                      52.57      77.04         35          0       1127        342
Adjusted
  Crime1                      51.11      49.55         40          4        615        280
  Crime2                      50.55      48.97         39          4        615        288
  Crime3                      50.46      49.04         39          4        615        288
Crime1 = Crimes Per 1000 Residents (Missing Values)
Crime2 = Crimes Per 1000 Residents (Missing Values Filled With County Averages)
Crime3 = Crimes Per 1000 Residents (Missing Values Filled With Extrapolated Values)
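The Crime2 specification fills missing crime rates with county averages. One way this could be done is sketched below. It assumes the county average is the simple mean of the observed community rates in that county, which the source does not specify; the function name and the use of None for a missing value are illustrative choices.

```python
from statistics import mean


def fill_with_county_mean(rates, counties):
    """Replace missing rates (None) with the mean observed rate of the same county.

    rates    -- crimes per 1000 residents, one entry per community, None if missing
    counties -- county name for each community, same order as rates
    A community stays None when its county has no observed rate at all.
    """
    observed = {}
    for rate, county in zip(rates, counties):
        if rate is not None:
            observed.setdefault(county, []).append(rate)
    county_means = {county: mean(values) for county, values in observed.items()}
    return [
        county_means.get(county) if rate is None else rate
        for rate, county in zip(rates, counties)
    ]
```

Mean filling leaves each county's average unchanged but shrinks the variance of the variable, so a filled series will tend to show a somewhat smaller standard deviation than the observed series.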
The variable for the distance to the Loop is of particular interest because there is
no significant correlation between it and Price1. However, it seems logical that, all other
attributes equal, median home prices should decrease as communities lie farther
from downtown Chicago. It will be interesting to discover the attribute price
assigned to this variable in the hedonic model. Not surprisingly, distance to the Loop is
strongly and negatively correlated with average home age (-0.585)**. Descriptive statistics
are listed below.
                               Mean  Std. Dev.     Median    Minimum    Maximum    Valid N
Complete
  Distance to Loop            26.18      16.24         23          0         73        342
Adjusted
  Distance to Loop            23.76      14.73       21.5          0         70        288
Finally, the number of parks per 1000 residents and places of worship per 1000
residents were measured in order to estimate the 'family atmosphere' of a community.
The Tribune data listed information on parks and places of worship for each community.
These were tallied, divided by the 1999 population and multiplied by 1000.
Unfortunately, parks were only listed for Chicago communities, so the variable could not
be utilized in subsequent models. The parks variable is positively correlated with Price1
(0.251)*. A review of the data for places of worship leaves their effect on
home prices unclear. It seems that many places of worship per capita should be
considered a positive attribute. However, many of the most poverty stricken
communities have an abundance of churches. Presumably they are not the cause of this
despair, but the result of it. As a result, there seems to be no clear linear relationship
between the places of worship variable and Price1. A scatter plot found in Appendix 1.4
displays this. Descriptive statistics are listed below.
                                                       Mean  Std. Dev.     Median    Minimum    Maximum    Valid N
Complete
  Parks per 1000 Population (1999)                    .2294      .1496      .1935        .00        .83         72
  Places of Worship Per 1000 Population (1999)        .8360      .9685      .5650        .00       7.63        332
Adjusted
  Parks per 1000 Population (1999)                    .2114      .1237      .1831        .00        .62         68
  Places of Worship Per 1000 Population (1999)        .7372      .5680      .5980        .00       3.63        282
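The parks and places of worship measures above are simple per capita rates. A small sketch of the calculation, with a hypothetical function name and made-up inputs:

```python
def per_thousand(count: int, population: int) -> float:
    """Number of facilities (parks, places of worship) per 1000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * count / population


# Illustrative numbers only: 12 places of worship in a community of 24,000 residents.
example_rate = per_thousand(12, 24000)  # 0.5 per 1000 residents
```

Small communities make this ratio volatile, which may help explain the large maximum (7.63) in the complete data set relative to the adjusted one (3.63).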
The third set of variables was grouped as inhabitant attributes. This group of
variables differs from the first two because the direct bearing of these attributes on home
values is questionable. Included in this group are race and occupation. The rationale
behind the inclusion of this group of variables in this study and the ensuing hedonic models
relies on the assumption that home owners choose to locate in communities
containing residents of similar background, occupation and race in order to feel
comfortable and sociable. Out of several listed occupations, managerial positions were
chosen as most likely to represent a white collar lifestyle, while factory positions were
chosen to represent a blue collar lifestyle. The descriptive statistics for both race and
occupation variables are listed below.
                               Mean  Std. Dev.     Median    Minimum    Maximum    Valid N
Complete
  Percent African Amer        .1459      .2900      .0100      .0000      .9940        337
  Percent Hispanic            .0722      .1199      .0280      .0000      .8780        337
  Percent Other               .0283      .0436      .0160      .0000      .0521        337
  Percent Minority            .2463      .3024      .1010      .0000      .9990        337
Adjusted
  Percent African Amer        .1520      .2893      .0140      .0000      .9910        287
  Percent Hispanic            .0787      .1266      .0310      .0000      .8780        287
  Percent Other               .0295      .0356      .0180      .0000      .2410        287
  Percent Minority            .2601      .3002      .1110      .0020      .9990        287

                               Mean  Std. Dev.     Median    Minimum    Maximum    Valid N
Complete
  Managerial Positions        .2823      .1308      .2530      .0440      .6690        337
  Factory Positions           .0615      .0438      .0530      .0000      .2660        337
Adjusted
  Managerial Positions        .2851      .1264      .2590      .0740      .6290        287
  Factory Positions           .0614      .0436      .0530      .0010      .2660        287
THE MODEL
The main objective of this hedonic pricing model is to estimate which attributes are
actually incorporated into home values, and to what extent they explain the variation in
median home prices by community. This model differs from most previous analyses in that it
only attempts to explain the variance of home prices between communities, not within them.
Ideally, this level of aggregation will provide a new perspective on an
old and popular subject of analysis.
It is also important to realize that this regression analysis is an approximation to
an ideal hedonic model. A hedonic model as described by Sheppard (1999) allows for
freedom in individual consumers' preferences and their corresponding utility functions.
The hedonic model also estimates housing demand and equates it with supply before
arriving at price estimation. Although this model should reveal home and community
attributes that significantly affect median community housing prices, it is considerably
more constrained than an ideal hedonic model. There are two important assumptions that
should hold true for the success of this model:
1. Sales events are randomly distributed among the homes in a community.
2. Home values are summations of attribute values. There is no declining
marginal utility of attributes.
The first assumption is important because the sales data that generate the median home
sale prices for the communities are not directly paired with the data that generate the
explanatory variables. The sample of homes for the sales data is different from the
sample of homes that represent the attribute data within a community. Therefore it is
imperative that one sample is representative of the other. For instance, if a new housing
division within a community represents 60 percent of the home sales but only 15 percent
of the homes, there is likely to be error. The relationship between variables in the model
and the true community relationship between home price and attribute quantity may
differ because community attributes are not representative of the sample of homes sold.
The second assumption addresses the constraints of linear regression. For this OLS
model to be effective, consumers' utilities must closely follow a simple summation of
attributes. It is possible to apply a functional form to a variable, but it is unlikely that a linear
function can closely approximate a complex utility function.
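The second assumption can be written out explicitly. The notation below is introduced here only for illustration and is not the author's: P_c is the median sale price in community c, x_{ck} is the quantity of attribute k in that community, beta_k is its implicit price and epsilon_c is the error term.

```latex
P_c = \beta_0 + \sum_{k=1}^{K} \beta_k \, x_{ck} + \varepsilon_c ,
\qquad
\frac{\partial P_c}{\partial x_{ck}} = \beta_k
\quad \text{for every level of } x_{ck} .
```

Under this additive form, one more unit of an attribute adds the same amount to price whether a community has little or much of it, which is exactly the "no declining marginal utility" restriction stated in the second assumption.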
During the actual regression process, over fifty models were analyzed. These
models varied along several dimensions:
1. The use of both the complete and adjusted data sets.
2. The use of average median price, weighted average median price, and the
natural log of these prices.
3. The inclusion and removal of different variables.
4. The use of different specifications of the same variable, as in crime and
ACT variables.
5. The use of different functional forms.
6. The use of weighted least squares estimation.
Two variables were removed in order to avoid collinearity. A threshold correlation of .75
was established. The home attribute variable for single family housing had a
correlation statistic of (0.789)** with the average number of rooms, while the dummy
variable for Chicago community was too negatively correlated (-0.79)** with the variable
for average ACT score. The majority of models were fairly consistent concerning the
variables of significance and directional effect. One set of models was chosen for
presentation because of its simple, straightforward nature and its strong results. The set of
models regresses an incrementally increasing group of variables against Price1. The
models enter variables in the groups previously introduced. The group of Home
Attributes contains Average Number of Rooms in Housing Unit, Average Home Age as
of 1990 (Using Upper Decade Bounds of Census), and Number of Square Feet per
Community Housing Unit (1999), with dummy variables for level of Single Family
Units omitted. The second group, entitled Community Attributes, contains the variables
ACT1 (Chicago communities counted as one district), Distance from Loop, Crime3
(Extrapolated Missing Values) and Places of Worship per 1000 Population. The next
group contains Inhabitant Attribute variables, all of which are dummy variables
calculated from percentages. This group includes both occupational and racial attributes.
There is a dummy for Percent Managerial Occupation <= 15% and one for Percent Managerial
Occupation >= 40%, with values > 15% and < 40% as the control. Another occupational dummy
accounts for the percent of factory workers in a community: Percent of Factory Workers
>= 15%, with < 15% as the control. Racial dummies include Percent African American
>= 10% and < 90%, and Percent African American >= 90%, with < 10% as the control, and Percent
Hispanic >= 10%, with < 10% as the control. The final group contains an indicator variable
called North Shore Effect. This is a dummy variable that identifies four North Shore
communities that feed into New Trier High School. Median home prices for these
communities are extremely high. Without this indicator variable, the model cannot
account for the large deviation in the prices of these homes. The regression results are
listed below.
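The inhabitant dummy variables described above can be sketched as a small coding function. The cutoffs are the ones stated in the text; the function name and dictionary keys are hypothetical labels chosen to mirror the regression table, and the inputs are assumed to be shares between 0 and 1.

```python
def inhabitant_dummies(pct_managerial, pct_factory, pct_african_american, pct_hispanic):
    """Code the inhabitant attribute dummies (1 = condition holds, 0 = control group).

    All inputs are proportions between 0 and 1. The omitted (control) categories are
    managerial share between 15% and 40%, factory share below 15%, African American
    share below 10%, and Hispanic share below 10%.
    """
    return {
        "low_managerial": int(pct_managerial <= 0.15),
        "high_managerial": int(pct_managerial >= 0.40),
        "high_factory": int(pct_factory >= 0.15),
        "middle_african_american": int(0.10 <= pct_african_american < 0.90),
        "high_african_american": int(pct_african_american >= 0.90),
        "high_hispanic": int(pct_hispanic >= 0.10),
    }
```

Each coefficient on these dummies is then read as a price difference relative to the control group, not as a price per percentage point.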
Variable Coefficients and T Statistics of Regressions Against Price1

MODEL                                  1             2             3             4
Constant                      -291,350.3    -584,021.3    -325,507.5    -140,233.7
                              (-7.561)**     (-11.6)**    (-5.458)**     (-2.484)*
Home Attributes
  Rooms                       74,007.592    50,890.765    31,464.323     26,124.75
                              (13.426)**     (8.425)**     (4.910)**     (4.582)**
  Home Age                     1,528.101     2,313.218     2,160.187       800.597
                                (2.99)**     (4.283)**     (3.985)**       (1.596)
  Square Feet                   0.002877      0.007671     0.0002837      0.004787
                                 (0.279)       (0.825)       (0.033)       (0.624)
Community Attributes
  Distance to Loop                     -    -1,906.069    -1,208.848      -878.804
                                            (-5.218)**    (-3.248)**    (-2.652)**
  Crime                                -         2.167         5.092        15.079
                                               (0.037)       (0.093)       (0.312)
  ACT                                  -    21,946.031    13,771.243      7,383.63
                                             (8.876)**     (5.238)**     (3.042)**
  Places of Worship                    -    -6,382.023    -2,068.652    -1,287.826
                                              (-1.253)        (-0.4)      (-0.281)
Inhabitant Attributes
  Low Managerial                       -             -   -41,590.288   -42,781.206
                                                          (-2.714)**    (-3.154)**
  High Managerial                      -             -    85,878.798    87,171.531
                                                           (6.768)**     (7.761)**
  High Factory                         -             -     2,556.501      4,234.95
                                                             (0.177)       (0.331)
  Middle African Amer                  -             -   -33,835.188   -40,537.345
                                                          (-2.900)**    (-3.915)**
  High African American                -             -   -28,771.233   -38,292.485
                                                            (-1.454)     (-2.183)*
  High Hispanic                        -             -      1,065.53     5,057.242
                                                             (0.080)       (0.427)
North Shore Effect                     -             -             -    320,167.46
                                                                         (9.266)**
Adjusted R Square                  0.371         0.513         0.587         0.676
F Statistic                   (64.011)**    (49.211)**    (35.951)**    (48.742)**
Incremental F                          -    (24.111)**    (10.274)**    (85.852)**
Results of a regression of the same set of models against Ln(Price1) can be found in
Appendix 2.1.
Overall, these models appear to be successful and informative. The F statistic for
each model is significant at the reported .000 level (p < .0005), so the hypothesis that a
model explains none of the variance in Price1 can be rejected at any conventional
significance level. The incremental
F statistics for the last three models are also significant at the .000 level, so each
added group of variables explains significantly more
variance than the model preceding it. The adjusted R squared statistic increases with each
successive model as more explanatory variables are added. In the final model, over 65
percent of the total variance in Price1
is accounted for.
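The incremental F statistic compares each model with the smaller model nested inside it. A minimal sketch of the standard formula follows. It takes the unadjusted R squared of both models, which the table above does not report (it lists adjusted R squared), so the thesis values cannot be reproduced from the table alone; the function and argument names are illustrative.

```python
def incremental_f(r2_restricted, r2_full, n_obs, k_full, q_added):
    """F statistic for the hypothesis that the q_added new coefficients are all zero.

    r2_restricted -- unadjusted R squared of the smaller (nested) model
    r2_full       -- unadjusted R squared of the larger model
    n_obs         -- number of observations used in both models
    k_full        -- number of slope coefficients in the larger model (constant excluded)
    q_added       -- number of variables added when moving to the larger model
    """
    residual_df = n_obs - k_full - 1
    if q_added <= 0 or residual_df <= 0:
        raise ValueError("need q_added > 0 and n_obs > k_full + 1")
    explained_gain = (r2_full - r2_restricted) / q_added
    unexplained = (1.0 - r2_full) / residual_df
    return explained_gain / unexplained
```

The result is compared with an F distribution with q_added and n_obs - k_full - 1 degrees of freedom.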
Overview of Regression Coefficients
The directional effects of significant variables on Price1 are as expected, other than
the fact that the first three models show Price1 increasing with home age. This might be
explained by the fact that some of the priciest communities contain refurbished vintage
homes and are often located on expensive land. This theory is supported by the fact that the
variable for home age loses significance in the fourth model, when the North Shore effect
is introduced. The North Shore effect 'steals' explanatory value from the average home
age variable. These communities all contain older homes that have been upgraded over
the years. This model lacks a variable for the amount of home quality improvement within
a community. This is an attribute that may not be well recorded or may be hard to standardize.
Also, because the correlations among most of the explanatory
variables are suspiciously high, multicollinearity may be responsible for any deviations
from expected results.
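The pairwise screen used earlier to drop variables (a correlation threshold of .75) can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure actually run for the study: the function names are hypothetical and the example values are made up rather than taken from the Tribune data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


def flag_collinear_pairs(columns, threshold=0.75):
    """Return (name_a, name_b, r) for every pair whose |r| exceeds the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up values for illustration only.
example = {
    "rooms": [4, 5, 6, 7, 8],
    "single_family": [0.3, 0.5, 0.6, 0.8, 0.9],
    "crime": [50, 20, 60, 10, 40],
}
flagged = flag_collinear_pairs(example)  # flags only the rooms / single_family pair
```

A pairwise screen of this kind cannot detect collinearity that involves three or more variables jointly, so correlations below the threshold do not rule out the multicollinearity suspected in the text.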
As expected, the variable for number of rooms is significant and positive,
although it loses explanatory power in each new model as more variables are introduced.
The variable for community square feet per housing unit, which is supposed to be a proxy
for lot size, is not significant. This could be the result of poor variable specification. The
variable fails to account for zoning within a community. It is possible, and even probable, for
communities with similar square feet per housing unit estimates to have completely
different lot sizes.
For the community attributes, the ACT and distance to Loop variables are
significant as expected. The insignificance of the crime variable is puzzling. It seems
obvious that home values should be lower in high crime communities. Possibly the
measurement of crime is inadequate and the true crime induced variance in home
prices is explained by other correlated variables, such as race or even home size. The
crime variable also fails to distinguish between violent and property crime, although they
have different impacts on a home owner's estimation of safety. A final problem with the
crime variable might be underreporting of crime in the worst neighborhoods.
For the inhabitant attributes, the percent of residents with a managerial occupation
seems to be very significant in explaining variance in community home prices. The
occupational variables assume that home owners will choose to locate in a community
that provides access to their job type and to other people sharing similar occupational and
lifestyle interests. However, the cause and effect relationship between home prices and
managerial occupation may be somewhat unclear. Perhaps a managerial occupation is
just a proxy for income and the model is displaying the fact that families with higher
income will buy homes in more expensive communities. The negative coefficients on the
African American dummy variables, coupled with the insignificant coefficient on the
Hispanic dummy variable, are another interesting phenomenon. Perhaps the actual effects of
racial prejudice are really insignificant and a high percentage of African American
inhabitants is just a proxy for truly significant community attributes such as
unemployment, education, income, crime, or even redlining. Finally, the significance of
the North Shore effect is apparent in the fourth model. This variable eliminates a large
amount of the unexplained residual error present in the previous model from the four
outlying communities of Wilmette, Winnetka, Glencoe and Kenilworth. These
communities must possess high quantities of some attribute that has not been specified or
was not adequately measured by the Tribune data.
PROBLEMS WITH HETEROSCEDASTICITY AND MODEL SPECIFICATIONS
Simple histograms and scatter plots of variables raise the suspicion that many of
the variables in this study are not normally distributed, and there are not many clearly
linear relationships between variables. Many variables have outlying data that are
difficult to account for. Sets of variables are highly correlated, which leaves the direction
of dependence among them in question. In general, the Tribune data and this model strain
several of the assumptions of least-squares regression. Problems of this nature are not uncommon
when dealing with pricing models. This excerpt comes from Sheppard:
Estimation of hedonic prices confronts the economist with a rich sampling
of the standard difficulties that arise in estimation using cross-section data.
These include choices of the proper parametric specification--both of
functional form and of variables to be included--coping with collinearity
and ill-conditioned data, potential heteroscedastic and nonnormal errors,
regressors subject to measurement error, and maximum likelihood
estimation of relationships that are nonlinear. (Sheppard 1614)
One major concern is the presence of heteroscedasticity in the model. When the
squared residuals are simply graphed against each explanatory variable, it is difficult to discern
a definite linear relationship. An example is given in Appendix 1.5. However, several
outlying points are a cause for concern. Tests for heteroscedasticity were performed for
several of the models. Using the Breusch-Pagan test, chi-squared statistics were
generated. Of all the models, the lowest chi-squared value was 65, still highly significant
given the degrees of freedom of the model. Even after trying the adjusted data,
logarithmic relationships and least squares weighted by housing units, the presence
of heteroscedasticity seems likely. This presents problems for the regression estimates.
Heteroscedasticity affects models in several ways. First, the OLS coefficient estimates are still
unbiased and consistent, but they are no longer efficient. This means that another
linear unbiased estimator with lower variance than the OLS estimator may exist. Second,
the usual variance estimates of the coefficients are no longer valid. They are biased and
inconsistent. This renders hypothesis tests such as F and T tests unreliable. Although the
presence of heteroscedasticity should not nullify the findings of this study, it does raise
larger questions about a linear model's overall ability to estimate hedonic functions.
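The idea behind the test can be illustrated for a single explanatory variable, which mirrors the plot of squared residuals against one variable in Appendix 1.5. The sketch below uses the studentized (Koenker) form of the Breusch-Pagan statistic, LM = n times the R squared of a regression of squared residuals on the variable. This is an assumption: the source does not say which variant was computed, and the full test would regress on all explanatory variables at once. The function name is hypothetical.

```python
def breusch_pagan_lm(residuals, z):
    """Studentized Breusch-Pagan LM statistic for one explanatory variable.

    Regresses the squared OLS residuals on z (with a constant) and returns
    n * R^2. Under homoscedasticity this is approximately chi-squared with
    1 degree of freedom, so values above 3.84 reject at the 5 percent level.
    """
    n = len(residuals)
    if n != len(z) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 observations")
    squared = [e * e for e in residuals]
    mean_sq = sum(squared) / n
    mean_z = sum(z) / n
    cov = sum((s - mean_sq) * (v - mean_z) for s, v in zip(squared, z))
    var_sq = sum((s - mean_sq) ** 2 for s in squared)
    var_z = sum((v - mean_z) ** 2 for v in z)
    if var_sq == 0 or var_z == 0:
        raise ValueError("statistic is undefined when either series is constant")
    # With one regressor, R^2 equals the squared correlation coefficient.
    r_squared = cov * cov / (var_sq * var_z)
    return n * r_squared
```

When residuals fan out as the variable grows, the squared residuals trend with it and the statistic becomes large; when their spread is unrelated to the variable, it stays near zero.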
CONCLUSION
This study attempts to explain quantitatively median community home prices in
the Chicago area using hedonic analysis. It evaluates whether median community home
prices can be expressed as composites of the values assigned to varying quantities of
underlying attributes. A linear model was constructed in order to value attributes
expected to influence home prices. These attributes were derived from aggregated
community data available through the Chicago Tribune Homes web site. Attributes were
divided into home, community and inhabitant groups and a seemingly successful model
was generated using least squares regression. The adjusted R-squared statistic for the
full model is 0.676, suggesting that the majority of the variance in median home prices
can be explained by the model. However, the success of the model is threatened by the
assumptions that were used to generate it. For some attributes analyzed in the study, the
marginal utility and the corresponding impact on overall price are unlikely to remain
constant as the quantity varies.
Appendix 1.1
Histogram of Price1
[Figure: histogram of AVGPRICE. Horizontal axis labels run from 25,000 to 675,000 in steps
of 50,000; vertical axis runs from 0 to 70. Std. Dev = 89,085.16, Mean = 163,046.4, N = 286.]
Appendix 1.2
[Figure: scatter plot of Percentage of Single Family Units (vertical axis, 0.0 to 1.0) against
Average Number of Rooms in Housing Unit (horizontal axis, 3 to 9). Labeled points: Park City,
Lake Forest, South Barrington, West Garfield Park, Uptown, Near South Side.]
Appendix 1.3
[Figure: scatter plot of Crime with Extrapolated Values (vertical axis, -100 to 700) against
Average Number of Rooms in Housing Unit (horizontal axis, 3 to 9). Labeled points: Oak Brook,
Matteson, Broadview, O'Hare, Near West Side, Loop.]
Appendix 1.4
[Figure: scatter plot of AVGPRICE (vertical axis, 0 to 700,000) against Places of Worship per
1000 Population (1999) (horizontal axis, -1 to 4). Labeled points: Riverwoods, Barrington,
Robbins, Glencoe, Englewood.]
Appendix 1.5
Residuals taken from featured regression Model 4
[Figure: scatter plot of Squared Residuals (vertical axis) against Average Number of Rooms
(horizontal axis, 3 to 9). Labeled points: Lake Forest, Bannockburn, Kenilworth.]
Appendix 2.1
Variable Coefficients and T Statistics of Regressions Against Ln(Price1)

MODEL                                  1             2             3             4
Constant                          10.108         8.659        10.530        10.882
                              (54.221)**    (36.042)**    (42.787)**    (42.275)**
Home Attributes
  Rooms                            0.313         0.180       0.08957       0.07942
                              (11.754)**     (6.256)**     (3.387)**     (3.055)**
  Home Age                     4.144E-04     6.955E-03     5.112E-03     2.529E-03
                                 (0.167)     (2.699)**      (2.285)*       (1.105)
  Square Feet                  3.493E-08     4.518E-08     8.400E-09     1.696E-08
                                 (0.701)       (1.018)       (0.235)       (0.485)
Community Attributes
  Distance to Loop                     -    -6.800E-03    -4.667E-03    -4.040E-03
                                            (-3.900)**    (-3.036)**    (-2.674)**
  Crime                                -    -2.467E-04    -5.139E-05    -3.242E-05
                                              (-0.884)      (-0.228)      (-0.147)
  ACT                                  -         0.110       0.04371       0.03158
                                             (9.283)**     (4.029)**     (2.853)**
  Places of Worship                    -      -0.07237      -0.01106    -9.574E-03
                                            (-2.977)**      (-0.518)      (-0.459)
Inhabitant Attributes
  Low Managerial                       -             -        -0.424        -0.426
                                                          (-6.704)**    (-6.890)**
  High Managerial                      -             -         0.431         0.434
                                                           (8.239)**     (8.471)**
  High Factory                         -             -       0.06613       0.06932
                                                             (1.109)       (1.189)
  Middle Black                         -             -        -0.334        -0.346
                                                          (-6.926)**    (-7.334)**
  High Black                           -             -        -0.531        -0.549
                                                          (-6.496)**    (-6.857)**
  High Hispanic                        -             -    -6.527E-06     7.578E-03
                                                             (0.000)       (0.140)
North Shore Effect                     -             -             -         0.608
                                                                         (3.860)**
Adjusted R Square                  0.338         0.501         0.683         0.697
F Statistic                   (55.431)**    (46.941)**    (54.109)**    (53.584)**
Incremental F                          -    (26.956)**    (30.989)**    (14.902)**
Appendix 3.1
OTHER INTERESTING VARIABLES

                               Mean  Std. Dev.     Median    Minimum    Maximum    Valid N
Complete
  Households 1999          7,685.66   8,263.60      4,983         24     54,935        330
  Population 1999         21,297.48  21,908.49     14,124         71    124,321        330
  Increase1                   .1649      .3276      .0716     -.3499     3.0857        251
  Increase2                   .1674      .3246      .0713      -.293      3.028        250
  People per Household         2.85        .39       2.81       1.69       5.33        330
Adjusted
  Households 1999          8,933.93   8,370.02      6,586        864     54,935        280
  Population 1999         24,743.63  22,047.62     18,119      2,705    124,321        280
  Increase1                   .1723      .3458      .0753     -.3499     3.0857        207
  Increase2                   .1766      .3416      .0764      -.293      3.028        206
  People per Household         2.84        .39       2.80       1.69       5.33        280
Increase1 = Percent Increase in Households From 1990 to 1999
Increase2 = Percent Increase in Population From 1990 to 1999

                               Mean  Std. Dev.     Median    Minimum    Maximum    Valid N
Complete
  Family Income           75,841.12  42,579.82  64,313.50   8,944.00 256,359.00        330
  Never Married               .2735      .0822      .2520      .1370      .5550        337
  Median Age                  36.21       4.78      36.10      21.30      49.50        330
Adjusted
  Family Income           75,862.20  40,409.22  65,894.50  11,189.00 256,359.00        280
  Never Married               .2767      .0797      .2560      .1600      .5550        287
  Median Age                  36.22       4.57      36.10      21.30      49.50        280
REFERENCES
Cited
Chicago Tribune. http://cgi.chicago.tribune.com/homes. 2000.
Sheppard, Stephen. "Hedonic Analysis of Housing Markets." Handbook of Regional
and Urban Economics. Elsevier Science B.V., 1999. 1595-1635.
Not Cited
Asabere, Paul K., and Forrest E. Huffman. "Price Determinants of Foreclosed Urban
Land." Urban Studies 29 (1992): 701-707.
Bednarz, Robert S. The Effect of Air pollution on Property Value in Chicago. Chicago:
The University of Chicago, 1975.
Bloom, George F., and Henry S. Harrison. Appraising the Single Family Residence.
Chicago: American Institute of Real Estate Appraisers, 1978.
Carn, Neil, et al. Real Estate Market Analysis: Techniques & Applications. New Jersey:
Prentice Hall, 1988.
Lawrence, Roderick J. "Housing Quality: An Agenda for Research." Urban Studies 32
(1995): 1655-1664.
Mills, Edwin S. "New Hedonic Estimates of Regional Constant Quality House Prices."
Journal of Urban Economics 39 (1996): 209-215.
Paris, Chris. "Demographic Aspects of Social Change: Implications for Strategic
Housing Policy." Urban Studies 32 (1995): 1623-1643.
Peek, J., and J. Wilcox. "The Measurement and Determinants of Single-Family House
Prices." AREUEA Journal 19 (1991): 353-382.
Ring, Alfred A. The Valuation of Real Estate: Second Edition. New Jersey: Prentice
Hall, 1970.

The Social Benefits Of Stable HousingThe Social Benefits Of Stable Housing
The Social Benefits Of Stable HousingTom Cryer
 

Similaire à MMSS Senior Thesis 2000 (20)

Portion Two Housing Statistics and Housing Market.ppt
Portion Two Housing Statistics and Housing Market.pptPortion Two Housing Statistics and Housing Market.ppt
Portion Two Housing Statistics and Housing Market.ppt
 
community devolepment
community devolepmentcommunity devolepment
community devolepment
 
Thurman Model 1 Cross Sectional ECN 405
Thurman Model 1 Cross Sectional ECN 405Thurman Model 1 Cross Sectional ECN 405
Thurman Model 1 Cross Sectional ECN 405
 
DemographyThe scientific study of population.U.S. Ce.docx
DemographyThe scientific study of population.U.S. Ce.docxDemographyThe scientific study of population.U.S. Ce.docx
DemographyThe scientific study of population.U.S. Ce.docx
 
Capstone Project in Business Intelligence
Capstone Project in Business IntelligenceCapstone Project in Business Intelligence
Capstone Project in Business Intelligence
 
Housing Paper
Housing PaperHousing Paper
Housing Paper
 
A Systematic Review of Affordable Homeownership using Data Science Methods
A Systematic Review of Affordable Homeownership using Data Science MethodsA Systematic Review of Affordable Homeownership using Data Science Methods
A Systematic Review of Affordable Homeownership using Data Science Methods
 
20 THE NEW” HOUSING AND MORTGAGE MARKET SPRING 2016The .docx
20   THE NEW” HOUSING AND MORTGAGE MARKET SPRING 2016The .docx20   THE NEW” HOUSING AND MORTGAGE MARKET SPRING 2016The .docx
20 THE NEW” HOUSING AND MORTGAGE MARKET SPRING 2016The .docx
 
20 THE NEW” HOUSING AND MORTGAGE MARKET SPRING 2016The .docx
20   THE NEW” HOUSING AND MORTGAGE MARKET SPRING 2016The .docx20   THE NEW” HOUSING AND MORTGAGE MARKET SPRING 2016The .docx
20 THE NEW” HOUSING AND MORTGAGE MARKET SPRING 2016The .docx
 
2017 05-CoreLogic Perceptions of Housing Affordability Report
2017 05-CoreLogic Perceptions of Housing Affordability Report2017 05-CoreLogic Perceptions of Housing Affordability Report
2017 05-CoreLogic Perceptions of Housing Affordability Report
 
MPSC_Presentation.pptx
MPSC_Presentation.pptxMPSC_Presentation.pptx
MPSC_Presentation.pptx
 
16An Annotated Bibliography on Solving the Problem of Home
16An Annotated Bibliography on Solving the Problem of Home16An Annotated Bibliography on Solving the Problem of Home
16An Annotated Bibliography on Solving the Problem of Home
 
16An Annotated Bibliography on Solving the Problem of Home
16An Annotated Bibliography on Solving the Problem of Home16An Annotated Bibliography on Solving the Problem of Home
16An Annotated Bibliography on Solving the Problem of Home
 
Missoula Housing Report, 2015
Missoula Housing Report, 2015Missoula Housing Report, 2015
Missoula Housing Report, 2015
 
Household Affordability Code Prescription
Household Affordability Code PrescriptionHousehold Affordability Code Prescription
Household Affordability Code Prescription
 
Household Affordability Code Prescription
Household Affordability Code PrescriptionHousehold Affordability Code Prescription
Household Affordability Code Prescription
 
Strategic Communications Plan for Seattle Affordable Housing and Homelessness
Strategic Communications Plan for Seattle Affordable Housing and HomelessnessStrategic Communications Plan for Seattle Affordable Housing and Homelessness
Strategic Communications Plan for Seattle Affordable Housing and Homelessness
 
Housing Snapshot Final June30 2023df
Housing Snapshot Final June30 2023dfHousing Snapshot Final June30 2023df
Housing Snapshot Final June30 2023df
 
The Social Benefits Of Stable Housing
The Social Benefits Of Stable HousingThe Social Benefits Of Stable Housing
The Social Benefits Of Stable Housing
 
2017 Missoula Housing Report
2017 Missoula Housing Report2017 Missoula Housing Report
2017 Missoula Housing Report
 

MMSS Senior Thesis 2000

  • 1. 1 Chicago Area Housing Analysis A Hedonic Price Study By Eric P. Morel An MMSS Senior Thesis Advisors Michael Dacey Edwin Mills June 2000
  • 2. 2 ACKNOWLEDGMENTS First and foremost, I would like to thank Ed Mills for his interest, support and guidance. He gave me much needed direction and encouragement, as well as access to his extensive knowledge of the subject. I would also like to thank Professor Dacey and the rest of the MMSS faculty for facilitating such a wonderful program. Finally, I would like to thank my family and friends for their love and support.
  • 3. 3 ABSTRACT This study utilizes data provided by the Chicago Tribune Homes web site to analyze the relationships between community home attributes and median home sale prices in Chicago and surrounding counties for 1999. Based on the hedonic pricing model that partitions complex commodities into variable quantities of uniform attributes, the study shows that certain home related community attributes significantly and predictably contribute to median home prices. The study examines attributes in four categories including physical, community, inhabitant and location characteristics. Regression analysis reveals that attributes in each of these categories affect home prices. However, the feasibility of using least squares estimation to analyze this data is carefully scrutinized. Beyond attribute pricing, this study also offers other interesting relational findings among the community data.
  • 4. 4 INTRODUCTION For years the housing market has captured the attention of professionals in both academic and business arenas. The home as a commodity stands out for several reasons. First, a home is the biggest purchase and investment many families will make in a lifetime. Outside of extraordinary cases, whether renting or purchasing, every family must devote some of their income towards a living environment. This makes supply and demand analysis particularly interesting. Finally, a home is one of the best examples of a truly heterogeneous good. Homes can vary by any number of factors including size, material, age, design and location. As one of the most expensive and complex goods available, homes and the housing market draw a great amount of analysis and speculation from both commercial and private investors hoping to find trends or other insight to give them a competitive edge. Study of housing data has also sparked as well as helped to answer many sociological questions that have been the concern of many academics and politicians as well as society as a whole. Formalized by Rosen in the 1970’s, the hedonic pricing model has been regularly applied to the housing market. The model asserts that any heterogeneous good is really a bundle of fairly uniform, homogeneous goods or attributes that vary in quantity. Therefore, the price or value of a complex commodity can be represented by a vector containing the quantities of the underlying attributes. When applied to a utility function, this attribute vector generates a value function. The addition of a composite good vector and a budget constraint allow for the estimation of a housing demand function. Much of the analysis of housing data is an attempt to identify and quantify these attribute goods and estimate their value function within the context of the entire home. Other studies
  • 5. 5 focus on price indexing, studying national data in order to discover price discrepancies among homes in different regions and at different time periods. There are many time varying and regionally specific factors that affect the supply and demand for homes, causing price variation. However, this broad longitudinal and regional analysis will not be the focus of this work. This study will take the former approach in an attempt to define and quantify the hedonic values of home attributes in a localized, cross sectional study. According to the hedonic pricing model, the value of a home can be attributed to the value of the bundle of homogeneous qualities that it contains. All other things equal, a home containing more of one of these positively valued attributes will be worth more. This study intends to show that these valued attributes are not only physical characteristics of a home, but community characteristics as well. A home provides membership in a community with benefits, obligations and sometimes problems. Some of these factors, whether legally enforced or simply effects of proximity, can be measured and compared across communities as commodities in the pricing model. This study utilizes recent housing data from the city of Chicago and communities from seven surrounding counties to analyze these commodities, both structural and neighborhood attributes, to estimate a model for home values. The study also uncovers many interesting sociological relationships worth mentioning.
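As a hedged illustration of the hedonic relationship described above, the price of a home can be written as a function of its attribute quantities. The linear form and every symbol below are assumptions chosen for illustration; the study has not yet stated a functional form at this point.

```latex
% P_i    : median home sale price in community i (assumed notation)
% z_{ik} : quantity of attribute k in community i (assumed notation)
% \beta_k: implicit (hedonic) price of attribute k
\begin{align}
  P_i &= \beta_0 + \sum_{k=1}^{K} \beta_k \, z_{ik} + \varepsilon_i , \\
  \frac{\partial P_i}{\partial z_{ik}} &= \beta_k .
\end{align}
```

Under this sketch, holding all other attributes equal, a community with one more unit of a positively valued attribute k would be expected to show a median price higher by \beta_k.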
  • 6. 6 DATA The data for this study comes from the Homes section of the Chicago Tribune’s web site, http://www.chicagotribune.com/homes. The site references sources including the U.S. Census Bureau, Claritas Inc., MLS of Northern Illinois, Northeastern Planning Commission, Illinois State Police, Chicago Police Department and the Illinois State Board of Education. The site provides fairly uniform information for Chicago’s 77 community areas as well as an additional 296 suburban communities within Cook, DuPage, Lake, McHenry and Will counties in Illinois and Lake and Porter counties in Northwest Indiana. Most of the 77 Chicago communities, adopted by the Chicago Association of Realtors, correspond with traditional Chicago neighborhoods such as Lakeview or Hyde Park, while others are aggregates of neighborhoods that are seldom referred to outside of the real estate community. The Chicago Tribune page for each community includes a profile by a contracted writer, with access to a facts and figures page as well as archived information. A wealth of information exists from the many sources that are compiled into the facts page. The site includes location data, including county designation, a measure of distance from the loop and area in square miles. Community information reported from the 1990 census survey consists of percentages of single-family units, number of housing units, population, percentages for the number of people in a housing unit, the number of rooms in a housing unit, homes built in grouped cohort years, as well as age, race, sex, marital status, education, employment and occupation distributions. Important data also exists concerning standardized 1998 crime data, 1998 educational data including average ACT scores by school district, and finally
  • 7. 7 1999 data on quarterly median home sales prices coupled with the number of homes sold in that period and current figures for population and number of housing units. Archived information for 1998 median home value also exists for some communities. Although the available information is fairly consistent among the different communities, there is some small variation in reporting depending on the given county. Missing information also exists for a small number of communities. While most of the missing data exists for the smallest and most outlying communities, there also appears to be some random holes in the data that seem to be simple mistakes in data compilation. Treatment of missing values will be discussed later. When organized correctly and thoughtfully, this data contains valuable information concerning the relationships between median community home sale prices and community attributes. The 300 plus fairly uniform observations allow for adequate statistical analysis. There have been a multitude of studies utilizing home sales data from various sources. Home sale transactions and prices are traditionally well documented. Unfortunately, the quality and quantity of existing information on the attributes of those homes is much lower. The AHS (American Housing Survey), is one heavily studied source, along with NAR (National Association of Realtors) data. These data sets differ from the Tribune data, however, because the observations deal with individual homes and sales. Although these provide an accurate source of pricing information and physical home properties, they can be inadequate for deciphering community attributes. The American Housing Survey polls for the adequacy of some community amenities, and comparisons between communities are the product of home owner opinion. The response is based on the individual home owner’s values rather than any standardized measure of
  • 8. 8 performance. This data may become biased when compared across communities. For instance, a suburban family might be overly critical of a great school that might not quite measure up to the one in the next neighborhood, although they are both in the top few percent of schools. Also, a homeowner might respond with incomplete information. A single professional’s response to the adequacy of education might be completely arbitrary due to lack of concern. Ultimately, these survey questions do provide valuable insight towards qualities of attributes in a neighborhood, but may not be as accurate as standardized statistical measurements. This is a great feature of the Tribune data. Measurements for many community commodities are uniform and standardized. Mean ACT scores of high school seniors are available for each community, as well as crime rates per 1000 residents. While these are still only proxies for the quality of education and safety in a community, they at least ensure a standardized comparison across communities. Another main motivation for using the Chicago Tribune Homes data to study attribute contributions toward housing prices involves the assumption that there is a great deal more variance in home values between communities than within communities. Homes within a community share many of the same resources that affect value. Also, homes should be expected to share more physical characteristics than would be present in a larger sample. The Tribune data is conveniently aggregated at this level, facilitating the exploration of variance causation between communities. The aggregated observations are also interesting from a modeling standpoint. In this study, a model will be formulated using home attribute variation to account for variance in median home values. The residual errors of this model will be particularly valuable because they will show which communities are undervalued and overvalued by the regression analysis. The largest
  • 9. 9 outlying communities can be studied to look for any explanation of the deviance between the actual and expected home prices. This may lead to the identification of significant home and community attributes that might have been overlooked by the model. This residual analysis would be much more difficult from an individual home perspective, as it would be very difficult to uncover additional information for each home. The benefits of the Chicago Tribune data are coupled with some weak points. First, the data is a collection of information from a variety of sources. Although all the sources seem reputable and the data appears to be accurate, the actual conditions and integrity of the data collection will never be known. The population data from the different years are most likely sophisticated estimates of actual population. While the Census Bureau provides the 1990 data, Claritas provides the 1999 population information. It is possible that these two sources have different estimation methods that might cause a bias in the information. Another problem that may lead to the misspecification of median home price variance in the model stems from levels of data aggregation for certain variables that differ from the community levels. For instance, the 1999 sales data provided by the MLS couples some of the smaller outlying suburbs with a larger neighboring suburb. Since the observations are represented per community, any sales data that bundles several communities will result in a replication of the exogenous variables. The model loses some freedom as the medians may have differed if the data had been partitioned by community. If a difference does exist, some of the explanatory power of the endogenous variables for those communities will be lost through the overly aggregated data.
  • 10. 10 An interesting problem of aggregation is present in the descriptive properties of the Tribune data. Some community variables are displayed as averages of individual statistics while others are median values. One relevant example includes quarterly home sale prices and number of rooms which are represented as median values and gross percentages respectively. For modeling purposes, these percentages were used as weights to derive the average number of rooms. When data is not distributed symmetrically, means can differ significantly from medians. It is possible to misrepresent the data when regressing means against medians. Using the median home price and average number of rooms as an example, imagine that the true value of a home is exactly $50,000 times the number of rooms. Community A has 10 home sales; all four room homes for $200,000. Community B also has 10 home sales; six two room homes for $100,000 and four seven room homes for $350,000. For each community, there is an average of four rooms per home and the average home price is $200,000. However, the median home price for community B is only $100,000. The comparison of the mean of one variable to the median of another fails to uncover the true relationship between the number of rooms and home price. This phenomenon is particularly threatening to any future model because the median home prices are certainly not normally or symmetrically distributed among communities (see Appendix 1.1), although individual home prices within a community may more closely subscribe to these distributions. While the previous problems are more subtle, the time lag between the 1990 census attribute variables and the 1999 median price data may be the most recognizable problem with the data. Crime and educational data are lagged by one year. This is ideal under the assumption that these are the most current statistics that would be realized by a
  • 11. 11 potential buyer and are good proxies for the level of these attributes. However, many of the attribute variables that will be used in the hedonic model were calculated from census information that is nine years removed from the home sales data. Luckily, most of these attributes should not vary greatly over a decade for communities within reasonable growth and construction limitations. These include the average number of rooms per home, average number of people per housing unit and the percent of single family structures. Two other community attribute variables raise concern, however. One attribute that could change significantly over a decade is the racial composition of a community. Particularly in Chicago, many traditionally ethnic and minority neighborhoods have been experiencing gentrification and an influx of white urban professionals within recent years. The census data cannot account for any of these recent changes and may cause a bias for some neighborhoods. Another disturbing shortcoming of the census data involves average home age. All average community home ages have been calculated as of the year 1990. This data is right censored and does not factor in any new homes built after 1990. With the Tribune data, there is a lack of information concerning the percentage of the sales of new homes as opposed to old structures. Many outlying suburbs are rapidly growing under new construction. Even some of the most run down neighborhoods in Chicago are experiencing home restoration and construction as investors attempt to take advantage of low property values within minutes of the Loop. It seems logical that the median home price within a community would increase as the percentage of new home sales increases. Unfortunately, the Tribune data offers little to account for this median price variance. For communities with a stable population, the lack of accountability for the last ten years of average home age should make little
  • 12. 12 difference in median price. However, for rapidly growing communities, particularly those outside of Cook County, the average home age as of 1990 could greatly misrepresent the true average age of homes as reflected in the median home prices for 1999. As entered directly from the Tribune Homes site, the data contained 374 observations of 63 variables. Certain cases were missing data. Some of this was systematic by county; some appeared to be more frequent in smaller outlying communities, while other missing values seemed completely random. It became immediately apparent that too much information was missing from the two Indiana counties. These observations could not be included in any model of median home prices due to missing values. Therefore the 32 Indiana observations were discarded from all further data analysis for consistency, leaving the total number of observations at 342. In order to effectively describe, interpret and model relationships in the data, new variables were derived from the original input, and for some important variables, missing values were estimated using available data from surrounding communities. Any respecifications or extrapolations of data were conducted in a uniform manner. Much of the data from the 1990 Census contained percentages from categorized survey responses. In all cases, these percentages were used as weights to compute average values. For the case of home age, dummy variables indicating the decade of median home age were also derived. Other variables appeared as a single percentage, such as the racial and occupational data. Dummy variables were also created for these figures, provided that the use of percentages in OLS regression creates a bounding problem. In general, percentage bounds for the dummy variables were assigned close to one standard
  • 13. 13 deviation from the mean. Several variables required considerably more attention. Excluding the Indiana data, 27 missing values for crime existed, all for communities outside of Cook and DuPage Counties. Three variables were created to deal with the missing values. The first variable leaves the missing values as missing, the second replaces the missing values with reported county averages, while the third estimates the crime statistic by averaging crime rates from surrounding communities. Equal care was given to the education variable of ACT scores. The mean ACT scores were supplied by school district, not by community. Fortunately, in the suburbs, one school district typically corresponds to one community. However, when a suburban community was served by more than one school district, the mean ACT scores of the relevant districts were averaged to arrive at the community statistic. The situation becomes more complex for the 77 Chicago communities. One school district, including over 60 high schools, encompasses the entire city. Unlike the suburbs, children are not forced to attend the closest public school. In fact, there exist a number of magnet schools like Young that encourage the enrollment of promising students from all over the city. Complicating matters further, many parents who can afford private schools avoid the Chicago Public School system. As a result, schools are filled with a much higher percentage of underprivileged minorities than the surrounding population. Despite the overall complexity and diminished significance of public education in Chicago, ACT statistics were retrieved and recorded for each individual school. Under the loose assumption that families might locate themselves closest to the school in which they intend to enroll their children, ACT scores were matched with Chicago communities by high school. ACT values were estimated for communities without a high school by averaging the scores
  • 14. 14 from nearby communities. Therefore, two versions of the ACT proxy variable for quality of education exist. The first records the school district mean ACT score of 17.3 for all of the Chicago communities, while the second lists individual high school results with extrapolation for surrounding community values. For the second variable, ACT statistics were estimated for 28 out of the 77 communities. The lack of distance measures for the Chicago communities also merits a final procedural mention. Within the Tribune data for the suburbs surrounding Chicago, community measures of area in square miles and distance to the Loop in miles were listed. These statistics were not included for the Chicago communities. Anticipating their importance for descriptive and modeling purposes, these values were measured manually from the Census Tract Reference Maps distributed by the Chicago Association of Realtors. Other transformations of existing data occurred in order to arrive at noteworthy statistics or regression friendly data. Any procedures of importance not previously covered will be mentioned in later sections of this study. After a good amount of data analysis, any observations containing fewer than ten 1999 home sales and/or a population under 2,500 were eliminated to create a new subset of the data, hereafter entitled adjusted data. This subset was created for several reasons. First, by eliminating the smallest communities with few home sales, the chance of the median house data being unrepresentative of true median community home value is reduced. As mentioned earlier, some of the more sparsely populated outlying suburbs were combined with bigger suburbs to aggregate sales data. By placing a minimum constraint on population, and eliminating some of these outlying suburbs, the presence of duplicated exogenous variables under individually specified community attributes can be
  • 15. 15 reduced for more accurate explanation of variance between communities. Another rationale involves the reduction of both endogenous and exogenous outlying statistics. Some community attribute variables are population sensitive, such as the number of crimes per 1000 residents or the estimate for average lot size, which varies with the number of housing units. With small populations figured in the denominator of a statistic, values can become unreasonably large. For instance, the community of Bedford Park has a population of 535 and a large industrial park. In 1998, the town recorded approximately 600 crimes, mostly nonviolent property crimes occurring in the park. The computed statistic of 1127, several standard deviations above the mean, presumably overestimates the danger placed on the average home owner in this neighborhood. These problems arise from the misspecification of variables, and can be magnified in small communities. In that example, the data did not distinguish between violent crime and industrial crime outside of residential areas. A similar phenomenon can occur for median home prices in a small residential area. The smaller the community, the more likely the majority of homes sold can contain a common attribute not contained in or explained by the data. For analytical and regression purposes, both the complete and adjusted data sets will be utilized, ensuring two perspectives on the Tribune data.
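Two numerical points from this data section can be checked with a short sketch: the mean-versus-median example for communities A and B, and the per-1000 crime rate for a very small community. The function names and record layout are hypothetical, and the adjusted-data rule is one reading of the stated thresholds (a community is dropped if either threshold fails), not the study's actual procedure.

```python
from statistics import mean, median

# Hypothetical sales records for the two example communities: (rooms, sale price).
community_a = [(4, 200_000)] * 10
community_b = [(2, 100_000)] * 6 + [(7, 350_000)] * 4


def summarize(sales):
    """Return mean rooms, mean price and median price for a list of sales."""
    rooms = [r for r, _ in sales]
    prices = [p for _, p in sales]
    return {
        "mean_rooms": mean(rooms),
        "mean_price": mean(prices),
        "median_price": median(prices),
    }


def crimes_per_1000(crimes, population):
    """Crime rate per 1000 residents; small populations inflate this ratio."""
    return 1000 * crimes / population


def in_adjusted_data(home_sales_1999, population):
    """Assumed filter: keep communities with at least 10 sales and 2,500 residents."""
    return home_sales_1999 >= 10 and population >= 2500


if __name__ == "__main__":
    print(summarize(community_a))
    print(summarize(community_b))
    # Roughly 600 crimes in a population of 535 already exceeds 1000 per 1000.
    print(round(crimes_per_1000(600, 535)))
```

Both communities share the same mean number of rooms and the same mean price, yet their median prices differ by a factor of two, which is the mismatch the text warns about when means are regressed against medians.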
  • 16. 16 DESCRIPTIVES
This section highlights and describes the variables collected from the Tribune data, as well as studies relationships between variables prior to regression analysis. Descriptive statistics are displayed for variables from both the complete and adjusted data set. In order to avoid large ranges created by outlying data points, scatter plots will be limited to the adjusted data set. In order to achieve consistency, any further graphs, simple regressions or Pearson correlation statistics will also be derived from the adjusted data set. Pearson correlation statistics with a significance of .05 or better will be designated by one asterisk, while .01 or better will be given two asterisks.
Exogenous Variable
Median community home sales prices are at the focus of this study. Actually, the finalized statistics are the average and weighted average of the quarterly sales prices. These statistics would differ from true yearly medians. The first computation, not weighted by quarterly home sales, highlights the fact that the quarterly prices are not averages, and that any quarterly median could be closest to the true median with equal probability. This variable will be labeled Price1. The weighted average assumes that a quarter with more home sales is more likely to represent the true yearly median statistic. When the means of these two variables are compared, the weighted average registers slightly higher. This is because the quarter with the highest mean number of home sales also has the highest mean quarterly price figure. This third quarter phenomenon could be the result of seasonality or supply and demand issues. Descriptive statistics are listed below.
  • 17. 17
                      Mean      Standard Deviation   Median    Minimum   Maximum     Valid N
  Complete  Price1    168,338   106,282              142,563   15,900    1,011,500   336
            Price2    169,006   106,278              143,484   15,900    963,945     336
  Adjusted  Price1    163,046   89,085               141,730   24,750    694,125     286
            Price2    163,512   89,464               142,178   24,654    700,754     286
  Price1 = Average of Quarterly Median Home Sale Prices
  Price2 = Average of Quarterly Median Home Sale Prices Weighted by Quarterly Number of Homes Sold
As the dependent variable, the price data is highly correlated with many variables assumed to affect the price. Price is also correlated with other factors that direct individual demand levels for home attributes. These include income, years of education, age, and marital status. The distributions of the price variables are also important to consider. Statistics for skewness and kurtosis are both well over two, suggesting that there is little chance that the distribution is normal. A histogram of the non-weighted average variable (Appendix 1.1) shows that the distribution is skewed to the right. This identifies the presence of several elite communities where the average of quarterly median home values is several times the mean average.
Endogenous Variables
For classification purposes, expected hedonic attributes have been placed in four groups: home attributes, community attributes, inhabitant attributes and indicator attributes. These groups will resurface in the choice of regression models. The first group encompasses physical properties of the house. Using all available Tribune data, four basic variables were created within this category, along with logical derivations. They include average number of rooms per home, home age, percent of single family homes and a calculation for number of square feet per housing unit.
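The two price statistics defined in the table notes above, Price1 and Price2, can be sketched as follows. The function names and the quarterly figures are hypothetical and serve only to show how the unweighted and sales-weighted averages of quarterly medians can diverge.

```python
def price1(quarterly_medians):
    """Unweighted average of the quarterly median sale prices."""
    return sum(quarterly_medians) / len(quarterly_medians)


def price2(quarterly_medians, quarterly_sales):
    """Average of quarterly medians weighted by the number of homes sold."""
    total_sales = sum(quarterly_sales)
    weighted_sum = sum(p * n for p, n in zip(quarterly_medians, quarterly_sales))
    return weighted_sum / total_sales


if __name__ == "__main__":
    # Made-up community: the third quarter has both the most sales
    # and the highest median price.
    medians = [140_000, 150_000, 160_000, 145_000]
    sales = [20, 30, 50, 25]
    print(price1(medians))         # 148750.0
    print(price2(medians, sales))  # 151400.0
```

With these made-up figures the weighted statistic comes out above the unweighted one, which mirrors the third quarter effect described in the text.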
  • 18. 18 The first variable of the group, average number of rooms per home, is an obviously valued home attribute. Unfortunately, the data for this variable is lagged, because it is derived from the 1990 census. Fortunately, the average home structure of a community should not change much over a decade. The variable still shares a strong correlation with Price1 (0.617)**. Descriptive statistics are shown below.
                      Mean   Std. Dev.   Median   Minimum   Maximum   Valid N
Complete
  Rooms In Home       5.70     0.94       5.58      3.36      8.55      337
  Rooms Per Person    2.02     0.31       2.02      0.94      2.94      330
Adjusted
  Rooms In Home       5.64     0.91       5.53      3.36      8.55      287
  Rooms Per Person    2.01     0.30       2.00      0.94      2.84      280
Notice that the descriptive statistics for rooms per person are included in the table above. Although this variable should not have a direct causal relationship with home prices, it is highly correlated (0.760)** with median family income, since high income homeowners demand more space per person. It would be interesting to compare the mean number of rooms per person with means from different geographical areas.
The transformation of Tribune data for home age to a set of analyzable variables was much more complicated than for average number of rooms. Again, the data originated from the 1990 census. Already this presents a problem: the data are ten years out of date. Ideally, if no homes were built in the ten years since the survey, average home age would just increase by ten. However, new homes have been built. This allows average home age to increase by less than ten, or even decrease in a rapidly expanding community. The question for determining whether the lagged values for home age are adequate is whether or not home construction has been fairly uniform across communities. The answer is no. Another problem lies at the other end of the home age
  • 19. 19 spectrum. In order to display home age, the census survey lists period ranges, usually by decade, along with the percent of homes built within each period. The earliest period is listed as 1939 or earlier. This introduces a problem of left censoring. While the other periods have a ten year range, the range for the earliest period is much greater. The statistic for home age was calculated as the weighted average of the upper bounds of these periods. Clearly this could underestimate the average home age in a community with a number of homes built in the early 20th century, or even the 1800s. In order to correct for this left censoring, a series of dummy variables indicating median period of home construction as of 1990 was created and will be considered in regression models. Overall, the variables for home age leave much to be desired and contain a possible bias for communities with large quantities of home construction in the past ten years. Considering the available data, however, they are the best approximations of true community home age. Descriptive statistics for the average home age as of 1990, using upper decade bounds of the census, are listed below.
                      Mean   Std. Dev.   Median   Minimum   Maximum   Valid N
Complete
  Average Home Age   25.78    10.08      25.60      6.02     48.05      337
Adjusted
  Average Home Age   25.64    10.09      25.60      6.02     48.05      287
Average home age is slightly yet significantly negatively correlated (-0.172)** with Price1. This supports the reasonable assumption that newer homes are worth more. Hedonic analysis will determine whether the age of the home itself or other attributes that are correlated with the age of the home contribute to the variance in home price. This is of interest because several variables, including education (-0.601)**, average
  • 20. 20 number of rooms (-0.43)** and percent minority (0.474)** are all more highly correlated with home age than price is. Another correlation (-0.585)** suggests that average home age decreases as communities move farther from the Loop. This supports the claim that new home construction is most likely not uniform across communities.
The percentage of single family units in a community is an interesting variable to study because of its high correlation with many other variables. Logically, this variable is positively correlated with Price1 (0.361)**. However, there is a much stronger correlation between this variable and the average number of rooms in a housing unit (0.789)**. A scatter plot of these two variables can be found in Appendix 1.2. Basically, the average home size in a community closely tracks the percentage of single family units. This could present problems if both variables are included in a model of price estimation. Descriptive statistics are found below.
                                        Mean   Std. Dev.   Median   Minimum   Maximum   Valid N
Complete
  Percentage of Single Family Units     .692     .210       .739      .019      .993      337
Adjusted
  Percentage of Single Family Units     .686     .206       .730      .019      .990      287
The final physical attribute variable was computed using two of the Tribune variables in an attempt to approximate lot size. Unfortunately, no lot size data was directly available. The statistic was derived by dividing the area of the community by the number of housing units in the community as of 1999. The area, which had been provided in square miles, was converted to square feet. This statistic is inferior to an actual measure of average lot size because the percentage of land devoted to
  • 21. 21 residential zoning is unknown for each community. Even if this percentage is fairly constant among communities, space unoccupied by homes is generally unaccounted for. Certainly a large space containing a park will have a different effect on home values than a space devoted to a garbage dump or a chemical plant. Despite its shortcomings, this variable does contain some explanatory power. As expected, the number of square feet per housing unit is positively correlated with Price1 (0.236)**. There is also a positive correlation with the distance to the Loop (0.387)**, suggesting that housing density decreases as communities get farther from the city. Descriptive statistics are displayed below.
                                             Mean   Std. Dev.   Median   Minimum     Maximum   Valid N
Complete
  Number of Square Feet Per Housing Unit   87,507    470,485    22,026     1,754   8,131,200     330
Adjusted
  Number of Square Feet Per Housing Unit   34,131     62,238    20,063     1,754     612,419     280
The next group of attributes provided by the Tribune data, entitled community attributes, differs from the previous set because the attributes are independent of the physical characteristics of properties within the community. However, this study intends to show that these attributes still significantly contribute to median home values among communities. Each of the four major variables in this group represents a community characteristic. ACT scores are studied to estimate the quality of public education within a community, while crime rates should measure safety. Distance to the Loop represents the ease of access to all the benefits of downtown Chicago, including employment and social opportunities. Finally, a measure for the number of places of worship per thousand
  • 22. 22 residents may provide a loose estimate for family values and family structure as well as camaraderie within a community.
The derivation of the ACT statistics was explained in the previous section. For this study, ACT scores are a useful measure of public education because they are standardized across communities. However, it must be noted that public education is not the only factor behind the level of ACT achievement. Parental influence is critical for a child's success at school. The significant correlation between ACT scores and median years of school completed within a community shows this (0.606)**. Just as public education cannot account for all of the results on ACT tests, the tests cannot reveal the entire level of educational quality at any school. Mean ACT scores are negatively correlated with the percentage of minorities in a community (-0.644)**, even though educational spending per student is not dictated by race. Lower percentages of two parent homes and lower parental education levels may contribute to lower ACT performance in these communities as much as inadequate schools do. Despite this, the ACT scores provide good insight into a parent's perception of the quality of education in a community. This will ultimately affect median home values. The correlation between ACT scores and Price1 shows a significant relationship between the two variables (0.557)**. Descriptive statistics can be found below.
            Mean   Std. Dev.   Median   Minimum   Maximum   Valid N
Complete
  ACT1     20.87     2.57       21.6      16.7      26.2      338
  ACT2     20.65     2.98       21.6      14.0      26.2      342
Adjusted
  ACT1     20.69     2.60       21.3      16.7      26.2      287
  ACT2     20.45     3.02       21.3      14.0      26.2      288
ACT1 = Mean ACT Composite Score (Chicago Communities Treated as One District)
ACT2 = Mean ACT Composite Score (Chicago Communities By Individual Schools - Extrapolated Data)
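The asterisks attached to the correlations in this section follow from a test of whether a Pearson correlation differs from zero. The sketch below shows one way such a check could be reproduced. It is not the procedure actually used for the thesis: the function names are hypothetical, the p-value relies on a normal approximation to the t distribution (an assumption that is reasonable only for samples of a few hundred communities), and the sample size of 286 in the usage line is assumed from the adjusted Valid N for Price1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def significance_stars(r, n):
    """Return (t statistic, approximate two-sided p-value, stars).

    The t statistic is r * sqrt((n - 2) / (1 - r^2)). The p-value uses the
    standard normal in place of the t distribution (large-sample assumption).
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    p = math.erfc(abs(t) / math.sqrt(2))
    stars = "**" if p <= 0.01 else "*" if p <= 0.05 else ""
    return t, p, stars


# Home age versus Price1: r = -0.172, with n = 286 assumed.
t_stat, p_value, stars = significance_stars(-0.172, 286)
print(round(t_stat, 2), stars)
```

Under these assumptions even a correlation as small as -0.172 clears the .01 threshold, which is consistent with the two asterisks reported for home age.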
  • 23. 23 Just as consumers are concerned with education levels, they are also interested in safety levels. As mentioned earlier, the Tribune crime data measures total crimes committed in 1998 per 1000 residents. Although reported crime is clearly a measure of safety, the data does not distinguish between violent crime and property crime. There is also no distinction between crime in residential areas, as opposed to industrial or commercial areas. These different types of crime might affect a resident's perception of safety in different ways that cannot be accounted for in the data. Interestingly, crime is not nearly as correlated with Price1 as many of the explanatory variables (-0.193)**. There are much higher correlation statistics between crime and percent minority (0.513)** and between crime and average number of rooms per housing unit (-0.503)**. The latter relationship is of particular interest and is shown in a scatter plot in Appendix 1.3. There are several speculative reasons why this relationship might exist. Communities with large homes may be more likely to have a higher percentage of residential zoning, limiting crime against industrial and commercial properties. Also, larger homes tend to have bigger yards and better security, inhibiting stealthy movement. The descriptive statistics for the crime variables can be found below.
             Mean   Std. Dev.   Median   Minimum   Maximum   Valid N
Complete
  Crime1    54.88    79.82        39        0       1127      315
  Crime2    53.05    76.88        36        0       1127      342
  Crime3    52.57    77.04        35        0       1127      342
Adjusted
  Crime1    51.11    49.55        40        4        615      280
  Crime2    50.55    48.97        39        4        615      288
  Crime3    50.46    49.04        39        4        615      288
Crime1 = Crimes Per 1000 Residents (Missing Values)
Crime2 = Crimes Per 1000 Residents (Missing Values Filled With County Averages)
Crime3 = Crimes Per 1000 Residents (Missing Values Filled With Extrapolated Values)
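The Crime2 definition above, with missing values filled by county averages, can be sketched as follows. This is only an illustration. The record layout, the function name and the sample values are hypothetical, and the county average is assumed here to be an unweighted mean over the communities that did report a crime rate, which the text does not specify.

```python
def fill_with_county_average(records):
    """Return copies of the records with missing crime rates (None) replaced
    by the unweighted mean of the reported rates in the same county."""
    reported = {}
    for rec in records:
        if rec["crime"] is not None:
            reported.setdefault(rec["county"], []).append(rec["crime"])
    county_mean = {c: sum(v) / len(v) for c, v in reported.items()}

    filled = []
    for rec in records:
        rate = rec["crime"]
        if rate is None:
            # Raises KeyError if no community in the county reported a rate.
            rate = county_mean[rec["county"]]
        filled.append({**rec, "crime": rate})
    return filled


# Hypothetical communities, not Tribune data.
communities = [
    {"name": "A", "county": "X", "crime": 30.0},
    {"name": "B", "county": "X", "crime": 50.0},
    {"name": "C", "county": "X", "crime": None},
    {"name": "D", "county": "Y", "crime": 20.0},
]
filled = fill_with_county_average(communities)
print(filled[2]["crime"])
```

Filling gaps this way keeps every community in the sample, which matches the rise in Valid N from Crime1 to Crime2, at the cost of pulling the filled communities toward their county mean.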
  • 24. 24 The variable for the distance to the Loop is of particular interest because there is no significant correlation between it and Price1. However, it seems logical that, all other attributes being equal, median home prices should decrease as communities lie farther from downtown Chicago. It will be interesting to discover the attribute price assigned to this variable in the hedonic model. Not surprisingly, distance to the Loop is highly correlated with average home age (-0.585)**. Descriptive statistics are listed below.
                      Mean   Std. Dev.   Median   Minimum   Maximum   Valid N
Complete
  Distance to Loop   26.18    16.24       23         0        73        342
Adjusted
  Distance to Loop   23.76    14.73       21.5       0        70        288
Finally, the number of parks per 1000 residents and places of worship per 1000 residents were measured in order to estimate the 'family atmosphere' of a community. The Tribune data listed information on parks and worship areas for each community. These were tallied, divided by the 1999 population and multiplied by 1000. Unfortunately, parks were only listed for Chicago communities and the variable could not be utilized in subsequent models. The parks variable is positively correlated with Price1 (0.251)*. After reviewing the data for places of worship in a community, their effect on home prices is not quite clear. It seems that many places of worship per capita should be considered a positive attribute. However, many of the most poverty stricken communities have an abundance of churches. Presumably the churches are not the cause of this poverty but a response to it. As a result, there seems to be no clear linear relationship
  • 25. 25 between the places of worship variable and Price1. A scatter plot found in Appendix 1.4 displays this. Descriptive statistics are listed below.
                                                  Mean   Std. Dev.   Median   Minimum   Maximum   Valid N
Complete
  Parks per 1000 Population (1999)               .2294    .1496      .1935     .00       .83        72
  Places of Worship Per 1000 Population (1999)   .8360    .9685      .5650     .00      7.63       332
Adjusted
  Parks per 1000 Population (1999)               .2114    .1237      .1831     .00       .62        68
  Places of Worship Per 1000 Population (1999)   .7372    .5680      .5980     .00      3.63       282
The third set of variables was grouped as inhabitant attributes. This group of variables differs from the first two because the direct bearing of these attributes on home values is questionable. Included in this group are race and occupation. The rationale behind the inclusion of this group of variables in this study and the ensuing hedonic models relies on the assumption that home owners choose to locate themselves in communities containing residents of similar background, occupation and race in order to feel comfortable and sociable. Out of several listed occupations, managerial positions were chosen as most likely to represent a white collar lifestyle, while factory positions were chosen to represent a blue collar lifestyle. The descriptive statistics for both race and occupation variables are listed below.
  • 26. 26
                          Mean   Std. Dev.   Median   Minimum   Maximum   Valid N
Complete
  Percent African Amer   .1459    .2900      .0100     .0000     .9940      337
  Percent Hispanic       .0722    .1199      .0280     .0000     .8780      337
  Percent Other          .0283    .0436      .0160     .0000     .0521      337
  Percent Minority       .2463    .3024      .1010     .0000     .9990      337
Adjusted
  Percent African Amer   .1520    .2893      .0140     .0000     .9910      287
  Percent Hispanic       .0787    .1266      .0310     .0000     .8780      287
  Percent Other          .0295    .0356      .0180     .0000     .2410      287
  Percent Minority       .2601    .3002      .1110     .0020     .9990      287

                          Mean   Std. Dev.   Median   Minimum   Maximum   Valid N
Complete
  Managerial Positions   .2823    .1308      .2530     .0440     .6690      337
  Factory Positions      .0615    .0438      .0530     .0000     .2660      337
Adjusted
  Managerial Positions   .2851    .1264      .2590     .0740     .6290      287
  Factory Positions      .0614    .0436      .0530     .0010     .2660      287
  • 27. 27 THE MODEL The main objective of this hedonic pricing model is to estimate what attributes are actually incorporated into home values, and to what extent they explain the variation in median home prices by community. This model differs from most previous analyses as it only attempts to explain variance of home prices between communities, not within them. Hopefully, this unique level of aggregation will provide a new and interesting twist to an old and popular subject of analysis. It is also important to realize that this regression analysis is an approximation to an ideal hedonic model. A hedonic model as described by Sheppard (1999) allows for freedom in individual consumers' preferences and their corresponding utility functions. The hedonic model also estimates housing demand and equates it with supply before arriving at price estimation. Although this model should reveal home and community attributes that significantly affect median community housing prices, it is considerably constrained from a perfect hedonic model. There are two important assumptions that should hold true for the success of this model: 1. Sales events are randomly distributed throughout homes in a community 2. Home values are summations of attribute values. There exists no declining marginal utility of attributes. The first assumption is important because the sales data that generate the median home sale prices for the communities are not directly paired with the data that generate the explanatory variables. The sample of homes for the sales data is different from the sample of homes that represent the attribute data within a community. Therefore it is
  • 28. 28 imperative that one sample is representative of the other. For instance, if a new housing division within a community represents 60 percent of the home sales but only 15 percent of the homes, there is likely to be error. The relationship between variables in the model and the true community relationship between home price and attribute quantity may differ because community attributes are not representative of the sample of homes sold.
The second assumption addresses the constraints of linear regression. For this OLS model to be effective, consumers' utilities must closely follow a simple summation of attributes. It is possible to apply a functional form to a variable, but unlikely that a linear function can closely approximate a complex utility function.
During the actual regression process, over fifty models were analyzed. These models varied along several dimensions:
1. The use of both the complete and adjusted data sets.
2. The use of average median price, weighted average median price, and the natural log of these prices.
3. The inclusion and removal of different variables.
4. The use of different specifications of the same variable, as in crime and ACT variables.
5. The use of different functional forms.
6. The use of weighted least squares estimation.
Two variables were removed in order to avoid collinearity. A threshold correlation of .75 in absolute value was established. The home attribute variable for single family housing had a correlation statistic of (0.789)** with the average number of rooms, while the dummy variable for Chicago community was too strongly negatively correlated (-0.79)** with the variable
  • 29. 29 for average ACT score. The majority of models were fairly consistent concerning the variables of significance and directional effect. A set of models was chosen from the others because of its simple, straightforward nature, and powerful results. The set of models regresses an incrementally increasing group of variables against Price1 . The models placed variables in the groups previously introduced. The group of Home Attributes contains Average Number of Rooms in Housing Unit, Average Home Age as of 1990 (Using upper Decade Bounds of Census), and Number of Square Feet per Community Housing Unit (1999), with dummy variables for level of Single Family Units omitted. The second group, entitled Community Attributes, contains the variables ACT1 (Chicago communities counted as one district), Distance from Loop, Crime3 (Extrapolated Missing Values) and Places of Worship per 1000 Population. The next group contains Inhabitant Attribute Variables, all of which are dummy variables calculated from percentages. This group includes both occupational and racial attributes. There is a dummy for Percent Managerial Occupation <= 15% and Percent Managerial Occupation >= 40%, with >15% < 40% as control values. Another occupational dummy accounts for the percent of factory workers in a community; Percent of Factory Workers >= 15% with < 15% as the control. Racial dummies include Percent African American >= 10% < 90%, Percent African American >= 90% with < 10% as a control and Percent Hispanic >= 10% with < 10% as a control. The final group contains an indicator variable called North Shore Effect. This is a dummy variable that incorporates four North Shore communities that feed into New Trier High School. Median home prices for these communities are extremely high. Without this indicator variable, the model cannot really
  • 30. 30 account for the large deviation in the prices of these homes. The regression results are listed below.
  • 31. 31 Variable Coefficients and T Statistics of Regressions Against Price1
(t statistics in parentheses below each coefficient)
MODEL                            1              2              3              4
Constant                   -291,350.3     -584,021.3     -325,507.5     -140,233.7
                           (-7.561)**     (-11.6)**      (-5.458)**     (-2.484)*
Home Attributes
  Rooms                    74,007.592     50,890.765     31,464.323     26,124.75
                           (13.426)**     (8.425)**      (4.910)**      (4.582)**
  Home Age                  1,528.101      2,313.218      2,160.187        800.597
                           (2.99)**       (4.283)**      (3.985)**      (1.596)
  Square Feet                0.002877       0.007671      0.0002837       0.004787
                           (0.279)        (0.825)        (0.033)        (0.624)
Community Attributes
  Distance to Loop              -         -1,906.069     -1,208.848       -878.804
                                          (-5.218)**     (-3.248)**     (-2.652)**
  Crime                         -              2.167          5.092         15.079
                                          (0.037)        (0.093)        (0.312)
  ACT                           -         21,946.031     13,771.243      7,383.63
                                          (8.876)**      (5.238)**      (3.042)**
  Places of Worship             -         -6,382.023     -2,068.652     -1,287.826
                                          (-1.253)       (-0.4)         (-0.281)
Inhabitant Attributes
  Low Managerial                -              -        -41,590.288    -42,781.206
                                                         (-2.714)**     (-3.154)**
  High Managerial               -              -         85,878.798     87,171.531
                                                         (6.768)**      (7.761)**
  High Factory                  -              -          2,556.501      4,234.95
                                                         (0.177)        (0.331)
  Middle African Amer           -              -        -33,835.188    -40,537.345
                                                         (-2.900)**     (-3.915)**
  High African American         -              -        -28,771.233    -38,292.485
                                                         (-1.454)       (-2.183)*
  High Hispanic                 -              -          1,065.53       5,057.242
                                                         (0.080)        (0.427)
North Shore Effect              -              -              -        320,167.46
                                                                        (9.266)**
Adjusted R Square               0.371          0.513          0.587          0.676
F Statistic                (64.011)**     (49.211)**     (35.951)**     (48.742)**
Incremental F                   -         (24.111)**     (10.274)**     (85.852)**
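The inhabitant attribute dummies that enter Models 3 and 4 are built from community percentages using the cutoffs stated in the model description. A minimal sketch of that coding is given below. The function name, the dictionary keys and the example community are hypothetical, and shares are assumed to be stored as fractions (0.25 for 25 percent), as in the descriptive tables.

```python
def inhabitant_dummies(pct_managerial, pct_factory,
                       pct_african_american, pct_hispanic):
    """Map community shares (fractions) to 0/1 dummy variables.

    Control groups are left implicit: managerial share strictly between
    15% and 40%, factory share below 15%, African American share below
    10%, and Hispanic share below 10%.
    """
    return {
        "low_managerial": int(pct_managerial <= 0.15),
        "high_managerial": int(pct_managerial >= 0.40),
        "high_factory": int(pct_factory >= 0.15),
        "middle_african_american": int(0.10 <= pct_african_american < 0.90),
        "high_african_american": int(pct_african_american >= 0.90),
        "high_hispanic": int(pct_hispanic >= 0.10),
    }


# Hypothetical community: 45% managerial, 5% factory,
# 2% African American, 12% Hispanic.
example = inhabitant_dummies(0.45, 0.05, 0.02, 0.12)
print(example)
```

Each coefficient in the Inhabitant Attributes block of the table is then read as a price difference relative to the omitted control group, not as the effect of a one point change in the underlying percentage.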
  • 32. 32 Results of a regression of the same set of models against Ln(Price1) can be found in Appendix 2.1. Overall, these models appear to be successful and informative. The F statistic for each model has a reported significance of .000 (that is, p < .0005), so the hypothesis that a model explains none of the variance in Price1 is firmly rejected. The incremental F statistics for the last three models are also significant at the .000 level, so the hypothesis that an additional group of variables explains no more variance than the preceding model is rejected as well. The adjusted R squared statistic increases with each successive model as more explanatory variables are added. In the final model, over 65 percent of the total variance in Price1 is accounted for.
Overview of Regression Coefficients
The directional effects of significant variables on Price1 are as expected, except that the first three models show Price1 increasing with home age. This might be explained by the fact that some of the priciest communities contain refurbished vintage homes and are often located on expensive land. This theory is supported when the variable for home age loses significance in the fourth model, where the North Shore effect is introduced. The North Shore effect 'steals' explanatory value from the average home age variable. These communities all contain older homes that have been upgraded over the years. This model lacks a variable for the amount of home quality improvement within a community. This is an attribute that may not be well recorded or may be hard to standardize. Also, given the suspiciously high correlations among most of the explanatory variables, multicollinearity may be responsible for any deviations from expected results.
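The adjusted R squared and incremental F statistics discussed above are related by standard formulas, sketched below for reference. The function names are hypothetical and the inputs in the usage lines are made-up round numbers, not values from the regression table; the table reports only adjusted R squared, so the incremental F statistics cannot be reproduced exactly from it.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k regressors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def incremental_f(r2_restricted, r2_full, n, k_full, q):
    """F statistic for the q regressors added to the restricted model.

    k_full is the number of regressors in the larger model (intercept
    excluded); the statistic has (q, n - k_full - 1) degrees of freedom.
    """
    numerator = (r2_full - r2_restricted) / q
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator


# Hypothetical inputs: one added regressor raises R-squared from 0.50 to 0.60
# in a sample of 104 observations with 3 regressors in the full model.
f_stat = incremental_f(r2_restricted=0.50, r2_full=0.60, n=104, k_full=3, q=1)
adj = adjusted_r_squared(0.60, n=104, k=3)
print(round(f_stat, 3), round(adj, 3))
```

The same arithmetic explains why a single added dummy such as the North Shore effect can produce a large incremental F: the gain in explained variance is divided by only one added degree of freedom.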
  • 33. 33 As expected, the variable for number of rooms is significant and positive, although it loses explanatory power in each new model as more variables are introduced. The variable for community square feet per housing unit, which is supposed to be a proxy for lot size, is not significant. This could be the result of poor variable specification. The variable fails to account for zoning within a community. It is possible, and even probable, for communities with similar square feet per housing unit estimates to have completely different lot sizes.
For the community attributes, the ACT and distance to Loop variables are significant as expected. The insignificance of the crime variable is puzzling. It seems obvious that home values should be lower in crime prevalent communities. Possibly the measure of crime is inadequate and the true crime induced variance in home prices is explained by other correlated variables, such as race or even home size. The crime variable also fails to distinguish between violent and property crime, although they have different impacts on a home owner's estimation of safety. A final problem with the crime variable might be underreporting of crime in the worst neighborhoods.
For the inhabitant attributes, the percent of residents with a managerial occupation seems to be very significant in explaining variance in community home prices. The occupational variables assume that home owners will choose to locate in a community that provides access to their job type and to other people sharing similar occupational and lifestyle interests. However, the cause and effect relationship between home prices and managerial occupation may be somewhat unclear. Perhaps a managerial occupation is just a proxy for income and the model is displaying the fact that families with higher income will buy homes in more expensive communities. The negative effects of a high
  • 34. 34 percentage of African American inhabitants coupled with insignificant effects of Hispanic inhabitants is another interesting phenomenon. Perhaps the actual effects of racial prejudice are really insignificant and a high percentage of African American inhabitants is just a proxy for truly significant community attributes such as unemployment, education, income, crime, or even redlining. Finally the significance of the North Shore effect is apparent in the fourth model. This variable eliminates a large amount of the unexplained residual error present in the previous model from the four outlying communities of Wilmette, Winnetka, Glencoe and Kenilworth. These communities must possess high quantities of some attribute that has not been specified or was not adequately measured by the Tribune data.
  • 35. 35 PROBLEMS WITH HETEROSCEDASTICITY AND MODEL SPECIFICATIONS
Simple histograms and scatter plots suggest that many of the variables in this study are not normally distributed, and that there are not many clearly linear relationships between variables. Many variables have outlying data that are difficult to account for. Sets of variables are highly correlated, raising questions about the direction of dependency among them. In general, the Tribune data and this model call into question several of the assumptions of least-squares regression. Problems of this nature are not uncommon when dealing with pricing models. This excerpt comes from Sheppard:
"Estimation of hedonic prices confronts the economist with a rich sampling of the standard difficulties that arise in estimation using cross-section data. These include choices of the proper parametric specification--both of functional form and of variables to be included--coping with collinearity and ill-conditioned data, potential heteroscedastic and nonnormal errors, regressors subject to measurement error, and maximum likelihood estimation of relationships that are nonlinear." (Sheppard 1614)
One major concern is the presence of heteroscedasticity in the model. By simply graphing the squared residuals against each explanatory variable, it is difficult to discern a definite linear relationship. An example is given in Appendix 1.5. However, several outlying points are a cause for concern. Tests for heteroscedasticity were performed for several of the models. Using the Breusch-Pagan test, Chi-squared statistics were generated. Of all the models, the lowest Chi-squared value was 65, still highly significant with the degrees of freedom allowed by the model. After trying the adjusted data, logarithmic relationships and even least squares weighted by housing units, the presence of heteroscedasticity seems likely. This presents problems for the regression estimates. Heteroscedasticity affects models in several ways. 
First, the OLS estimates are still
  • 36. 36 unbiased and consistent, but they are no longer efficient. This means that another unbiased linear estimate that has lower variance than the OLS estimate may exist. Also, variance estimates of the coefficients are no longer valid. They are biased and inconsistent. This renders hypothesis tests such as F and T tests invalid. Although the presence of heteroscedasticity should not nullify the findings of this study, it does raise larger questions of a linear model's overall ability to estimate hedonic functions.
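To make the heteroscedasticity test discussed in this section concrete, the sketch below computes a Breusch-Pagan type statistic for a regression with a single explanatory variable. It is an illustration under stated assumptions, not the computation performed for this study: it uses the studentized (Koenker) form of the test, in which the statistic is n times the R squared from regressing the squared OLS residuals on the regressor, and the data are artificial. All function names are hypothetical. With one regressor the statistic is compared with 3.841, the 5 percent critical value of the chi-squared distribution with one degree of freedom.

```python
def ols_simple(x, y):
    """Intercept and slope of a one-regressor least squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """R-squared of the one-regressor least squares fit of y on x."""
    a, b = ols_simple(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def koenker_bp_statistic(x, y):
    """n * R^2 from regressing squared OLS residuals on x (1 degree of freedom)."""
    a, b = ols_simple(x, y)
    squared_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, squared_resid)


CHI2_CRITICAL_1DF_5PCT = 3.841

# Artificial data: errors alternate in sign and grow in size with x.
x = list(range(1, 21))
y_hetero = [2 * xi + (-1) ** xi * 0.5 * xi for xi in x]
# Artificial contrast: errors alternate in sign but keep a constant size.
y_homo = [2 * xi + (-1) ** xi * 1.0 for xi in x]

bp_hetero = koenker_bp_statistic(x, y_hetero)
bp_homo = koenker_bp_statistic(x, y_homo)
print(bp_hetero > CHI2_CRITICAL_1DF_5PCT, bp_homo > CHI2_CRITICAL_1DF_5PCT)
```

In a multi-variable model such as the ones estimated here, the auxiliary regression would include every explanatory variable and the degrees of freedom would equal their number, which is why a value of 65 remains highly significant.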
  • 37. 37 CONCLUSION
This study attempts to quantitatively explain median community home prices in the Chicago area using hedonic analysis. It evaluates whether median community home prices can be expressed as composite prices of values assigned to varying quantities of underlying attributes. A linear model was constructed in order to value attributes expected to influence home prices. These attributes were derived from aggregated community data available through the Chicago Tribune Homes web site. Attributes were divided into home, community and inhabitant groups, and a seemingly successful model was generated using least squares regression. The adjusted R-squared statistic for the full model is 0.676, suggesting that the majority of the variance in median home prices can be explained by the model. However, the success of the model is threatened by the assumptions that were used to generate it. The marginal utility of an attribute, and its corresponding impact on overall price, is unlikely to remain constant as the quantity varies for some of the attributes analyzed in the study.
  • 38. 38 Appendix 1.1 Histogram of Price1 [Figure: histogram of AVGPRICE (Price1), adjusted data set; horizontal axis in price bins from 25,000 to 675,000, vertical axis frequency from 0 to 70. Std. Dev = 89,085.16, Mean = 163,046.4, N = 286.]
  • 39. 39 Appendix 1.2 [Figure: scatter plot of Percentage of Single Family Units (vertical axis, 0.0 to 1.0) against Average Number of Rooms in Housing Unit (horizontal axis, 3 to 9). Labeled points: Park City, Lake Forest, South Barrington, West Garfield Park, Uptown, Near South Side.]
  • 40. 40 Appendix 1.3 [Figure: scatter plot of Crime with Extrapolated Values (vertical axis, -100 to 700) against Average Number of Rooms in Housing Unit (horizontal axis, 3 to 9). Labeled points: Oak Brook, Matteson, Broadview, O'Hare, Near West Side, Loop.]
  • 41. 41 Appendix 1.4 [Figure: scatter plot of AVGPRICE (vertical axis, 0 to 700,000) against Places of Worship per 1000 Population (1999) (horizontal axis, -1 to 4). Labeled points: Riverwoods, Barrington, Robbins, Glencoe, Englewood.]
  • 42. 42 Appendix 1.5 Residuals taken from featured regression Model 4 [Figure: scatter plot of Squared Residuals (vertical axis) against Average Number of Rooms (horizontal axis, 3 to 9). Labeled points: Lake Forest, Bannockburn, Kenilworth.]
  • 43. 43 Appendix 2.1 Variable Coefficients and T Statistics of Regressions Against Ln(Price1)
(t statistics in parentheses below each coefficient)
MODEL                           1             2             3             4
Constant                     10.108         8.659        10.530        10.882
                          (54.221)**    (36.042)**    (42.787)**    (42.275)**
Home Attributes
  Rooms                       0.313         0.180         0.08957       0.07942
                          (11.754)**    (6.256)**     (3.387)**     (3.055)**
  Home Age                 4.144E-04     6.955E-03     5.112E-03     2.529E-03
                          (0.167)       (2.699)**     (2.285)*      (1.105)
  Square Feet              3.493E-08     4.518E-08     8.400E-09     1.696E-08
                          (0.701)       (1.018)       (0.235)       (0.485)
Community Attributes
  Distance to Loop             -        -6.800E-03    -4.667E-03    -4.040E-03
                                        (-3.900)**    (-3.036)**    (-2.674)**
  Crime                        -        -2.467E-04    -5.139E-05    -3.242E-05
                                        (-0.884)      (-0.228)      (-0.147)
  ACT                          -          0.110         0.04371       0.03158
                                        (9.283)**     (4.029)**     (2.853)**
  Places of Worship            -         -0.07237      -0.01106     -9.574E-03
                                        (-2.977)**    (-0.518)      (-0.459)
Inhabitant Attributes
  Low Managerial               -             -         -0.424        -0.426
                                                      (-6.704)**    (-6.890)**
  High Managerial              -             -          0.431         0.434
                                                      (8.239)**     (8.471)**
  High Factory                 -             -          0.06613       0.06932
                                                      (1.109)       (1.189)
  Middle Black                 -             -         -0.334        -0.346
                                                      (-6.926)**    (-7.334)**
  High Black                   -             -         -0.531        -0.549
                                                      (-6.496)**    (-6.857)**
  High Hispanic                -             -        -6.527E-06     7.578E-03
                                                      (0.000)       (0.140)
North Shore Effect             -             -             -          0.608
                                                                    (3.860)**
Adjusted R Square             0.338         0.501         0.683         0.697
F Statistic               (55.431)**    (46.941)**    (54.109)**    (53.584)**
Incremental F                  -        (26.956)**    (30.989)**    (14.902)**
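Because the dependent variable in Appendix 2.1 is Ln(Price1), the coefficient on a 0/1 dummy is not itself a percentage change. A common conversion, sketched below, is exp(b) - 1. The function name is hypothetical, the conversion is a standard textbook interpretation rather than a calculation reported in this study, and it ignores the sampling error of the coefficient.

```python
import math


def percent_effect_of_dummy(coefficient):
    """Proportional change in price when a 0/1 dummy switches on in a
    regression whose dependent variable is ln(price)."""
    return math.exp(coefficient) - 1.0


# Model 4 coefficients from Appendix 2.1.
north_shore = percent_effect_of_dummy(0.608)      # North Shore Effect
low_managerial = percent_effect_of_dummy(-0.426)  # Low Managerial
print(round(north_shore, 3), round(low_managerial, 3))
```

Read this way, the North Shore coefficient of 0.608 corresponds to a median price roughly 84 percent higher than otherwise comparable communities, while the Low Managerial coefficient of -0.426 corresponds to a price roughly 35 percent lower.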
  • 44. 44 Appendix 3.1 OTHER INTERESTING VARIABLES
                             Mean     Std. Dev.    Median    Minimum    Maximum   Valid N
Complete
  Households 1999          7,685.66    8,263.60     4983        24       54,935     330
  Population 1999         21,297.48   21,908.49    14,124       71      124,321     330
  Increase1                  .1649       .3276      .0716     -.3499     3.0857     251
  Increase2                  .1674       .3246      .0713     -.293      3.028      250
  People per Household       2.85        .39        2.81       1.69       5.33      330
Adjusted
  Households 1999          8,933.93    8,370.02     6586       864       54,935     280
  Population 1999         24,743.63   22,047.62    18,119     2,705     124,321     280
  Increase1                  .1723       .3458      .0753     -.3499     3.0857     207
  Increase2                  .1766       .3416      .0764     -.293      3.028      206
  People per Household       2.84        .39        2.80       1.69       5.33      280
Increase1 = Percent Increase in Households From 1990 to 1999
Increase2 = Percent Increase in Population From 1990 to 1999

                      Mean      Std. Dev.     Median      Minimum      Maximum    Valid N
Complete
  Family Income    75,841.12    42,579.82    64,313.50    8,944.00   256,359.00     330
  Never Married      .2735        .0822        .2520        .1370        .5550      337
  Median Age         36.21        4.78         36.10        21.30        49.50      330
Adjusted
  Family Income    75,862.20    40,409.22    65,894.50   11,189.00   256,359.00     280
  Never Married      .2767        .0797        .2560        .1600        .5550      287
  Median Age         36.22        4.57         36.10        21.30        49.50      280
  • 45. 45 REFERENCES
Cited
Chicago Tribune. http://cgi.chicago.tribune.com/homes. 2000.
Sheppard, Stephen. "Hedonic Analysis of Housing Markets." Handbook of Regional and Urban Economics. Elsevier Science B.V., 1999. 1595-1635.
Not Cited
Asabere, Paul K., and Forrest E. Huffman. "Price Determinants of Foreclosed Urban Land." Urban Studies 29 (1992): 701-707.
Bednarz, Robert S. The Effect of Air Pollution on Property Value in Chicago. Chicago: The University of Chicago, 1975.
Bloom, George F., and Henry S. Harrison. Appraising the Single Family Residence. Chicago: American Institute of Real Estate Appraisers, 1978.
Carn, Neil, et al. Real Estate Market Analysis: Techniques & Applications. New Jersey: Prentice Hall, 1988.
Lawrence, Roderick J. "Housing Quality: An Agenda for Research." Urban Studies 32 (1995): 1655-1664.
Mills, Edwin S. "New Hedonic Estimates of Regional Constant Quality House Prices." Journal of Urban Economics 39 (1996): 209-215.
  • 46. 46 Paris, Chris. "Demographic Aspects of Social Change: Implications for Strategic Housing Policy." Urban Studies 32 (1995): 1623-1643.
Peek, J., and J. Wilcox. "The Measurement and Determinants of Single-Family House Prices." AREUEA Journal 19 (1991): 353-382.
Ring, Alfred A. The Valuation of Real Estate: Second Edition. New Jersey: Prentice Hall, 1970.