This document summarizes a regression model built to predict housing prices in King County, USA, from various house features. It analyzes factors such as square footage, number of bathrooms, location, and condition that relate to price. The model uses data from house sales in 2014 and 2015 and explains about 70% of the variation in price (R² = 0.70). The document recommends that real estate agents and developers use the model to estimate reasonable selling prices and to advise homeowners on the features that affect value.
Real estate regression model King County
1. Real Estate Regression Model
King County, USA
Data source: House sales in King County in 2014 and 2015
Presenter: Thu Phung
2. Main Content and Purpose
o The project uses house sales data from King County in 2014 and 2015.
o The project builds a regression model that helps real estate developers and real estate agencies check the prices set by employees and homeowners.
o Technical tools: Python (Matplotlib, Statsmodels, Scikit-Learn, heatmaps), Tableau
3. Monitored Variables
Target (Y): Price
Variable groups: house's structure, inside house, outside house, house's location, neighbors
Variables: Year built, Year renovated, Grade, Condition, Floors, Bedrooms, Bathrooms, Sqft_living, Sqft_above, Sqft_basement, Sqft_lot, View, Waterfront, Zipcode, Latitude (Lat), Longitude (Long), Sqft_living15, Sqft_lot15
Sqft_living15: square footage of interior living space for the nearest 15 neighbors
Sqft_lot15: square footage of the land lots of the nearest 15 neighbors
4. Removed variables:
• “sqft_living” was removed to avoid redundancy, because sqft_living = sqft_above + sqft_basement.
• “zipcode” was removed because zip codes are assigned without any relation to housing prices.
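The redundancy behind removing “sqft_living” can be checked directly. Below is a minimal sketch, assuming the data sit in a pandas DataFrame whose columns use the names shown on this slide; the three rows are invented for illustration and are not King County records.

```python
import numpy as np
import pandas as pd

# Invented toy rows for illustration only (not real sales).
df = pd.DataFrame({
    "price": [300000.0, 500000.0, 650000.0],
    "sqft_above": [1200, 2000, 1500],
    "sqft_basement": [0, 500, 800],
    "sqft_living": [1200, 2500, 2300],
    "zipcode": [98001, 98103, 98004],
})

# sqft_living is exactly the sum of the other two size columns.
is_redundant = (df["sqft_living"] == df["sqft_above"] + df["sqft_basement"]).all()
print("sqft_living == sqft_above + sqft_basement:", is_redundant)

# With all three columns, the size block has rank 2, not 3 (perfect collinearity).
size_block = df[["sqft_above", "sqft_basement", "sqft_living"]].to_numpy(dtype=float)
rank = np.linalg.matrix_rank(size_block)
print("rank of size columns:", rank)

# Drop the redundant size column and the zipcode label before modelling.
df_model = df.drop(columns=["sqft_living", "zipcode"])
print(list(df_model.columns))
```

Keeping all three size columns would leave a linear regression without a unique set of coefficients, which is why one of them has to go.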
Correlation between price and the monitored variables:
• Variables related to the size of a house (“bathrooms”, “sqft_above”, “sqft_living15”) and to its quality (“grade”, “view”) have strong correlations with price.
Data source: House Sales Data King County 2014-2015
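A ranking like the one on this slide can be produced from a correlation matrix. This is a sketch under assumptions: pandas 1.5 or later (for `numeric_only`), Pearson correlation (the slides do not name the measure), and a hypothetical helper `correlations_with_price`. The toy rows are invented.

```python
import pandas as pd


def correlations_with_price(df: pd.DataFrame, target: str = "price") -> pd.Series:
    """Pearson correlation of every numeric column with the target, strongest first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    # Sort by absolute value so strong negative correlations also rank high.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Invented toy rows for illustration only.
df = pd.DataFrame({
    "price": [300000, 450000, 520000, 610000, 800000],
    "sqft_above": [1100, 1500, 1700, 2100, 2900],
    "bathrooms": [1.0, 2.0, 1.5, 2.5, 3.0],
    "yr_built": [1990, 1975, 2001, 1960, 1985],
})

ranking = correlations_with_price(df)
print(ranking)
```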
5. Variables with strong correlations
“Sqft_above”: one of the most important elements in determining the size and living space of a house. Bigger houses are normally more costly to build, and therefore more expensive.
“Bathrooms”: the number of bathrooms is not crucial, but more bathrooms are still favored by buyers because they are more convenient, especially for a big family.
“Sqft_living15”: this variable partly reflects the financial health of the neighbors. Bigger neighboring houses make the whole area more valuable.
6. Location Variables: Latitude and Longitude
For houses in Seattle and Bellevue, the more expensive areas lie further north, which shows up as a correlation with latitude. The heat map shows the higher prices commanded by houses relatively close to the water. However, the Seattle/Bellevue area is largely a collection of islands and inlets spread over a wide west-east band, so there is no clear correlation with longitude.
Chart: Correlation with Price (lat: 0.31, long: 0.02)
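The chart values are presumably Pearson correlation coefficients. The slides do not name the measure, so this is an assumption. For a feature x and price y over n sales, the coefficient is:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Because r measures only linear association, a value near zero for longitude does not mean location is irrelevant. It means price does not rise or fall steadily from west to east, which fits the island-and-inlet geography described above.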
7. Categorical variables
Most houses are graded 7. These may be the "reasonable quality, reasonable price" houses: good enough to be attractive, yet not too expensive compared with higher-grade houses.
Most houses are in average condition (3). A minority are rated poor or fair (1 and 2), and these are also less valuable.
8. Categorical variables
Most houses have one or two floors. Houses with more floors have a higher average price, which makes sense because they can provide more living space. The townhouse style seems to be preferred over the ranch style.
Most houses do not have a waterfront, but having one seems to be an advantage and brings higher prices to nearby houses.
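The per-category averages behind the grade, condition, floors and waterfront charts can be computed with a group-by. This is a minimal sketch, assuming pandas and the column names `grade`, `condition`, `floors` and `waterfront` (assumed names; the slides label them Grade, Condition, Floor and Waterfront). `mean_price_by` is a hypothetical helper, and the rows are invented only to show the shape of the output.

```python
import pandas as pd


def mean_price_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Number of sales and average price for each level of a categorical column."""
    # groupby sorts the category levels by default.
    return df.groupby(column)["price"].agg(count="count", mean_price="mean")


# Invented toy rows for illustration only.
df = pd.DataFrame({
    "price": [250000, 320000, 410000, 480000, 900000, 1200000],
    "grade": [6, 7, 7, 8, 10, 11],
    "condition": [2, 3, 3, 3, 4, 5],
    "floors": [1.0, 1.0, 2.0, 2.0, 2.0, 3.0],
    "waterfront": [0, 0, 0, 0, 0, 1],
})

for col in ["grade", "condition", "floors", "waterfront"]:
    print(mean_price_by(df, col), end="\n\n")
```

Showing the count alongside the mean matters here: a category with very few sales, such as waterfront houses, can have a striking average that rests on little data.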
9. Ages of houses (chart: no renovation vs. renovation)
Only 4% of the houses sold had been renovated, and most of them were built before 1990. New and renovated houses are more costly and are expected to sell at higher prices.
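The renovation share and the comparison of average prices can be sketched as follows. This assumes columns named `yr_built` and `yr_renovated`, with `yr_renovated == 0` meaning "never renovated". That encoding is an assumption and is not stated in the slides. `renovation_summary` is a hypothetical helper, and the rows are invented.

```python
import pandas as pd


def renovation_summary(df: pd.DataFrame, cutoff_year: int = 1990) -> dict:
    """Renovation share, share built before cutoff_year, and mean prices by renovation status.

    Assumes yr_renovated == 0 encodes 'never renovated' (an assumption).
    """
    renovated = df["yr_renovated"] > 0
    built_before = df["yr_built"] < cutoff_year
    return {
        "share_renovated": renovated.mean(),
        "share_built_before_cutoff": built_before.mean(),
        "mean_price_renovated": df.loc[renovated, "price"].mean(),
        "mean_price_not_renovated": df.loc[~renovated, "price"].mean(),
    }


# Invented toy rows for illustration only.
df = pd.DataFrame({
    "price": [400000, 450000, 600000, 700000, 650000],
    "yr_built": [1950, 1975, 1995, 1940, 2010],
    "yr_renovated": [0, 0, 0, 2005, 0],
})

summary = renovation_summary(df)
print(summary)
```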
10. This regression model helps predict house prices from the monitored variables, with R² = 0.70.
R squared: 0.70
Mean Absolute Error: 122292.14
Mean Squared Error: 36326416754.04
Root Mean Squared Error: 190594.90
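For reference, these are the standard definitions of the reported metrics, where y_i is an actual price, ŷ_i the model's prediction, and ȳ the mean price. The slides do not say whether the metrics were computed on training data or on a held-out test set.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

As a consistency check, √36326416754.04 ≈ 190594.90, which matches the reported RMSE. MSE is in squared price units, so MAE and RMSE, which are in the same units as price, are easier to read: typical prediction errors are on the order of 122,000 to 191,000.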
Coefficients:
Term        Coefficient
Intercept   -36887551.63
x1[0]       -34182.5
x1[1]       42184.46
x1[2]       0.127222
x1[3]       764.7502
x1[4]       587905
x1[5]       49491.46
x1[6]       31060.32
x1[7]       97315.06
x1[8]       179.6263
x1[9]       146.7543
x1[10]      -2458.42
x1[11]      21.54448
x1[12]      561126.6
x1[13]      -117226
x1[14]      27.42714
x1[15]      -0.39311
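The deck does not show its fitting code. The following is a sketch of how a model and metrics like these could be produced with Scikit-Learn, one of the listed tools. Several details are assumptions: the 80/20 train/test split, the random seed, and the hypothetical helper `fit_and_evaluate`. The data are synthetic, with 16 features to mirror the 16 coefficients x1[0] to x1[15]. The slides do not say which variable each x1[i] corresponds to.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def fit_and_evaluate(X: np.ndarray, y: np.ndarray, seed: int = 0) -> dict:
    """Fit ordinary least squares on a training split and report test-set metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed  # 80/20 split is an assumption
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    return {
        "model": model,
        "r2": r2_score(y_test, y_pred),
        "mae": mean_absolute_error(y_test, y_pred),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
    }


# Synthetic stand-in data: 500 rows and 16 features (not the King County data).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 16))
true_coef = rng.normal(scale=10_000, size=16)
y = 500_000 + X @ true_coef + rng.normal(scale=20_000, size=500)

result = fit_and_evaluate(X, y)
print(f"R squared: {result['r2']:.2f}")
print(f"Mean Absolute Error: {result['mae']:.2f}")
print(f"Root Mean Squared Error: {result['rmse']:.2f}")
print(f"Intercept: {result['model'].intercept_:.2f}")
print("Coefficients:", np.round(result["model"].coef_, 2))
```

Coefficient magnitudes depend on each feature's units, so values such as 0.127222 and 587905 in the table above cannot be compared directly as measures of importance.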
11. Conclusion
• Realtors and real estate agencies can measure house features (the monitored variables) to estimate a reasonable selling price and advise sellers.
• To make a house easier to sell, it can help to advise owners on the key factors below:
  o Houses that offer more space and higher living quality have higher prices.
  o Houses in the main cities and close to the coast are also more attractive.
  o The neighborhood is also a strong element that creates intangible value for a house.
  o Owners of houses built before 1990 may consider investing in renovation to get a better selling price.
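Applying a fitted linear model to a listing means evaluating intercept + Σ coefficient × feature value. This is how the model can serve the stated purpose of checking prices set by employees and homeowners. Below is a minimal sketch with two hypothetical helpers, `estimate_price` and `check_asking_price`. The rule that flags asking prices more than one RMSE from the estimate is also hypothetical. The two-feature model uses invented numbers, not the deck's coefficients.

```python
import numpy as np


def estimate_price(intercept: float, coefs: np.ndarray, features: np.ndarray) -> float:
    """Linear-model estimate: intercept plus the sum of coefficient * feature value."""
    return float(intercept + np.dot(coefs, features))


def check_asking_price(asking: float, estimate: float, tolerance: float) -> str:
    """Hypothetical rule: flag asking prices more than `tolerance` away from the estimate."""
    gap = asking - estimate
    if gap > tolerance:
        return "above estimate"
    if gap < -tolerance:
        return "below estimate"
    return "within range"


# Invented two-feature model for illustration (not the slide's coefficients).
est = estimate_price(50_000.0, np.array([200.0, 15_000.0]), np.array([2_000.0, 2.0]))
print(est)  # 50000 + 200*2000 + 15000*2 = 480000.0

# Use the reported RMSE as the tolerance (a hypothetical choice).
print(check_asking_price(700_000.0, est, tolerance=190_594.90))  # above estimate
```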