2. Predicting final sale price of house
• Example: listed at $1.0 M on June 1st, closed at $1.1 M on July 1st
• ΔP: change from list price to sale price; Δt: time from listing to closing
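As a worked instance of the two quantities, here is a minimal sketch using the example figures. The calendar year is an assumption (the slide gives only month and day), and the variable names are illustrative.

```python
from datetime import date

# The year 2015 is an arbitrary assumption; only month and day are given.
listed_on, list_price = date(2015, 6, 1), 1_000_000
closed_on, sale_price = date(2015, 7, 1), 1_100_000

delta_p = sale_price - list_price        # ΔP in dollars
delta_t = (closed_on - listed_on).days   # Δt in days
rel_delta_p = delta_p / list_price       # ΔP as a fraction of list price

print(delta_p, delta_t, rel_delta_p)
```

Here ΔP is $100,000 (10% of the list price) and Δt is 30 days.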
3. The data
• Database of ~150,000 properties in urban California
• Numerical features: home area, lot area, #bedrooms, #bathrooms,
year built, date of sale, lat/long
• Categorical features: ZIP code, home type, seller’s agent
• Baseline: median error assuming zero ΔP (sale price = list price), i.e. median |ΔP|/List: 2.3%
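The zero-ΔP baseline can be sketched as follows, assuming the error is measured as |ΔP|/List. The function name and the prices are made up for illustration and are not taken from the database.

```python
from statistics import median

def baseline_median_error(list_prices, sale_prices):
    """Median of |ΔP| / List when the prediction is ΔP = 0 (sale = list)."""
    return median(abs(s - l) / l for l, s in zip(list_prices, sale_prices))

# Made-up prices for illustration only.
lists = [500_000, 800_000, 1_000_000]
sales = [510_000, 780_000, 1_100_000]

baseline = baseline_median_error(lists, sales)
print(f"{baseline:.1%}")
```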
4. A simple predictive model
• For each house, look at nearby houses that sold recently
• ΔP ~ 1 + List + ΔP₁/List₁ + List·(ΔP₁/List₁) + ΔP₂/List₂ + … (subscripts index the nearby, recently sold houses)
• Gets sign of ΔP right 59% of the time
• When it does, median error in predicted sale price drops from 2.3% to 1.4%
• Pearson’s r for predicted vs. actual ΔP: r = 0.53
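A rough sketch of fitting a model of this form by ordinary least squares, using only the terms written out in the formula (anything past the ellipsis is left out). The data are synthetic, List is assumed to be in millions of dollars to keep the fit well scaled, and all function names are illustrative. This is not necessarily how the original fit was done.

```python
import random
from statistics import mean

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def features(list_m, r1, r2):
    """Terms: 1, List, ΔP1/List1, List*(ΔP1/List1), ΔP2/List2."""
    return [1.0, list_m, r1, list_m * r1, r2]

def predict(beta, row):
    return sum(b * x for b, x in zip(beta, row))

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Synthetic houses: list price in $M and price-change ratios of two neighbors.
rng = random.Random(0)
rows, dp = [], []
for _ in range(500):
    list_m = rng.uniform(0.3, 2.0)
    r1, r2 = rng.gauss(0.0, 0.03), rng.gauss(0.0, 0.03)
    rows.append(features(list_m, r1, r2))
    # Made-up relationship plus noise, used only to exercise the fit.
    dp.append(list_m * (0.6 * r1 + 0.3 * r2) + rng.gauss(0.0, 0.02))

beta = fit_ols(rows, dp)
pred = [predict(beta, row) for row in rows]
sign_ok = mean(1.0 if p * t > 0 else 0.0 for p, t in zip(pred, dp))
print(f"sign correct: {sign_ok:.0%}, r = {pearson_r(pred, dp):.2f}")
```

The last two lines compute the same kind of summary reported in this section: how often the predicted ΔP has the right sign, and Pearson’s r between predicted and actual ΔP. The numbers printed for synthetic data have no bearing on the reported results.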
5. Segmenting regression by region
                                        California  Bay Area  Los Angeles  Inland Empire
med. |ΔP|/List (%)                             2.3       7.0          2.0            1.7
med. error in predicted sale price (%)         1.4       6.6          1.4            1.4
freq. sign correct (%)                          59        87           60             60
Pearson’s r for predicted ΔP vs. ΔP           0.53      0.42         0.43           0.14
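Segmenting amounts to grouping the records by region and fitting one model per group. The sketch below uses a one-variable least-squares line as a simplified stand-in for the full model. The region labels, record layout, and numbers are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def fit_line(x, y):
    """Closed-form simple least squares: y ≈ a + b*x. Returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def fit_by_region(records):
    """Fit one line per region; records are (region, neighbor_ratio, ratio)."""
    groups = defaultdict(list)
    for region, x, y in records:
        groups[region].append((x, y))
    return {region: fit_line(*zip(*pairs)) for region, pairs in groups.items()}

# Made-up records: (region, ΔP1/List1 of a nearby sale, ΔP/List of this house).
records = [
    ("region_a", 0.00, 0.01), ("region_a", 0.02, 0.02), ("region_a", 0.04, 0.03),
    ("region_b", 0.00, 0.05), ("region_b", 0.02, 0.09), ("region_b", 0.04, 0.13),
]

models = fit_by_region(records)
for region, (intercept, slope) in models.items():
    print(region, round(intercept, 3), round(slope, 3))
```

Each region gets its own intercept and slope, so a market where sales systematically close above list can be modeled separately from one where they do not.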