SlideShare une entreprise Scribd logo
1  sur  36
Télécharger pour lire hors ligne
Sberbank Russian
Housing Market - Top 1%
Solution
Team
Chase Edge, Grant Webb
And Brandy Freitas
Project Overview
Project Overview
● Kaggle Competition
● Predict housing prices in Moscow during July 2015 to May 2016 using data
from August 2011 to June 2015
● Data includes housing transaction information (e.g. square meter, number of
rooms and build year), neighborhood details and macroeconomic information
Machine Learning Checklist
1. Frame the Problem and Look at the Big Picture
2. Get The Data
3. Explore the Data
4. Prepare the Data
5. Short List Promising Models
6. Fine-Tune the System
Look at the Big Picture
● Mitigate Risk
○ Help avoid overlending (issuing mortgages in excess of the value of the home)
● Valuation
○ Help banks value their portfolio
● Predictions
○ To give confidence to renters, developers and lenders when they sign a lease or purchase a
building
Value to Sberbank
Data Exploration
Data Overview
Housing Data
● Number of Observations
○ Training - 30,471
○ Test - 7,662
● Number of Features - 290
Macro Data
● Number of Observations - 2,484
● Number of Features - 100
Data Overview - Most Important Features
● Initial Thoughts
○ Square Meters
○ Number of Rooms
○ Build Year
○ Location
Data Overview - Most Important Features
● Initial Thoughts
○ Square Meters
○ Number of Rooms
○ Build Year
○ Location
Housing Price
Housing Price
Prices by District
Observations by District
Data Preparation
Missingness in Top Features
Data Inconsistencies
Data Inconsistencies
Imputations - KNN by Neighborhood
Feature Engineering &
Insights
Feature Engineering
Sq_diff = Full_sq - Kitch_sq
Floor_ratio = Floor / Max_floor
Life_to_full = Life_sq / Full_sq
Kitch_to_full = Kitch_sq / Full_sq
Room_size = Life_sq / Num_room
Month = month of sale
Decline in Russian Housing Prices
Dealing with the Decline
● Problem
○ Housing prices declined in 2015 and 2016
○ Model predicts values that are too high
● Solutions
○ Incorporate economic data
○ Make downward adjustments to predicted values
○ Adjust prices for fluctuations in the market based on a price index
Price Index
● The Data
○ Russian government statistics on the monthly rental prices within Moscow
● The Index
○ 3 month rolling average of 3 bed, 2 bed and 1 bed rentals in Moscow
○ Indexed to the start of the training data (August 2011)
○ Averaged the 3 indices
● The Application
○ Adjust all prices in the training data for changes in the index (divide by index)
○ Model predicts prices as if they occurred in August 2011
○ Adjust the predicted values for the index (multiply by index)
Price Index
Price Index - Example
● Training Transaction
○ Date - April 2013
○ Price - RUB 5,693,972
○ Index - 1.13
○ Adjusted Price - RUB 5,038,913
● Predicted Value
○ Date - April 2016
○ Price - RUB 3,902,007
○ Index - 1.06
○ Adjusted Price - RUB 4,136,127
Short-list Promising
Models
KISS - OLS with 2 Variables
Features = full_sq, sub_area
Response = log_price
KISS - OLS with 7 Variables
Features = full_sq, floor, max_floor, life_to_full, kitch_to_life, product_type, sub_area
Response = log_price
Tree-Based Models
Unrestrained Decision Trees are known to overfit as
shown:
_
Decision Tree
Number of Trees
MaxFeatures
8 hrs to run with 3
cores processing!
Random Forest
XGBoost
● Log(Scaled Price / Meter Sq) as
Target
● Fast - Training time 1 min 43 sec
● Best Results
● Less Interpretable
Future Work
Next Steps
● Model Ensembling
● Feature Engineering
● Time Series Analysis on Pricing Index
● Cluster Analysis on Neighborhoods
Questions

Contenu connexe

En vedette

Data mining with caret package
Data mining with caret packageData mining with caret package
Data mining with caret packageVivian S. Zhang
 
Max Kuhn's talk on R machine learning
Max Kuhn's talk on R machine learningMax Kuhn's talk on R machine learning
Max Kuhn's talk on R machine learningVivian S. Zhang
 
Winning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangWinning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangVivian S. Zhang
 
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorKaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorVivian S. Zhang
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitionsOwen Zhang
 

En vedette (6)

Data mining with caret package
Data mining with caret packageData mining with caret package
Data mining with caret package
 
Max Kuhn's talk on R machine learning
Max Kuhn's talk on R machine learningMax Kuhn's talk on R machine learning
Max Kuhn's talk on R machine learning
 
Winning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangWinning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen Zhang
 
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorKaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
 
Xgboost
XgboostXgboost
Xgboost
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitions
 

Similaire à Kaggle Top1% Solution: Predicting Housing Prices in Moscow

Days on Housing Market Capstone Presentation
Days on Housing Market Capstone PresentationDays on Housing Market Capstone Presentation
Days on Housing Market Capstone PresentationJonathan Boyle
 
Usa Retail Sales Analysis.pdf
Usa Retail Sales Analysis.pdfUsa Retail Sales Analysis.pdf
Usa Retail Sales Analysis.pdfVishwas Saini
 
Introduction to Quantitative Factor Investing
Introduction to Quantitative Factor InvestingIntroduction to Quantitative Factor Investing
Introduction to Quantitative Factor InvestingQuantInsti
 
Qlik view demo project
Qlik view   demo projectQlik view   demo project
Qlik view demo projectWhirldata
 
Econometric Semiconductor Forecast
Econometric Semiconductor ForecastEconometric Semiconductor Forecast
Econometric Semiconductor ForecastLinx Consulting, LLC
 
"Marketing Analytics and Applications": Course Introduction
"Marketing Analytics and Applications": Course Introduction"Marketing Analytics and Applications": Course Introduction
"Marketing Analytics and Applications": Course IntroductionMasao Kakihara
 
Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...
Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...
Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...Data Con LA
 
Stock market analysis using supervised machine learning
Stock market analysis using supervised machine learningStock market analysis using supervised machine learning
Stock market analysis using supervised machine learningPriyanshu Gandhi
 
The Decision-making Model for the Stock Market under Uncertainty
The Decision-making Model for the Stock Market under Uncertainty The Decision-making Model for the Stock Market under Uncertainty
The Decision-making Model for the Stock Market under Uncertainty IJECEIAES
 
Demand time series analysis and forecasting
Demand time series analysis and forecastingDemand time series analysis and forecasting
Demand time series analysis and forecastingM Baddar
 
Better Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolBetter Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolLouis Cialdella
 
Migration-related video retrieval
Migration-related video retrievalMigration-related video retrieval
Migration-related video retrievalVasileiosMezaris
 
LECTURE 7.ppt.pdf
LECTURE 7.ppt.pdfLECTURE 7.ppt.pdf
LECTURE 7.ppt.pdfcikajen791
 
Statistical Databases
Statistical DatabasesStatistical Databases
Statistical Databasesssuseraef7e0
 
bitcoin prediction
bitcoin predictionbitcoin prediction
bitcoin predictionRACHANAB18
 
RatingBot: A Text Mining Based Rating Approach
RatingBot: A Text Mining Based Rating ApproachRatingBot: A Text Mining Based Rating Approach
RatingBot: A Text Mining Based Rating ApproachNauman Shahid
 
Twitter Sentiment Prediction.pptx
Twitter Sentiment Prediction.pptxTwitter Sentiment Prediction.pptx
Twitter Sentiment Prediction.pptxKrishnesh Pujari
 

Similaire à Kaggle Top1% Solution: Predicting Housing Prices in Moscow (20)

Days on Housing Market Capstone Presentation
Days on Housing Market Capstone PresentationDays on Housing Market Capstone Presentation
Days on Housing Market Capstone Presentation
 
Usa Retail Sales Analysis.pdf
Usa Retail Sales Analysis.pdfUsa Retail Sales Analysis.pdf
Usa Retail Sales Analysis.pdf
 
Introduction to Quantitative Factor Investing
Introduction to Quantitative Factor InvestingIntroduction to Quantitative Factor Investing
Introduction to Quantitative Factor Investing
 
State of AI/ML in Real Estate
State of AI/ML in Real EstateState of AI/ML in Real Estate
State of AI/ML in Real Estate
 
Qlik view demo project
Qlik view   demo projectQlik view   demo project
Qlik view demo project
 
Econometric Semiconductor Forecast
Econometric Semiconductor ForecastEconometric Semiconductor Forecast
Econometric Semiconductor Forecast
 
"Marketing Analytics and Applications": Course Introduction
"Marketing Analytics and Applications": Course Introduction"Marketing Analytics and Applications": Course Introduction
"Marketing Analytics and Applications": Course Introduction
 
Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...
Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...
Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...
 
Stock market analysis using supervised machine learning
Stock market analysis using supervised machine learningStock market analysis using supervised machine learning
Stock market analysis using supervised machine learning
 
Data science guide
Data science guideData science guide
Data science guide
 
Life of a data engineer
Life of a data engineerLife of a data engineer
Life of a data engineer
 
The Decision-making Model for the Stock Market under Uncertainty
The Decision-making Model for the Stock Market under Uncertainty The Decision-making Model for the Stock Market under Uncertainty
The Decision-making Model for the Stock Market under Uncertainty
 
Demand time series analysis and forecasting
Demand time series analysis and forecastingDemand time series analysis and forecasting
Demand time series analysis and forecasting
 
Better Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolBetter Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product School
 
Migration-related video retrieval
Migration-related video retrievalMigration-related video retrieval
Migration-related video retrieval
 
LECTURE 7.ppt.pdf
LECTURE 7.ppt.pdfLECTURE 7.ppt.pdf
LECTURE 7.ppt.pdf
 
Statistical Databases
Statistical DatabasesStatistical Databases
Statistical Databases
 
bitcoin prediction
bitcoin predictionbitcoin prediction
bitcoin prediction
 
RatingBot: A Text Mining Based Rating Approach
RatingBot: A Text Mining Based Rating ApproachRatingBot: A Text Mining Based Rating Approach
RatingBot: A Text Mining Based Rating Approach
 
Twitter Sentiment Prediction.pptx
Twitter Sentiment Prediction.pptxTwitter Sentiment Prediction.pptx
Twitter Sentiment Prediction.pptx
 

Plus de Vivian S. Zhang

Career services workshop- Roger Ren
Career services workshop- Roger RenCareer services workshop- Roger Ren
Career services workshop- Roger RenVivian S. Zhang
 
Nycdsa wordpress guide book
Nycdsa wordpress guide bookNycdsa wordpress guide book
Nycdsa wordpress guide bookVivian S. Zhang
 
Nyc open-data-2015-andvanced-sklearn-expanded
Nyc open-data-2015-andvanced-sklearn-expandedNyc open-data-2015-andvanced-sklearn-expanded
Nyc open-data-2015-andvanced-sklearn-expandedVivian S. Zhang
 
Nycdsa ml conference slides march 2015
Nycdsa ml conference slides march 2015 Nycdsa ml conference slides march 2015
Nycdsa ml conference slides march 2015 Vivian S. Zhang
 
THE HACK ON JERSEY CITY CONDO PRICES explore trends in public data
THE HACK ON JERSEY CITY CONDO PRICES explore trends in public dataTHE HACK ON JERSEY CITY CONDO PRICES explore trends in public data
THE HACK ON JERSEY CITY CONDO PRICES explore trends in public dataVivian S. Zhang
 
Natural Language Processing(SupStat Inc)
Natural Language Processing(SupStat Inc)Natural Language Processing(SupStat Inc)
Natural Language Processing(SupStat Inc)Vivian S. Zhang
 
Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...
Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...
Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...Vivian S. Zhang
 
Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nyc
Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nycData Science Academy Student Demo day--Divyanka Sharma, Businesses in nyc
Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nycVivian S. Zhang
 
Data Science Academy Student Demo day--Chang Wang, dogs breeds in nyc
Data Science Academy Student Demo day--Chang Wang, dogs breeds in nycData Science Academy Student Demo day--Chang Wang, dogs breeds in nyc
Data Science Academy Student Demo day--Chang Wang, dogs breeds in nycVivian S. Zhang
 
Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...
Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...
Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...Vivian S. Zhang
 
Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...
Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...
Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...Vivian S. Zhang
 
Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...Vivian S. Zhang
 
Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...
Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...
Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...Vivian S. Zhang
 
R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...
R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...
R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...Vivian S. Zhang
 
R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...
R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...
R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...Vivian S. Zhang
 

Plus de Vivian S. Zhang (17)

Why NYC DSA.pdf
Why NYC DSA.pdfWhy NYC DSA.pdf
Why NYC DSA.pdf
 
Career services workshop- Roger Ren
Career services workshop- Roger RenCareer services workshop- Roger Ren
Career services workshop- Roger Ren
 
Nycdsa wordpress guide book
Nycdsa wordpress guide bookNycdsa wordpress guide book
Nycdsa wordpress guide book
 
Xgboost
XgboostXgboost
Xgboost
 
Nyc open-data-2015-andvanced-sklearn-expanded
Nyc open-data-2015-andvanced-sklearn-expandedNyc open-data-2015-andvanced-sklearn-expanded
Nyc open-data-2015-andvanced-sklearn-expanded
 
Nycdsa ml conference slides march 2015
Nycdsa ml conference slides march 2015 Nycdsa ml conference slides march 2015
Nycdsa ml conference slides march 2015
 
THE HACK ON JERSEY CITY CONDO PRICES explore trends in public data
THE HACK ON JERSEY CITY CONDO PRICES explore trends in public dataTHE HACK ON JERSEY CITY CONDO PRICES explore trends in public data
THE HACK ON JERSEY CITY CONDO PRICES explore trends in public data
 
Natural Language Processing(SupStat Inc)
Natural Language Processing(SupStat Inc)Natural Language Processing(SupStat Inc)
Natural Language Processing(SupStat Inc)
 
Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...
Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...
Data Science Academy Student Demo day--Moyi Dang, Visualizing global public c...
 
Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nyc
Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nycData Science Academy Student Demo day--Divyanka Sharma, Businesses in nyc
Data Science Academy Student Demo day--Divyanka Sharma, Businesses in nyc
 
Data Science Academy Student Demo day--Chang Wang, dogs breeds in nyc
Data Science Academy Student Demo day--Chang Wang, dogs breeds in nycData Science Academy Student Demo day--Chang Wang, dogs breeds in nyc
Data Science Academy Student Demo day--Chang Wang, dogs breeds in nyc
 
Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...
Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...
Data Science Academy Student Demo day--Richard Sheng, kinvolved school attend...
 
Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...
Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...
Data Science Academy Student Demo day--Peggy sobolewski,analyzing transporati...
 
Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...
 
Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...
Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...
Data Science Academy Student Demo day--Shelby Ahern, An Exploration of Non-Mi...
 
R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...
R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...
R003 laila restaurant sanitation report(NYC Data Science Academy, Data Scienc...
 
R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...
R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...
R003 jiten south park episode popularity analysis(NYC Data Science Academy, D...
 

Dernier

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Dernier (20)

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Kaggle Top1% Solution: Predicting Housing Prices in Moscow

  • 1. Sberbank Russian Housing Market - Top 1% Solution Team Chase Edge, Grant Webb And Brandy Freitas
  • 3. Project Overview ● Kaggle Competition ● Predict housing prices in Moscow during July 2015 to May 2016 using data from August 2011 to June 2015 ● Data includes housing transaction information (e.g. square meter, number of rooms and build year), neighborhood details and macroeconomic information
  • 4. Machine Learning Checklist 1. Frame the Problem and Look at the Big Picture 2. Get The Data 3. Explore the Data 4. Prepare the Data 5. Short List Promising Models 6. Fine-Tune the System
  • 5. Look at the Big Picture
  • 6. ● Mitigate Risk ○ Help avoid overlending (issuing mortgages in excess of the value of the home) ● Valuation ○ Help banks value their portfolio ● Predictions ○ To give confidence to renters, developers and lenders when they sign a lease or purchase a building Value to Sberbank
  • 8. Data Overview Housing Data ● Number of Observations ○ Training - 30,471 ○ Test - 7,662 ● Number of Features - 290 Macro Data ● Number of Observations - 2,484 ● Number of Features - 100
  • 9. Data Overview - Most Important Features ● Initial Thoughts ○ Square Meters ○ Number of Rooms ○ Build Year ○ Location
  • 10. Data Overview - Most Important Features ● Initial Thoughts ○ Square Meters ○ Number of Rooms ○ Build Year ○ Location
  • 16. Missingness in Top Features
  • 19. Imputations - KNN by Neighborhood
  • 21. Feature Engineering Sq_diff = Full_sq - Kitch_sq Floor_ratio = Floor / Max_floor Life_to_full = Life_sq / Full_sq Kitch_to_full = Kitch_sq / Full_sq Room_size = Life_sq / Num_room Month = month of sale
  • 22. Decline in Russian Housing Prices
  • 23. Dealing with the Decline ● Problem ○ Housing prices declined in 2015 and 2016 ○ Model predicts values that are too high ● Solutions ○ Incorporate economic data ○ Make downward adjustments to predicted values ○ Adjust prices for fluctuations in the market based on a price index
  • 24. Price Index ● The Data ○ Russian government statistics on the monthly rental prices within Moscow ● The Index ○ 3 month rolling average of 3 bed, 2 bed and 1 bed rentals in Moscow ○ Indexed to the start of the training data (August 2011) ○ Averaged the 3 indices ● The Application ○ Adjust all prices in the training data for changes in the index (divide by index) ○ Model predicts prices as if they occurred in August 2011 ○ Adjust the predicted values for the index (multiply by index)
  • 26. Price Index - Example ● Training Transaction ○ Date - April 2013 ○ Price - RUB 5,693,972 ○ Index - 1.13 ○ Adjusted Price - RUB 5,038,913 ● Predicted Value ○ Date - April 2016 ○ Price - RUB 3,902,007 ○ Index - 1.06 ○ Adjusted Price - RUB 4,136,127
  • 28. KISS - OLS with 2 Variables Features = full_sq, sub_area Response = log_price
  • 29. KISS - OLS with 7 Variables Features = full_sq, floor, max_floor, life_to_full, kitch_to_life, product_type, sub_area Response = log_price
  • 31. Unrestrained Decision Trees are known to overfit as shown: _ Decision Tree
  • 32. Number of Trees MaxFeatures 8 hrs to run with 3 cores processing! Random Forest
  • 33. XGBoost ● Log(Scaled Price / Meter Sq) as Target ● Fast - Training time 1 min 43 sec ● Best Results ● Less Interpretable
  • 35. Next Steps ● Model Ensembling ● Feature Engineering ● Time Series Analysis on Pricing Index ● Cluster Analysis on Neighborhoods