VISVESVARAYA TECHNOLOGICAL UNIVERSITY
BELAGAVI, KARNATAKA
SEMINAR (17ECS86)
Report on
“HOUSE PRICE PREDICTION USING MACHINE
LEARNING”
Submitted in partial fulfillment for the award of the bachelor’s degree in
ELECTRONICS AND COMMUNICATION ENGINEERING
Submitted by
STUDENT NAME USN
SHAHBAZ KHAN 1GA16EC136
Under the guidance of
Dr. MANJUNATH R C,
Asst. Professor,
Dept. of ECE, GAT
Department of Electronics and Communication Engineering
GLOBAL ACADEMY OF TECHNOLOGY
Bengaluru – 560 098
GLOBAL ACADEMY OF TECHNOLOGY
Rajarajeshwari Nagar, Bengaluru – 560 098
NAAC - ‘A’ Grade and NBA Accredited
DEPARTMENT OF ELECTRONICS AND COMMUNICATION
ENGINEERING
CERTIFICATE
Certified that the Seminar (17ECS86) entitled
“HOUSE PRICE PREDICTION USING MACHINE
LEARNING”
Carried out by,
STUDENT NAME USN
SHREYAS K M 1GA16EC36
Bona fide student of Global Academy of Technology, in partial fulfillment for the award of
Bachelor of Engineering in Electronics and Communication Engineering of the Visvesvaraya
Technological University, Belagavi, during the academic year 2021-22.
It is certified that all corrections/suggestions indicated for Internal Assessment have been
incorporated in the Report deposited in the Departmental library. The TECHNICAL SEMINAR
report has been approved as it satisfies the academic requirements in respect of Technical
Seminar prescribed for the said Degree.
Signature of the Guide Signature of the HOD
[Dr. Manjunath R C] [Dr. Manjunatha Reddy H S]
Abstract
House price forecasting is an important topic in real estate. The literature attempts to derive
useful knowledge from historical data of property markets. In this work, machine learning
techniques are applied to analyze historical property transactions in India to discover useful
models for house buyers and sellers. The analysis reveals a high discrepancy between house
prices in the most expensive and most affordable suburbs of the city of Mumbai. Moreover,
experiments demonstrate that Multiple Linear Regression, which is based on the mean squared
error measure, is a competitive approach.
INTRODUCTION
AIM and IMPORTANCE
Aim
These are the parameters on which we will evaluate ourselves:
• Create an effective price prediction model.
• Validate the model’s prediction accuracy.
• Identify the important home price attributes which feed the model’s predictive power.
Need and Motivation
Having lived in India for many years, the one thing I had taken for granted was that housing
and rental prices continue to rise. Since the housing crisis of 2008, housing prices have
recovered remarkably well, especially in major housing markets. However, in the 4th quarter
of 2016, I was surprised to read that Bombay housing prices had fallen the most in the last
4 years. In fact, median resale prices for condos and co-ops fell 6.3%, marking the first
time there was a decline since Q1 of 2017. The decline has been partly attributed to political
uncertainty domestically and abroad and to the 2014 election. This model is intended to
improve price transparency for customers and to make comparison easy: if a customer finds
that the price of a house on a given website is higher than the price predicted by the model,
they can reject that house.
DATASET
Here we have web-scraped the data from 99acres.com, which is one of the leading
real estate websites operating in India.
Our data contains Bombay houses only.
The dataset looks as follows:
Data Exploration
Data exploration is the first step in data analysis and typically involves summarizing the main
characteristics of a data set, including its size, accuracy, initial patterns in the data and other
attributes. It is commonly conducted by data analysts using visual analytics tools, but it can
also be done with more advanced statistical software, such as Python. Before an organization
can analyze data collected from multiple data sources and stored in data warehouses, it must
know how many cases are in a data set, what variables are included, how many missing
values there are and what general hypotheses the data is likely to support. An initial exploration
of the data set can help answer these questions by familiarizing analysts with the data with
which they are working.
We divided the data 9:1 for training and testing, respectively.
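The 9:1 split described above can be sketched as follows. This is only an illustration: the synthetic `area_sqft` and `price` arrays, the random seed, and the use of scikit-learn's `train_test_split` are assumptions, since the report does not show how the split was actually made.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 homes with one feature (area) and a price.
rng = np.random.default_rng(0)
area_sqft = rng.uniform(300, 3000, size=(100, 1))
price = 15000 * area_sqft[:, 0] + rng.normal(0, 1e5, size=100)

# test_size=0.1 keeps 90% of the rows for training and 10% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    area_sqft, price, test_size=0.1, random_state=42
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split reproducible, so that different models are compared on the same test rows.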
Data Visualization
Data visualization is the graphical representation of information and data. By using
visual elements like charts, graphs, and maps, data visualization tools provide an
accessible way to see and understand trends, outliers, and patterns in data. In the world
of Big Data, data visualization tools and technologies are essential to analyse massive
amounts of information and make data-driven decisions.
Data Selection
Data selection is defined as the process of determining the appropriate data type
and source, as well as suitable instruments to collect data. Data selection precedes the
actual practice of data collection. This definition distinguishes data selection from
selective data reporting (selectively excluding data that is not supportive of a research
hypothesis) and interactive/active data selection (using collected data for monitoring
activities/events, or conducting secondary data analyses). The process of selecting
suitable data for a research project can impact data integrity.
The primary objective of data selection is the determination of appropriate data type,
source, and instrument(s) that allow investigators to adequately answer research
questions. This determination is often discipline-specific and is primarily driven by
the nature of the investigation, existing literature, and accessibility to necessary data
sources.
Correlation Heatmap
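A correlation heatmap is a coloured rendering of the pairwise correlation matrix of the numeric columns. The sketch below computes such a matrix with pandas; the column names (`area_sqft`, `circle_rate`, `bedrooms`, `price`) and the synthetic values are hypothetical and are not taken from the scraped dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical columns standing in for the scraped attributes.
rng = np.random.default_rng(1)
n = 200
area_sqft = rng.uniform(300, 3000, size=n)
circle_rate = rng.uniform(5000, 40000, size=n)
bedrooms = rng.integers(1, 5, size=n)
price = area_sqft * circle_rate * (1 + rng.normal(0, 0.1, size=n))

df = pd.DataFrame({
    "area_sqft": area_sqft,
    "circle_rate": circle_rate,
    "bedrooms": bedrooms,
    "price": price,
})

# Pearson correlation between every pair of columns.
corr = df.corr()

# Attributes ranked by how strongly they correlate with price.
print(corr["price"].sort_values(ascending=False))
```

Passing `corr` to a plotting function such as Seaborn's `heatmap` would produce the figure itself.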
Data Transformation
The log transformation can be used to make highly skewed distributions less skewed. This
can be valuable both for making patterns in the data more interpretable and for helping to meet
the assumptions of inferential statistics.
It is hard to discern a pattern in the upper panel whereas the strong relationship is shown clearly
in the lower panel. The comparison of the means of log-transformed data is actually a
comparison of geometric means. This occurs because the anti-log of the
arithmetic mean of log-transformed values is the geometric mean.
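Both claims in this section can be checked numerically. The sketch below uses a synthetic log-normal sample as a stand-in for prices (an assumption, not the report's data) and a hand-written `skewness` helper, which is illustrative and not a library function.

```python
import numpy as np

def skewness(values):
    """Sample skewness: the third central moment divided by std cubed."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return float(np.mean(centered ** 3) / np.std(values) ** 3)

# Hypothetical right-skewed "prices" drawn from a log-normal distribution.
rng = np.random.default_rng(0)
prices = rng.lognormal(mean=16.0, sigma=0.8, size=1000)
log_prices = np.log(prices)

# The raw values are strongly right-skewed; the logged values are not.
print(skewness(prices), skewness(log_prices))

# Anti-log of the arithmetic mean of the logs equals the geometric mean.
sample = np.array([1.0, 10.0, 100.0])
geometric_mean = np.exp(np.log(sample).mean())
print(geometric_mean)
```

For the three-value sample the geometric mean is the cube root of 1 × 10 × 100 = 1000, that is 10, whereas the arithmetic mean is 37.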
Skewed and Normal plots: Price, Area, Price/Sq.
LANGUAGE AND MODELS USED
Python
Libraries used for this project include:
• Pandas
• NumPy
• Matplotlib
• Seaborn
• scikit-learn
• XGBoost
Python is widely used in scientific and numeric computing:
• SciPy is a collection of packages for mathematics, science, and engineering.
• Pandas is a data analysis and modelling library.
• IPython is a powerful interactive shell that features easy editing and recording of a work
session, and supports visualizations and parallel computing.
• The Software Carpentry Course teaches basic skills for scientific computing, running
bootcamps and providing open-access teaching materials.
MODELS USED
Regression Model
• Linear Regression is a machine learning algorithm based on supervised learning.
• It performs a regression task. Regression models a target prediction value based on
independent variables.
• It is mostly used for finding out the relationship between variables and forecasting.
Real Vs Predicted
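A minimal sketch of fitting a linear regression with scikit-learn is given below. The single feature (`log_area`), the target (`log_price`), and the synthetic relationship between them are assumptions made for illustration; the report does not list the features used in the actual model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: log price depends linearly on log area, plus noise.
rng = np.random.default_rng(0)
log_area = rng.uniform(5.5, 8.0, size=(200, 1))
log_price = 1.0 * log_area[:, 0] + 9.5 + rng.normal(0, 0.2, size=200)

# Ordinary least squares: minimizes the squared error of the fitted line.
model = LinearRegression().fit(log_area, log_price)
print(model.coef_, model.intercept_)

# Predict for a 1000 sq. ft. home and undo the log transform.
new_home = np.array([[np.log(1000.0)]])
predicted_price = float(np.exp(model.predict(new_home))[0])
print(predicted_price)
```

Because the target here is logged, the prediction has to be exponentiated to return to the original price scale.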
Random Forest Regression Model
• A Random Forest is an ensemble technique capable of performing both regression and
classification tasks with the use of multiple decision trees and a technique called
Bootstrap Aggregation, commonly known as bagging.
• Bagging, in the Random Forest method, involves training each decision tree on a
different data sample where sampling is done with replacement.
• The basic idea behind this is to combine multiple decision trees in determining the final
output rather than relying on individual decision trees.
Real Vs Predicted
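The bagging idea described above can be sketched with scikit-learn's `RandomForestRegressor`. The data is synthetic and the hyperparameters (`n_estimators=50`, the seed) are assumptions chosen for illustration, not the settings used in the report.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical non-linear data with two features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=300)

X_train, y_train = X[:270], y[:270]
X_test = X[270:]

# bootstrap=True trains each tree on a sample drawn with replacement.
forest = RandomForestRegressor(n_estimators=50, bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

# The forest's prediction is the average of its trees' predictions.
forest_pred = forest.predict(X_test)
tree_preds = np.array([tree.predict(X_test) for tree in forest.estimators_])
manual_average = tree_preds.mean(axis=0)

print(np.allclose(forest_pred, manual_average))
```

Averaging many trees trained on different bootstrap samples reduces the variance of any single deep tree, which is the motivation for combining them.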
XGBoost Regressor Model
• XGBoost stands for eXtreme Gradient Boosting.
• The XGBoost library implements the gradient boosting decision tree algorithm.
• Boosting is an ensemble technique where new models are added to correct the errors
made by existing models.
• Models are added sequentially until no further improvements can be made.
Real Vs Predicted
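The sequential error-correcting idea behind boosting can be sketched by hand with small decision trees. Note that this is plain gradient boosting for squared error built from scikit-learn trees, not the XGBoost library itself, which adds regularization and other refinements; the data, learning rate, tree depth, and number of rounds are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical non-linear data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = 5 * np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
trees = []
errors = []

for _ in range(50):
    # Each new tree is fitted to the errors left by the current ensemble.
    residual = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
    errors.append(float(np.sqrt(np.mean((y - prediction) ** 2))))

# Training RMSE after the first and the last boosting round.
print(errors[0], errors[-1])
```

In practice the `xgboost` package provides an `XGBRegressor` class with a scikit-learn style `fit`/`predict` interface, so a loop like this does not need to be written by hand.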
RESULTS AND DISCUSSIONS
Best Suited Model
Our study showed that Linear Regression displayed the best performance on this dataset and
can be used for deployment.
The Random Forest Regressor and XGBoost Regressor are far behind, so they cannot be
recommended for deployment.
RMSE Bar Graph
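The models are compared by root mean square error (RMSE), where a lower value is better. The sketch below shows the computation and the selection of the best model; the target values, the prediction arrays, and the names `model_a` and `model_b` are made up for illustration and are not the report's results.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical test targets and predictions from two models.
y_true = [3.0, 5.0, 7.0]
predictions = {
    "model_a": [3.0, 5.0, 7.0],  # perfect predictions
    "model_b": [4.0, 6.0, 8.0],  # off by exactly 1 everywhere
}

scores = {name: rmse(y_true, pred) for name, pred in predictions.items()}
best_model = min(scores, key=scores.get)  # lowest RMSE wins

print(scores, best_model)
```

A dictionary of scores like this is also what a comparison bar plot would be drawn from.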
Deployment App
The model is deployed as a Python web app built with Flask, together with HTML and
CSS.
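A deployment of this kind might look like the following minimal Flask sketch. The route name `/predict`, the `area_sqft` field, and the fixed price-per-square-foot stand-in for the trained model are all hypothetical; for brevity the sketch returns JSON rather than rendering the HTML and CSS pages described above.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for the trained model: a fixed price per sq. ft.
PRICE_PER_SQFT = 20000.0

def predict_price(area_sqft):
    """Placeholder prediction; a real app would call the trained model."""
    return PRICE_PER_SQFT * area_sqft

@app.route("/predict", methods=["POST"])
def predict():
    # Read the JSON body; fall back to an empty dict if it is missing.
    payload = request.get_json(silent=True) or {}
    try:
        area = float(payload["area_sqft"])
    except (KeyError, TypeError, ValueError):
        return jsonify(error="area_sqft must be a number"), 400
    return jsonify(predicted_price=predict_price(area))
```

During development such an app is typically started with the `flask run` command, and the endpoint is called from an HTML form or a script.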
Conclusion
Our aim is achieved, as we have met all the parameters listed in the Aim section. Circle rate
is seen to be the most effective attribute in predicting the house price, and Linear Regression
is the most effective model for our dataset, with an RMSE score of 0.5025658262899986.
APPENDIX – 4
LIST OF TABLES
TABLE   TITLE
4.1     Raw Bombay House Dataset
4.2     Transformed House Dataset
APPENDIX – 5
LIST OF FIGURES
FIGURE  TITLE
4.1     Bar Plots of various Parameters
4.2     Correlation Heatmap
4.3     Skewed Plot
4.4     Real Vs Predicted Plots
4.5     Comparison Bar Plot
4.6     Website View
APPENDIX – 6
ABBREVIATIONS
SqM     Square Meter
SqFt    Square Feet
LR      Linear Regression
XGB     eXtreme Gradient Boosting
RMSE    Root Mean Square Error
Log     Logarithmic

More Related Content

Similar to SHAHBAZ_TECHNICAL_SEMINAR.docx

Review on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
Review on Algorithmic and Non Algorithmic Software Cost Estimation TechniquesReview on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
Review on Algorithmic and Non Algorithmic Software Cost Estimation Techniquesijtsrd
 
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...IOSR Journals
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...IJEACS
 
House Price Prediction for Ai & ml project .pptx
House Price Prediction for Ai & ml project .pptxHouse Price Prediction for Ai & ml project .pptx
House Price Prediction for Ai & ml project .pptxaditichauhan2019404
 
Python for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive GuidePython for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive GuideAivada
 
Final Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal...
Final Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal...Final Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal...
Final Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal...Chaudhry Hussain
 
CS-422 THESIS (1).pptx
CS-422 THESIS (1).pptxCS-422 THESIS (1).pptx
CS-422 THESIS (1).pptxpoojagupta010
 
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug TriageSurvey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug TriageIRJET Journal
 
churn_detection.pptx
churn_detection.pptxchurn_detection.pptx
churn_detection.pptxDhanuDhanu49
 
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...IRJET Journal
 
76201910
7620191076201910
76201910IJRAT
 
Providing highly accurate service recommendation for semantic clustering over...
Providing highly accurate service recommendation for semantic clustering over...Providing highly accurate service recommendation for semantic clustering over...
Providing highly accurate service recommendation for semantic clustering over...IRJET Journal
 
IRJET - House Price Prediction using Machine Learning and RPA
 IRJET - House Price Prediction using Machine Learning and RPA IRJET - House Price Prediction using Machine Learning and RPA
IRJET - House Price Prediction using Machine Learning and RPAIRJET Journal
 

Similar to SHAHBAZ_TECHNICAL_SEMINAR.docx (20)

Predictive modeling
Predictive modelingPredictive modeling
Predictive modeling
 
Review on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
Review on Algorithmic and Non Algorithmic Software Cost Estimation TechniquesReview on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
Review on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
 
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
 
House Price Prediction for Ai & ml project .pptx
House Price Prediction for Ai & ml project .pptxHouse Price Prediction for Ai & ml project .pptx
House Price Prediction for Ai & ml project .pptx
 
Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
 
Predicting House Prices: A Machine Learning Approach
Predicting House Prices: A Machine Learning ApproachPredicting House Prices: A Machine Learning Approach
Predicting House Prices: A Machine Learning Approach
 
House price prediction
House price predictionHouse price prediction
House price prediction
 
Python for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive GuidePython for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive Guide
 
Final Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal...
Final Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal...Final Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal...
Final Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal Defence.pptxFinal...
 
CS-422 THESIS (1).pptx
CS-422 THESIS (1).pptxCS-422 THESIS (1).pptx
CS-422 THESIS (1).pptx
 
Data Science and Analysis.pptx
Data Science and Analysis.pptxData Science and Analysis.pptx
Data Science and Analysis.pptx
 
ONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATION
ONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATIONONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATION
ONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATION
 
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug TriageSurvey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
 
churn_detection.pptx
churn_detection.pptxchurn_detection.pptx
churn_detection.pptx
 
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
 
76201910
7620191076201910
76201910
 
Providing highly accurate service recommendation for semantic clustering over...
Providing highly accurate service recommendation for semantic clustering over...Providing highly accurate service recommendation for semantic clustering over...
Providing highly accurate service recommendation for semantic clustering over...
 
IRJET - House Price Prediction using Machine Learning and RPA
 IRJET - House Price Prediction using Machine Learning and RPA IRJET - House Price Prediction using Machine Learning and RPA
IRJET - House Price Prediction using Machine Learning and RPA
 
Data analysis
Data analysisData analysis
Data analysis
 

Recently uploaded

Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themeitharjee
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfSayantanBiswas37
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 

Recently uploaded (20)

Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 

SHAHBAZ_TECHNICAL_SEMINAR.docx

  • 1. VISVESVARAYA TECHNOLOGICAL UNIVERSITY BELAGAVI, KARNATAKA SEMINAR (17ECS86) Report on “HOUSE PRICE PREDICTION USING MACHINE LEARNING” Submitted in the partial fulfillment for the award of bachelor’s degree in ELECTRONICS AND COMMUNICATION ENGINEERING Submitted by STUDENT NAME USN SHAHBAZ KHAN 1GA16EC136 Under the guidance of Dr. MANJUNATH R C, Asst. Professor, Dept. of ECE, GAT Department of Electronics and Communication Engineering GLOBAL ACADEMY OF TECHNOLOGY Bengaluru – 560 098
  • 2. GLOBAL ACADEMY OF TECHNOLOGY Rajarajeshwari Nagar, Bengaluru – 560 098 NACC - ‘A’ Grade and NBA Accredited DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING CERTIFICATE Certified that the Seminar (17ECS86) entitled “HOUSE PRICE PREDICTION USING MACHINE LEARNING” Carried out by, STUDENT NAME USN SHREYAS K M 1GA16EC36 Bonafede students of Global Academy of Technology in partial fulfillment for the award of Bachelor of Engineering in Electronics and Communication Engineering of the Visvesvaraya Technological University, Belagavi during the academic year 2021- 22. It is certified that all corrections/suggestions indicated for Internal Assessment have been incorporated in the Report deposited in the Departmental library. The TECHNICAL SEMIN A R report has been approved as it satisfies the academic requirements in respect of Technical Seminar prescribed for the said Degree. Signature of the Guide Signature of the HOD [Dr. Manjunath R C] [Dr. Manjunatha Reddy H S]
  • 3. Abstract House price forecasting is an important topic of real estate. The literature attempts to derive useful knowledge from historical data of property markets. Machine learning techniques are applied to analyze historical property transactions in India to discover useful models for house buyers and sellers. Revealed is the high discrepancy between house prices in the most expensive and most affordable suburbs in the city of Mumbai. Moreover, experiments demonstrate that the Multiple Linear Regression that is based on mean squared error measurement is a competitive approach.
  • 4. INTRODUCTION AIM and IMPORTANCE Aim These are the Parameters on which we will evaluate ourselves- • Create an effective price prediction model • Validate the model’s prediction accuracy • Identify the important home price attributes which feed the model’s predictive power.
  • 5. Need and Motivation Having lived in India for so many years if there is one thing that I had been taking for granted, it’s that housing and rental prices continue to rise. Since the housing crisis of 2008, housing prices have recovered remarkably well, especially in major housing markets. However, in the 4th quarter of 2016, I was surprised to read that Bombay housing prices had fallen the most in the last 4 years. In fact, median resale prices for condos and coops fell 6.3%, marking the first time there was a decline since Q1 of 2017. The decline has been partly attributed to political uncertainty domestically and abroad and the 2014 election. So, to maintain the transparency among customers and also the comparison can be made easy through this model. If customer finds the price of house at some given website higher than the price predicted by the model, so he can reject that house.
  • 6. DATASET Here we have web scrapped the Data from 99acres.com website which is one of the leading real estate websites operating in INDIA. Our Data contains Bombay Houses only. Dataset looks as follows-
  • 7. Data Exploration Data exploration is the first step in data analysis and typically involves summarizing the main characteristics of a data set, including its size, accuracy, initial patterns in the data and other attributes. It is commonly conducted by data analysts using visual analytics tools, but it can also be done in more advanced statistical software, Python. Before it can conduct analysis on data collected by multiple data sources and stored in data warehouses, an organization must know how many cases are in a data set, what variables are included, how many missing values there are and what general hypotheses the data is likely to support. An initial exploration of the data set can help answer these questions by familiarizing analysts with the data with which they are working. We divided the data 9:1 for Training and Testing purpose respectively.
  • 8. Data Visualization Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. In the world of Big Data, data visualization tools and technologies are essential to analyse massive amounts of information and make data-driven decisions.
  • 9. Data Selection Data selection is defined as the process of determining the appropriate data type and source, as well as suitable instruments to collect data. Data selection precedes the actual practice of data collection. This definition distinguishes data selection from selective data reporting (selectively excluding data that is not supportive of a research hypothesis) and interactive/active data selection (using collected data for monitoring activities/events, or conducting secondary data analyses). The process of selecting suitable data for a research project can impact data integrity. The primary objective of data selection is the determination of appropriate data type, source, and instrument(s) that allow investigators to adequately answer research questions. This determination is often discipline-specific and is primarily driven by the nature of the investigation, existing literature, and accessibility to necessary data sources.
  • 11. Data Transformation The log transformation can be used to make highly skewed distributions less skewed. This can be valuable both for making patterns in the data more interpretable and for helping to meet the assumptions of inferential statistics. In the accompanying figure, it is hard to discern a pattern in the upper panel (before the transformation), whereas the strong relationship is shown clearly in the lower panel (after it). The comparison of the means of log-transformed data is actually a comparison of geometric means. This occurs because the anti-log of the arithmetic mean of log-transformed values is the geometric mean.
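The relationship between the log transformation and the geometric mean can be checked numerically. The sketch below uses made-up prices (an assumption, not values from the dataset) and only the Python standard library.

```python
import math
import statistics

# Hypothetical, strongly right-skewed prices (not taken from the dataset).
prices = [10.0, 100.0, 1000.0]

# Log-transform, take the arithmetic mean, then undo the log (anti-log).
log_prices = [math.log(p) for p in prices]
mean_of_logs = statistics.fmean(log_prices)
back_transformed = math.exp(mean_of_logs)

# The result matches the geometric mean of the original values.
geometric_mean = statistics.geometric_mean(prices)
print(back_transformed, geometric_mean)
```

For these three values the arithmetic mean is 370, while the geometric mean is 100, which shows how much less the log scale is pulled by the largest value.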
  • 15. Python Libraries Used for this Project: Pandas, NumPy, Matplotlib, Seaborn, scikit-learn, XGBoost. Python is widely used in scientific and numeric computing: SciPy is a collection of packages for mathematics, science, and engineering; Pandas is a data analysis and modelling library; IPython is a powerful interactive shell that features easy editing and recording of a work session, and supports visualizations and parallel computing; the Software Carpentry Course teaches basic skills for scientific computing, running bootcamps and providing open-access teaching materials.
  • 16. MODELS USED: Linear Regression Model. Linear Regression is a machine learning algorithm based on supervised learning. It performs a regression task: it models a target prediction value based on independent variables. It is mostly used for finding the relationship between variables and for forecasting. [Figure: Real vs. Predicted plot]
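As a minimal illustration of how linear regression models a target from an independent variable, the sketch below fits a single-feature least-squares line in closed form. The report uses multiple features and library implementations; the function `fit_line` and the area and price values here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: area in square feet and price in arbitrary units,
# generated exactly from price = 0.1 * area + 2.
areas = [500.0, 750.0, 1000.0, 1250.0]
prices = [52.0, 77.0, 102.0, 127.0]

slope, intercept = fit_line(areas, prices)
predicted = slope * 900.0 + intercept  # forecast for an unseen 900 sq ft house
print(slope, intercept, predicted)
```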
  • 17. Random Forest Regression Model. A Random Forest is an ensemble technique capable of performing both regression and classification tasks with the use of multiple decision trees and a technique called Bootstrap Aggregation, commonly known as bagging. Bagging, in the Random Forest method, involves training each decision tree on a different data sample, where sampling is done with replacement. The basic idea is to combine multiple decision trees in determining the final output rather than relying on individual decision trees. [Figure: Real vs. Predicted plot]
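The bagging idea described on this slide can be sketched with very small trees. The example below is a simplified illustration, not a full Random Forest: it uses one-split trees (stumps) on a single feature and omits the random feature selection that a real Random Forest adds. All names and data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # every x identical: fall back to predicting the mean
        mean = sum(ys) / len(ys)
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagged(models, x):
    """Combine the trees by averaging their individual predictions."""
    return sum(predict_stump(m, x) for m in models) / len(models)


# Hypothetical data with a jump in price between x = 4 and x = 5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [10.0, 11.0, 9.0, 10.0, 30.0, 31.0, 29.0, 30.0]
models = fit_bagged(xs, ys)
print(predict_bagged(models, 2.0), predict_bagged(models, 7.0))
```

Because each stump sees a slightly different sample, individual trees disagree on where to split; averaging them smooths out that variation, which is the motivation for not relying on a single tree.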
  • 18. XGBoost Regressor Model. XGBoost stands for eXtreme Gradient Boosting. The XGBoost library implements the gradient boosting decision tree algorithm. Boosting is an ensemble technique where new models are added to correct the errors made by existing models. Models are added sequentially until no further improvements can be made. [Figure: Real vs. Predicted plot]
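The sequential error correction described on this slide can be sketched as plain gradient boosting with squared error, where each new tree is fitted to the residuals of the current ensemble. This is a simplified illustration and not the XGBoost algorithm itself, which adds regularization and other refinements; the stump learner, the learning rate, the number of rounds and the data are all assumptions.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # every x identical: fall back to predicting the mean
        mean = sum(ys) / len(ys)
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then add stumps fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # errors of the current model
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data with a jump in price between x = 4 and x = 5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [10.0, 11.0, 9.0, 10.0, 30.0, 31.0, 29.0, 30.0]

model = fit_boosted(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict_boosted(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

Here the number of rounds is fixed for simplicity; in practice the stopping point is usually chosen by watching the error on held-out data, since training error alone keeps decreasing.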
  • 19. RESULTS AND DISCUSSIONS Best Suited Model: Our study showed that Linear Regression displayed the best performance for this dataset and can be used for deployment. The Random Forest Regressor and the XGBoost Regressor performed considerably worse, so they cannot be recommended for deployment. [Figure: RMSE bar graph]
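The models are compared by root mean square error (RMSE). A minimal sketch of that metric is given below; the function name and the sample values are illustrative assumptions and do not reproduce the reported scores.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical targets and predictions: errors are 1, 0 and -2.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
score = rmse(actual, predicted)  # sqrt((1 + 0 + 4) / 3)
print(score)
```

A lower RMSE means predictions sit closer to the real values on average, and because errors are squared, a few large misses weigh more heavily than many small ones.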
  • 20. Deployment App The model is deployed as a Python web app built with Flask, together with HTML and CSS.
  • 21. Conclusion Our aim is achieved, as we have met all the objectives stated in our aim. Circle rate is seen to be the most effective attribute in predicting the house price, and Linear Regression is the most effective model for our dataset, with an RMSE of approximately 0.5026.
  • 22. APPENDIX – 4: LIST OF TABLES. Table 4.1: Raw Bombay House Dataset; Table 4.2: Transformed House Dataset.
  • 23. APPENDIX – 5: LIST OF FIGURES. Figure 4.1: Bar Plots of Various Parameters; Figure 4.2: Correlation Heatmap; Figure 4.3: Skewed Plot; Figure 4.4: Real vs. Predicted Plots; Figure 4.5: Comparison Bar Plot; Figure 4.6: Website View.
  • 24. APPENDIX – 6: LIST OF SYMBOLS AND ABBREVIATIONS. SqM: square meter; SqFt: square feet; LR: Linear Regression; XGB: eXtreme Gradient Boosting; RMSE: root mean square error; Log: logarithmic.