SlideShare une entreprise Scribd logo
1  sur  37
HOUSE PRICE
PREDICTION USING
DEEP LEARNING
By Sabah Begum
1
ABSTRACT
Real estate is the least transparent industry in our ecosystem.
Housing prices keep changing day in and day out and
sometimes are hyped rather than being based on valuation.
Predicting housing prices with real factors is the main crux of
our research project. Here we aim to make our evaluations
based on every basic parameter that is considered while
determining the price. We use various regression techniques in
this pathway, and our results are not sole determination of one
technique rather it is the weighted mean of various techniques
to give most accurate results.
2
EXISTING SYSTEM
Data is at the heart of technical innovations, achieving any
result is now possible using predictive models. Machine
learning is extensively used in this approach. Although
development of correct and accurate model is needed.
Previous algorithms like SVM are not suitable for complex real-
world data.
3
PROPOSED SYSTEM
 Our main focus here is to develop a model which predicts the
property cost based on vast datasets using data mining
approach.
 The data mining process in a real estate industry provides an
advantage to the developers by processing the data,
forecasting future trends and thus assisting them to make
favourable knowledge-driven decisions.
 Our dataset comprises of various essential parameters and
data mining has been at the root of our system.
 We will initially clean up our entire dataset and also truncate
the outlier values.
 Finally a system will be made to predict the Real Estate
Prices using Deep Learning Approach.
4
FEASIBILITY STUDY
Basic areas to be covered are –
 Operational Feasibility: The applying can scale back the time
consumed to take care of manual records and isn’t dull and
cumbersome to take care of the records, therefore
operational practicableness is assured.
 Technical Feasibility: Minimum hardware requirements: 1.66
GHz Pentium Processor or Intel compatible processor. 1 GB
RAM net property, eighty MB disk area.
5
 Economical Feasibility :Once the hardware and software
package needs get consummated, there’s no want for the
user of our system to pay for any further overhead. For the
user, the applying are economically possible within the
following aspects: the applying can scale back tons of labor
work. therefore the efforts are reduced. Our application can
scale back the time that’s wasted in manual processes. The
storage and handling issues of the registers are resolved .
6
SYSTEM SPECIFICATIONS
Since this is an AI based Deep Learning approach, we are
going to use a dataset extracting the features of houses and
algorithmic models to train the dataset for predictions.
 Dataset- We are using an unsupervised dataset,i.e, data
without class labels. The objective of this unsupervised
technique, is to find patterns in data based on the
relationship between data points themselves.
7
8
• The dataset consists of more than 20,000 records.
• Around nine features are provided in the dataset like latitude,
total rooms, total bedrooms etc.
• Data mining technique is used for processing of the dataset
to extract relevant data to be further used in data models.
 Data model-The basic technique used for the model is
Regression Analysis which is a statistical method to model
the relationship between a dependent (target) and
independent (predictor) variables with one or more
independent variables. It predicts continuous/real values.
9
• For our application we are using three types of algorithms to
train the model, which are-
• KN-Neighbors: K nearest neighbors is a simple algorithm
that stores all available cases and predict the numerical
target based on a similarity measure (e.g., distance
functions)
• Support Vector Machine: The goal of the SVM algorithm is
to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can
easily put the new data point in the correct category in the
future.
• Artificial Neural Network(ANN): It’s a computational model
in which artificial neurons are nodes, and directed edges with
weights are connections between neuron outputs and neuron
inputs. For our project we will be using TensorFlow library to
integrate ANN in code.
10
HARDWARE AND SOFTWARE
REQUIREMENTS
SOFTWARE REQUIREMENTS-
 Python version- 3.8
 Libraries-
Pandas: library for manipulating data in numerical tables.
scikit-learn: machine learning library that features various
classification, regression and clustering algorithms.
Matplotlib: Matplotlib is a plotting library for creating static,
animated, and interactive visualizations in python.
11
Seaborn: library based on matplotlib. It provides a high-level
interface for drawing attractive and informative statistical
graphics
Tensorflow: The core open source library to help you develop
and train ML models.
Jupyterlab: JupyterLab is a web-based interactive
development environment for Jupyter notebooks, code, and
data.
12
HARDWARE REQUIREMENTS-
1.66 GHz Pentium Processor or Intel compatible processor.
1 GB RAM
 80 MB disk area.
13
DESIGN AND DEVELOPMENT
Since this project mainly integrates software features rather
than hardware ones, the emphasis is primarily on code
development as follows-
OVERVIEW OF STEPS
Step 1
 Import required libraries
 Import data from dataset
 Get some information about dataset.
14
15
Step 2-
 Visualizing the data in terms of house prices, affordability,
closeness to ocean, number of bedrooms etc.
 First we display a histogram for column
‘median_house_value’ from the dataset
 Next we display a Heatmap of the dataset. A heatmap()
method contains values representing various shades of the
same colour for each value to be plotted. Usually the darker
shades of the chart represent higher values than the lighter
shade.
 Next we display a Distplot() for columns
‘median_house_value’, ‘affordable’, and ‘ocean_proximity’. A
distplot() method lets you show a histogram with a line on it.
It plots a univariate distribution of observations.
16
17
18
 Next we display a Countplot() for the column
‘ocean_proximity’. A countplot() method is used to show the
counts of observations in each categorical bin using bars.
 Next we concatenate the values of dataset with values of
‘ocean_proximity’ column according to the different levels of
distance between the ocean and housing area.
 Next we view the countplot() of ‘affordable’ column, convert it
into string and replace string type “true, false” values by int
values 0,1 to represent if the house is affordable or not.
19
20
21
 Next we use the info() function on the dataset which basically
shows the datatypes used for different columns.
22
Step 3-
 Detecting and removing outliers using ‘IsolationForest’
algorithm.
 Isolation forest is an unsupervised learning algorithm for
anomaly detection that works on the principle of isolating
anomalies. It isolates the outliers by randomly selecting a
feature from the given set of features and then randomly
selecting a split value between the maximum and minimum
values of the selected feature.
 In our dataset after running the algorithm we get around 7170
outliers in the form of negative values(-1), which we remove
from the data, i.e., we delete 7170 rows from around 20,000
rows in the dataset.
23
24
Step 4-
 Splitting into training and testing data.
 We use the Train set to make the algorithm learn the data’s
behavior and then check the accuracy of our model on
the Test set.
 Divide the data into two values-
Features (X): The columns that are inserted into our model
will be used to make predictions.
Prediction (y): Target variable that will be predicted by the
features.
 After defining values of x, y we import train and test functions
from scikit learn library.
25
TRAINING AND TESTING
To test the accuracy of the model we use a metric
mean_absolute_percentage_error which we import from scikit
library. The algorithms used for training and testing the model
are as follows-
 KNN- First we import the KNeighborsRegressor() function
from scikit library and then run the algorithm on training set
and then on test set.
 We obtain the mean absolute percentage error of knn as
38.91504096125152 %
26
27
 SVM- Similarly we import SVR() from scikit library.
 Then we run the svm algorithm on the training data and then
the test data.
 The mean absolute percentage error of svm obtained is
32.246431895552135 %, which is slightly less than that of
knn algorithm.
 ANN- We use tensor flow library to build an ANN here for the
application.
 We also use the Keras library here which is an open source
deep learning framework for python. Keras is one of the most
powerful and easy to use python library, which is built on top
of popular deep learning libraries like TensorFlow, Theano,
etc., for creating deep learning models.
28
 To build the different layers of ANN we use ‘Flatten’ and
‘Dense’ function of keras framework.
 Dense layer is the regular deeply connected neural network
layer. It is most common and frequently used layer.
 Flatten layer is used to flatten the input.
 Then after building the layers of the model we need to
compile the model so that the code is converted into machine
readable code.
 We can view the summary of the model using
model.summary() function.
29
30
31
 Next we train the model on the training set and then test the
model using the test set.
 After testing, the mean absolute percentage error obtained is
18.315523022396036 %.
32
 As we notice , among the three models we find the ANN
model to be the most accurate on the dataset.
 Next we visualize the output of the three models graphically,
where the original output is represented in blue colour, the
svm output is represented in green colour and the ANN
output is represented in red colour.
33
IMPLEMENTATION AND OUTPUT
 To implement and check the accurate output of the model,
we first declare variables for each attribute of the dataset and
provide user defined values to the variables.
 These values are fed as input to the model which predicts the
final output, i.e, the price of the house based on the given
input.
 The final output or prediction is as follows-
34
35
CONCLUSION
From the training and testing of dataset on the model, a strong
deduction can be made that the model ANN works most
effectively on the data producing lower levels of errors, and
hence we can conclude that Deep Learning techniques prove
useful in the implementation and estimation of HOUSE PRICE
PREDICTION.
36
THANK YOU
37

Contenu connexe

Tendances

Prediction of house price using multiple regression
Prediction of house price using multiple regressionPrediction of house price using multiple regression
Prediction of house price using multiple regressionvinovk
 
House Price Prediction.pptx
House Price Prediction.pptxHouse Price Prediction.pptx
House Price Prediction.pptxCodingWorld5
 
IRJET- House Rent Price Prediction
IRJET- House Rent Price PredictionIRJET- House Rent Price Prediction
IRJET- House Rent Price PredictionIRJET Journal
 
Predicting house price
Predicting house pricePredicting house price
Predicting house priceDivya Tiwari
 
Data analytics with python introductory
Data analytics with python introductoryData analytics with python introductory
Data analytics with python introductoryAbhimanyu Dwivedi
 
Stock Price Prediction PPT
Stock Price Prediction  PPTStock Price Prediction  PPT
Stock Price Prediction PPTPrashantGanji4
 
Predicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine LearningPredicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine LearningLeo Salemann
 
Stock Market Prediction using Machine Learning
Stock Market Prediction using Machine LearningStock Market Prediction using Machine Learning
Stock Market Prediction using Machine LearningAravind Balaji
 
Machine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsMachine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsLarry Smarr
 
Project on disease prediction
Project on disease predictionProject on disease prediction
Project on disease predictionKOYELMAJUMDAR1
 
Linear regression
Linear regressionLinear regression
Linear regressionMartinHogg9
 
housepriceprediction-ml.pptx
housepriceprediction-ml.pptxhousepriceprediction-ml.pptx
housepriceprediction-ml.pptxtommychauhan
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnBenjamin Bengfort
 
Gradient Boosted trees
Gradient Boosted treesGradient Boosted trees
Gradient Boosted treesNihar Ranjan
 
Machine Learning for Disease Prediction
Machine Learning for Disease PredictionMachine Learning for Disease Prediction
Machine Learning for Disease PredictionMustafa Oğuz
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & UnderfittingSOUMIT KAR
 

Tendances (20)

Prediction of house price using multiple regression
Prediction of house price using multiple regressionPrediction of house price using multiple regression
Prediction of house price using multiple regression
 
House Price Prediction.pptx
House Price Prediction.pptxHouse Price Prediction.pptx
House Price Prediction.pptx
 
IRJET- House Rent Price Prediction
IRJET- House Rent Price PredictionIRJET- House Rent Price Prediction
IRJET- House Rent Price Prediction
 
Housing price prediction
Housing price predictionHousing price prediction
Housing price prediction
 
Predicting house price
Predicting house pricePredicting house price
Predicting house price
 
Data analytics with python introductory
Data analytics with python introductoryData analytics with python introductory
Data analytics with python introductory
 
Stock Price Prediction PPT
Stock Price Prediction  PPTStock Price Prediction  PPT
Stock Price Prediction PPT
 
Predicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine LearningPredicting Moscow Real Estate Prices with Azure Machine Learning
Predicting Moscow Real Estate Prices with Azure Machine Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Stock Market Prediction using Machine Learning
Stock Market Prediction using Machine LearningStock Market Prediction using Machine Learning
Stock Market Prediction using Machine Learning
 
Machine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsMachine Learning in Healthcare Diagnostics
Machine Learning in Healthcare Diagnostics
 
Project on disease prediction
Project on disease predictionProject on disease prediction
Project on disease prediction
 
Linear regression
Linear regressionLinear regression
Linear regression
 
housepriceprediction-ml.pptx
housepriceprediction-ml.pptxhousepriceprediction-ml.pptx
housepriceprediction-ml.pptx
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Gradient Boosted trees
Gradient Boosted treesGradient Boosted trees
Gradient Boosted trees
 
Shap
ShapShap
Shap
 
Machine Learning for Disease Prediction
Machine Learning for Disease PredictionMachine Learning for Disease Prediction
Machine Learning for Disease Prediction
 
Disease Prediction by Machine Learning Over Big Data From Healthcare Communities
Disease Prediction by Machine Learning Over Big Data From Healthcare CommunitiesDisease Prediction by Machine Learning Over Big Data From Healthcare Communities
Disease Prediction by Machine Learning Over Big Data From Healthcare Communities
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
 

Similaire à House price prediction

AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONIRJET Journal
 
Build a simple image recognition system with tensor flow
Build a simple image recognition system with tensor flowBuild a simple image recognition system with tensor flow
Build a simple image recognition system with tensor flowDebasisMohanty37
 
IRJET - Hand Gesture Recognition to Perform System Operations
IRJET -  	  Hand Gesture Recognition to Perform System OperationsIRJET -  	  Hand Gesture Recognition to Perform System Operations
IRJET - Hand Gesture Recognition to Perform System OperationsIRJET Journal
 
AMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTAMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTIRJET Journal
 
DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx0567Padma
 
Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple stepsRenjith M P
 
Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Fatimakhan325
 
Handwritten Digit Recognition using Convolutional Neural Networks
Handwritten Digit Recognition using Convolutional Neural  NetworksHandwritten Digit Recognition using Convolutional Neural  Networks
Handwritten Digit Recognition using Convolutional Neural NetworksIRJET Journal
 
Rachit Mishra_stock prediction_report
Rachit Mishra_stock prediction_reportRachit Mishra_stock prediction_report
Rachit Mishra_stock prediction_reportRachit Mishra
 
Real Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningReal Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningIRJET Journal
 
Machine learning Experiments report
Machine learning Experiments report Machine learning Experiments report
Machine learning Experiments report AlmkdadAli
 
Text Recognition using Convolutional Neural Network: A Review
Text Recognition using Convolutional Neural Network: A ReviewText Recognition using Convolutional Neural Network: A Review
Text Recognition using Convolutional Neural Network: A ReviewIRJET Journal
 
Hand Written Digit Classification
Hand Written Digit ClassificationHand Written Digit Classification
Hand Written Digit Classificationijtsrd
 
Bangla Handwritten Digit Recognition Report.pdf
Bangla Handwritten Digit Recognition  Report.pdfBangla Handwritten Digit Recognition  Report.pdf
Bangla Handwritten Digit Recognition Report.pdfKhondokerAbuNaim
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
 
Blood Cell Image Classification for Detecting Malaria using CNN
Blood Cell Image Classification for Detecting Malaria using CNNBlood Cell Image Classification for Detecting Malaria using CNN
Blood Cell Image Classification for Detecting Malaria using CNNIRJET Journal
 
Data Assessment and Analysis for Model Evaluation
Data Assessment and Analysis for Model Evaluation Data Assessment and Analysis for Model Evaluation
Data Assessment and Analysis for Model Evaluation SaravanakumarSekar4
 

Similaire à House price prediction (20)

AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTION
 
Build a simple image recognition system with tensor flow
Build a simple image recognition system with tensor flowBuild a simple image recognition system with tensor flow
Build a simple image recognition system with tensor flow
 
IRJET - Hand Gesture Recognition to Perform System Operations
IRJET -  	  Hand Gesture Recognition to Perform System OperationsIRJET -  	  Hand Gesture Recognition to Perform System Operations
IRJET - Hand Gesture Recognition to Perform System Operations
 
AMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTAMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLT
 
DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx
 
Stock Market Prediction Using ANN
Stock Market Prediction Using ANNStock Market Prediction Using ANN
Stock Market Prediction Using ANN
 
Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple steps
 
Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)
 
Handwritten Digit Recognition using Convolutional Neural Networks
Handwritten Digit Recognition using Convolutional Neural  NetworksHandwritten Digit Recognition using Convolutional Neural  Networks
Handwritten Digit Recognition using Convolutional Neural Networks
 
Rachit Mishra_stock prediction_report
Rachit Mishra_stock prediction_reportRachit Mishra_stock prediction_report
Rachit Mishra_stock prediction_report
 
Real Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningReal Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine Learning
 
Machine learning Experiments report
Machine learning Experiments report Machine learning Experiments report
Machine learning Experiments report
 
Text Recognition using Convolutional Neural Network: A Review
Text Recognition using Convolutional Neural Network: A ReviewText Recognition using Convolutional Neural Network: A Review
Text Recognition using Convolutional Neural Network: A Review
 
Hand Written Digit Classification
Hand Written Digit ClassificationHand Written Digit Classification
Hand Written Digit Classification
 
Bangla Handwritten Digit Recognition Report.pdf
Bangla Handwritten Digit Recognition  Report.pdfBangla Handwritten Digit Recognition  Report.pdf
Bangla Handwritten Digit Recognition Report.pdf
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
 
Blood Cell Image Classification for Detecting Malaria using CNN
Blood Cell Image Classification for Detecting Malaria using CNNBlood Cell Image Classification for Detecting Malaria using CNN
Blood Cell Image Classification for Detecting Malaria using CNN
 
fINAL ML PPT.pptx
fINAL ML PPT.pptxfINAL ML PPT.pptx
fINAL ML PPT.pptx
 
rscript_paper-1
rscript_paper-1rscript_paper-1
rscript_paper-1
 
Data Assessment and Analysis for Model Evaluation
Data Assessment and Analysis for Model Evaluation Data Assessment and Analysis for Model Evaluation
Data Assessment and Analysis for Model Evaluation
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Dernier (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

House price prediction

  • 1. HOUSE PRICE PREDICTION USING DEEP LEARNING By Sabah Begum 1
  • 2. ABSTRACT Real estate is the least transparent industry in our ecosystem. Housing prices keep changing day in and day out and sometimes are hyped rather than being based on valuation. Predicting housing prices with real factors is the main crux of our research project. Here we aim to make our evaluations based on every basic parameter that is considered while determining the price. We use various regression techniques in this pathway, and our results are not sole determination of one technique rather it is the weighted mean of various techniques to give most accurate results. 2
  • 3. EXISTING SYSTEM Data is at the heart of technical innovations, achieving any result is now possible using predictive models. Machine learning is extensively used in this approach. Although development of correct and accurate model is needed. Previous algorithms like SVM are not suitable for complex real- world data. 3
  • 4. PROPOSED SYSTEM  Our main focus here is to develop a model which predicts the property cost based on vast datasets using data mining approach.  The data mining process in a real estate industry provides an advantage to the developers by processing the data, forecasting future trends and thus assisting them to make favourable knowledge-driven decisions.  Our dataset comprises of various essential parameters and data mining has been at the root of our system.  We will initially clean up our entire dataset and also truncate the outlier values.  Finally a system will be made to predict the Real Estate Prices using Deep Learning Approach. 4
  • 5. FEASIBILITY STUDY Basic areas to be covered are –  Operational Feasibility: The applying can scale back the time consumed to take care of manual records and isn’t dull and cumbersome to take care of the records, therefore operational practicableness is assured.  Technical Feasibility: Minimum hardware requirements: 1.66 GHz Pentium Processor or Intel compatible processor. 1 GB RAM net property, eighty MB disk area. 5
  • 6.  Economical Feasibility :Once the hardware and software package needs get consummated, there’s no want for the user of our system to pay for any further overhead. For the user, the applying are economically possible within the following aspects: the applying can scale back tons of labor work. therefore the efforts are reduced. Our application can scale back the time that’s wasted in manual processes. The storage and handling issues of the registers are resolved . 6
  • 7. SYSTEM SPECIFICATIONS Since this is an AI based Deep Learning approach, we are going to use a dataset extracting the features of houses and algorithmic models to train the dataset for predictions.  Dataset- We are using an unsupervised dataset,i.e, data without class labels. The objective of this unsupervised technique, is to find patterns in data based on the relationship between data points themselves. 7
  • 8. 8
  • 9. • The dataset consists of more than 20,000 records. • Around nine features are provided in the dataset like latitude, total rooms, total bedrooms etc. • Data mining technique is used for processing of the dataset to extract relevant data to be further used in data models.  Data model-The basic technique used for the model is Regression Analysis which is a statistical method to model the relationship between a dependent (target) and independent (predictor) variables with one or more independent variables. It predicts continuous/real values. 9
  • 10. • For our application we are using three types of algorithms to train the model, which are- • KN-Neighbors: K nearest neighbors is a simple algorithm that stores all available cases and predict the numerical target based on a similarity measure (e.g., distance functions) • Support Vector Machine: The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put the new data point in the correct category in the future. • Artificial Neural Network(ANN): It’s a computational model in which artificial neurons are nodes, and directed edges with weights are connections between neuron outputs and neuron inputs. For our project we will be using TensorFlow library to integrate ANN in code. 10
  • 11. HARDWARE AND SOFTWARE REQUIREMENTS SOFTWARE REQUIREMENTS-  Python version- 3.8  Libraries- Pandas: library for manipulating data in numerical tables. scikit-learn: machine learning library that features various classification, regression and clustering algorithms. Matplotlib: Matplotlib is a plotting library for creating static, animated, and interactive visualizations in python. 11
  • 12. Seaborn: library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics Tensorflow: The core open source library to help you develop and train ML models. Jupyterlab: JupyterLab is a web-based interactive development environment for Jupyter notebooks, code, and data. 12
  • 13. HARDWARE REQUIREMENTS- 1.66 GHz Pentium Processor or Intel compatible processor. 1 GB RAM  80 MB disk area. 13
  • 14. DESIGN AND DEVELOPMENT Since this project mainly integrates software features rather than hardware ones, the emphasis is primarily on code development as follows- OVERVIEW OF STEPS Step 1  Import required libraries  Import data from dataset  Get some information about dataset. 14
  • 15. 15
  • 16. Step 2-  Visualizing the data in terms of house prices, affordability, closeness to ocean, number of bedrooms etc.  First we display a histogram for column ‘median_house_value’ from the dataset  Next we display a Heatmap of the dataset. A heatmap() method contains values representing various shades of the same colour for each value to be plotted. Usually the darker shades of the chart represent higher values than the lighter shade.  Next we display a Distplot() for columns ‘median_house_value’, ‘affordable’, and ‘ocean_proximity’. A distplot() method lets you show a histogram with a line on it. It plots a univariate distribution of observations. 16
  • 17. 17
  • 18. 18
  • 19.  Next we display a Countplot() for the column ‘ocean_proximity’. A countplot() method is used to show the counts of observations in each categorical bin using bars.  Next we concatenate the values of dataset with values of ‘ocean_proximity’ column according to the different levels of distance between the ocean and housing area.  Next we view the countplot() of ‘affordable’ column, convert it into string and replace string type “true, false” values by int values 0,1 to represent if the house is affordable or not. 19
  • 20. 20
  • 21. 21
  • 22.  Next we use the info() function on the dataset which basically shows the datatypes used for different columns. 22
  • 23. Step 3-  Detecting and removing outliers using ‘IsolationForest’ algorithm.  Isolation forest is an unsupervised learning algorithm for anomaly detection that works on the principle of isolating anomalies. It isolates the outliers by randomly selecting a feature from the given set of features and then randomly selecting a split value between the maximum and minimum values of the selected feature.  In our dataset after running the algorithm we get around 7170 outliers in the form of negative values(-1), which we remove from the data, i.e., we delete 7170 rows from around 20,000 rows in the dataset. 23
  • 24. 24
  • 25. Step 4-  Splitting into training and testing data.  We use the Train set to make the algorithm learn the data’s behavior and then check the accuracy of our model on the Test set.  Divide the data into two values- Features (X): The columns that are inserted into our model will be used to make predictions. Prediction (y): Target variable that will be predicted by the features.  After defining values of x, y we import train and test functions from scikit learn library. 25
  • 26. TRAINING AND TESTING To test the accuracy of the model we use a metric mean_absolute_percentage_error which we import from scikit library. The algorithms used for training and testing the model are as follows-  KNN- First we import the KNeighborsRegressor() function from scikit library and then run the algorithm on training set and then on test set.  We obtain the mean absolute percentage error of knn as 38.91504096125152 % 26
  • 27. 27
  • 28.  SVM- Similarly we import SVR() from scikit library.  Then we run the svm algorithm on the training data and then the test data.  The mean absolute percentage error of svm obtained is 32.246431895552135 %, which is slightly less than that of knn algorithm.  ANN- We use tensor flow library to build an ANN here for the application.  We also use the Keras library here which is an open source deep learning framework for python. Keras is one of the most powerful and easy to use python library, which is built on top of popular deep learning libraries like TensorFlow, Theano, etc., for creating deep learning models. 28
  • 29.  To build the different layers of ANN we use ‘Flatten’ and ‘Dense’ function of keras framework.  Dense layer is the regular deeply connected neural network layer. It is most common and frequently used layer.  Flatten layer is used to flatten the input.  Then after building the layers of the model we need to compile the model so that the code is converted into machine readable code.  We can view the summary of the model using model.summary() function. 29
  • 30. 30
  • 31. 31
  • 32.  Next we train the model on the training set and then test the model using the test set.  After testing, the mean absolute percentage error obtained is 18.315523022396036 %. 32
  • 33.  As we notice , among the three models we find the ANN model to be the most accurate on the dataset.  Next we visualize the output of the three models graphically, where the original output is represented in blue colour, the svm output is represented in green colour and the ANN output is represented in red colour. 33
  • 34. IMPLEMENTATION AND OUTPUT  To implement and check the accurate output of the model, we first declare variables for each attribute of the dataset and provide user defined values to the variables.  These values are fed as input to the model which predicts the final output, i.e, the price of the house based on the given input.  The final output or prediction is as follows- 34
  • 35. 35
  • 36. CONCLUSION From the training and testing of dataset on the model, a strong deduction can be made that the model ANN works most effectively on the data producing lower levels of errors, and hence we can conclude that Deep Learning techniques prove useful in the implementation and estimation of HOUSE PRICE PREDICTION. 36