Group 4
Team members
Ravi
Richa
Sabarish
Vijay
Problem Statement
Flight Delay Prediction
Dataset understanding and description
Number of features: 29; shape of the dataset: 484551 rows x 28 columns
Target variable: ArrDelay
Dataset understanding and description
• Missing values
  • Org_Airport: 1,177
  • Dest_Airport: 1,479
• Duplicate data
  • 2 rows are duplicates
• Duplicate columns
  • Org_Airport and Dest_Airport repeat information already in the dataset: Origin holds the three-letter code for Org_Airport, and Dest holds the three-letter code for Dest_Airport.
• Categorical variables
  1. UniqueCarrier
  2. FlightNum
  3. TailNum
  4. Origin
  5. Dest
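The checks above (missing values, duplicate rows, redundant airport-name columns) can be reproduced with pandas. The sketch below is a minimal illustration on made-up rows; the helper names `data_quality_report` and `drop_redundant` are hypothetical, and whether the team removed Org_Airport and Dest_Airport exactly this way is an assumption.

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame) -> dict:
    """Count missing values per column and fully duplicated rows."""
    missing = df.isna().sum()
    return {
        "missing": missing[missing > 0].to_dict(),     # only columns with at least one null
        "duplicate_rows": int(df.duplicated().sum()),  # rows identical to an earlier row
    }

def drop_redundant(df: pd.DataFrame, cols=("Org_Airport", "Dest_Airport")) -> pd.DataFrame:
    """Drop exact duplicate rows, then the airport-name columns whose
    information is already carried by the Origin/Dest codes."""
    deduped = df.drop_duplicates()
    return deduped.drop(columns=[c for c in cols if c in deduped.columns])

# Toy rows with made-up values (not taken from the real dataset)
toy = pd.DataFrame({
    "Origin": ["IAD", "IAD", "IND", "IND"],
    "Org_Airport": ["Washington Dulles", "Washington Dulles", None, "Indianapolis"],
    "Dest": ["TPA", "TPA", "BWI", "BWI"],
    "Dest_Airport": ["Tampa", "Tampa", "Baltimore", None],
    "ArrDelay": [16, 16, 2, 14],
})
print(data_quality_report(toy))
cleaned = drop_redundant(toy)
print(cleaned.shape)  # (3, 3)
```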
Outliers
Columns with outliers:
• ArrDelay
• DepDelay
• TaxiOut
• CarrierDelay
• SecurityDelay
• LateAircraftDelay
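The slides do not state which rule was used to flag these outliers. A common choice, and the one behind the default whiskers of matplotlib and seaborn box plots, is the 1.5 × IQR rule; the hypothetical helper below counts points outside those fences. The toy values are made up.

```python
import pandas as pd

def iqr_outlier_counts(df: pd.DataFrame, columns, k: float = 1.5) -> dict:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each column,
    i.e. the points a default box plot draws beyond its whiskers."""
    counts = {}
    for col in columns:
        s = df[col].dropna()
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        counts[col] = int(((s < lower) | (s > upper)).sum())
    return counts

toy = pd.DataFrame({
    "ArrDelay": [1, 2, 3, 4, 5, 6, 7, 8, 100],       # one extreme delay
    "TaxiOut": [10, 11, 12, 12, 13, 14, 15, 16, 17],  # no extremes
})
print(iqr_outlier_counts(toy, ["ArrDelay", "TaxiOut"]))  # {'ArrDelay': 1, 'TaxiOut': 0}
```

Counting outliers is only the first step; whether to cap, transform, or keep them is a separate modelling decision.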
Data Visualization Techniques Used
• Box plots
• Heat maps
• Histograms
• Line graphs
• Pie charts
• Pair plots
• sweetviz library for additional visualizations
Screenshots of different visualizations
With a correlation threshold of 0.9, the following features were found to be highly correlated: {'AirTime', 'CRSElapsedTime', 'DepDelay', 'Distance'}
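The slides do not show how this set was computed. One common approach, sketched here under that assumption, scans the lower triangle of the absolute Pearson correlation matrix and flags the later column of every pair above the threshold. The data below is synthetic, and `correlated_features` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def correlated_features(df: pd.DataFrame, threshold: float = 0.9) -> set:
    """Return columns whose absolute Pearson correlation with an earlier
    column exceeds the threshold (candidates for removal)."""
    corr = df.corr().abs()
    cols = corr.columns
    flagged = set()
    for i in range(len(cols)):
        for j in range(i):  # lower triangle only: each pair checked once
            if corr.iloc[i, j] > threshold:
                flagged.add(cols[i])
    return flagged

rng = np.random.default_rng(0)
distance = rng.uniform(100, 2500, size=500)
toy = pd.DataFrame({
    "Distance": distance,
    # Synthetic air time roughly proportional to distance, plus noise
    "AirTime": distance / 8 + rng.normal(0, 5, size=500),
    "TaxiOut": rng.normal(15, 5, size=500),  # independent of the others
})
print(correlated_features(toy, threshold=0.9))  # {'AirTime'}
```

Because only the later column of each pair is flagged, the result depends on column order, so domain knowledge is still needed to decide which feature of a correlated pair to keep.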
EDA
flight_eda_report.html
Data Processing and Feature Engineering
• Imputation: checked the dataset for null values and handled them with different techniques
• Categorical encoding: target encoding, chosen because of the high cardinality of the categorical columns
• Handling outliers: box plots used to analyse outliers in the data
• Scaling: Min-Max scaler
• Feature selection: correlation analysis of the features with each other and with the target
• Feature split: new features derived from the date and time features
Dataset after feature engineering: 24 features for modelling
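The slides name target encoding and Min-Max scaling but not their exact settings. Below is a minimal sketch of a smoothed mean target encoder followed by scikit-learn's `MinMaxScaler`; the smoothing scheme, the helper names and the toy rows are assumptions. Both steps should be fitted on the training split only, otherwise information from the test target leaks into the features. Recent scikit-learn versions (1.3 and later) also provide `sklearn.preprocessing.TargetEncoder`, which adds cross-fitting.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def fit_target_encoder(cat: pd.Series, y: pd.Series, smoothing: float = 10.0):
    """Learn a smoothed mean of the target for each category.

    Each category mean is blended with the global mean, weighted by
    count / (count + smoothing), so rare categories move toward the global mean.
    """
    global_mean = float(y.mean())
    stats = y.groupby(cat).agg(["mean", "count"])
    weight = stats["count"] / (stats["count"] + smoothing)
    mapping = (weight * stats["mean"] + (1 - weight) * global_mean).to_dict()
    return mapping, global_mean

def apply_target_encoder(cat: pd.Series, mapping: dict, global_mean: float) -> pd.Series:
    # Categories not seen during fitting fall back to the global mean
    return cat.map(mapping).fillna(global_mean)

# Hypothetical training rows
train = pd.DataFrame({
    "UniqueCarrier": ["WN", "WN", "AA", "AA", "AA", "UA"],
    "Distance": [300, 500, 1000, 1500, 2000, 800],
    "ArrDelay": [10, 30, 60, 40, 50, 5],
})
mapping, global_mean = fit_target_encoder(train["UniqueCarrier"], train["ArrDelay"])
features = pd.DataFrame({
    "UniqueCarrier_te": apply_target_encoder(train["UniqueCarrier"], mapping, global_mean),
    "Distance": train["Distance"],
})
scaler = MinMaxScaler()  # rescales each column to [0, 1]
X_scaled = scaler.fit_transform(features)
print(np.round(X_scaled, 3))
```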
Our Research
• Analysed the data in depth from a domain perspective, which gave interesting insights.
• Researched different approaches to handling categorical variables before settling on target encoding.
• Because dropping features is one of the most consequential decisions, we used both visualization and domain knowledge to decide which features to remove.
• Tried different approaches to handling missing values and then finalized one.
• Derived new attributes from the given features that we expected to be helpful for further analysis.
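The slides do not list the derived attributes. As one plausible example of splitting date and time features, the sketch below extracts month, day of month, weekday and departure hour. The column names `Date` and `DepTime`, the day-month-year date format and the HHMM time encoding are all assumptions, as is the helper name.

```python
import pandas as pd

def add_datetime_features(df: pd.DataFrame, date_col: str = "Date",
                          time_col: str = "DepTime") -> pd.DataFrame:
    """Split a date column and an HHMM time column into simpler numeric features."""
    out = df.copy()
    # Assumption: dates are day-month-year strings such as "03-01-2019"
    dates = pd.to_datetime(out[date_col], format="%d-%m-%Y")
    out["Month"] = dates.dt.month
    out["DayOfMonth"] = dates.dt.day
    out["Weekday"] = dates.dt.dayofweek  # Monday = 0 ... Sunday = 6
    # Assumption: times are HHMM numbers (1829 -> 18:29) with no missing values;
    # "% 24" maps the 2400 midnight convention to hour 0
    out["DepHour"] = (out[time_col] // 100 % 24).astype(int)
    return out.drop(columns=[date_col])

toy = pd.DataFrame({"Date": ["03-01-2019", "31-12-2018"], "DepTime": [1829.0, 2400.0]})
print(add_datetime_features(toy))
```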
Future Tasks
• More feature engineering
• Training the model on the selected features
• Model development
• Model assessment
Takeaways from the Last Meeting
• Group dynamics
• Elements of data
• Dynamic data
Elements of Data
Feature Engg.
Logistic Regression: Sequential Feature Selection (SFS) for Feature Engineering
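The slides do not say which SFS implementation or settings were used. A minimal sketch with scikit-learn's `SequentialFeatureSelector` and a logistic-regression estimator follows; because logistic regression is a classifier, it assumes a binary target (for example delayed vs. not delayed), and the synthetic data, the number of features to keep and `cv=5` are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: 8 candidate features, of which 3 carry signal
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# Forward selection: start empty and greedily add the feature that most
# improves the cross-validated score, until 3 features are selected
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    direction="forward",
    cv=5,
)
sfs.fit(X, y)
mask = sfs.get_support()          # boolean mask over the original columns
print("selected columns:", mask.nonzero()[0])
X_selected = sfs.transform(X)     # keep only the selected columns
```

Forward selection refits the estimator many times, so on close to 400,000 rows it may be worth running it on a sample first.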
Splitting of data
With test_size=0.2, the data is split 80/20 into training and test sets.
X_train shape after splitting: (387639, 24)
X_test shape after splitting: (96910, 24)
y_train shape after splitting: (387639,)
y_test shape after splitting: (96910,)
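The reported shapes are consistent with scikit-learn's `train_test_split` rounding rule (the test size is rounded up, the training set gets the remainder) applied to 484,549 rows, i.e. the 484,551 rows minus the 2 duplicates. That row count is our inference, not stated on the slide, and `random_state=42` is an arbitrary choice. A minimal check using row indices only:

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 484_549  # assumed: 484,551 rows minus the 2 duplicate rows
row_idx = np.arange(n_rows)

# test size = ceil(0.2 * 484549) = 96910; train size = 484549 - 96910 = 387639
train_idx, test_idx = train_test_split(row_idx, test_size=0.2, random_state=42)
print(len(train_idx), len(test_idx))  # 387639 96910
```

With a real DataFrame `X`, `X.iloc[train_idx]` and `X.iloc[test_idx]` would give the (387639, 24) and (96910, 24) shapes above.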
Linear Regression Model
Interpretation: R² measures how much of the variance in the target is explained by the model. For example, R² = 0.90 means that 10% of the variance is not explained by the model; in the ideal case R² = 1, the model fits the data perfectly and explains all of the variance.
• Y = a + bX
• a = intercept
• b = slope (coefficient)
• X = feature; with several features this generalizes to Y = a + b1X1 + b2X2 + ... + bnXn
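The definition of R² can be checked directly: R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot the total sum of squares around the mean. The sketch below computes it by hand and compares it with scikit-learn's `r2_score` on synthetic data generated from Y = 2 + 3X plus noise (the values of a and b are arbitrary choices for illustration).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def r_squared(y_true, y_pred) -> float:
    """R^2 = 1 - SS_res / SS_tot: share of the target's variance explained."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=200)  # a = 2, b = 3, plus noise

model = LinearRegression().fit(x.reshape(-1, 1), y)
pred = model.predict(x.reshape(-1, 1))
print("intercept a:", round(model.intercept_, 2), "slope b:", round(model.coef_[0], 2))
print("R^2 manual:", r_squared(y, pred), "R^2 sklearn:", r2_score(y, pred))
```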
Ridge Regression Model
Ridge regression is a regularized variant of linear regression that is commonly used when the features suffer from multicollinearity. It adds an L2 penalty (the sum of squared coefficients, scaled by a parameter alpha) to the least-squares loss. Under multicollinearity, ordinary least-squares estimates remain unbiased but their variances are large, so predicted values can be far from the actual values; the L2 penalty accepts a small bias in exchange for a large reduction in variance.
Ridge Regression results:
• Train data: mean_squared_error = 0.0033528868720357806, R² score = 0.9999999965474273
• Test data: mean_squared_error = 0.0027463365607708775, R² score = 0.9999999976478994
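To make the L2 penalty concrete: without an intercept, ridge minimizes ||y − Xw||² + alpha·||w||², whose closed-form solution is w = (XᵀX + alpha·I)⁻¹Xᵀy. The sketch below builds two nearly identical (collinear) synthetic features, checks the closed form against scikit-learn's `Ridge(fit_intercept=False)`, and prints the ordinary least-squares coefficients for comparison. The data and `alpha=1.0` are illustrative assumptions, not the settings used in the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly a copy of x1: strong multicollinearity
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

alpha = 1.0
# Closed-form ridge solution (no intercept): w = (X^T X + alpha I)^-1 X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
ols = LinearRegression(fit_intercept=False).fit(X, y)

print("OLS coefficients:  ", ols.coef_)    # unstable split between the two copies
print("Ridge coefficients:", ridge.coef_)  # weight shared more evenly, smaller norm
```

By default `Ridge` also fits an intercept, which is not penalized; `fit_intercept=False` is used here only so the result matches the formula exactly.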
SVC
Random Forest Regressor
Neural Network
Future Recommendation
1. Reconsider whether the task is better framed as a regression or a classification problem
2. The dataset could include more records with delay = 0
3. The dataset could include more relevant features, chosen using domain knowledge/experience
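If the problem is reframed as classification, ArrDelay first has to be turned into a label. The sketch below uses a 15-minute threshold, the cut-off the US Bureau of Transportation Statistics uses to count a flight as delayed; that choice, the helper name and the toy values are assumptions. Checking the class balance afterwards also shows how many non-delayed flights the data contains, which ties in with recommendation 2.

```python
import pandas as pd

def delay_labels(arr_delay: pd.Series, threshold_min: float = 15.0) -> pd.Series:
    """Return 1 if the flight arrived at least threshold_min minutes late, else 0."""
    return (arr_delay >= threshold_min).astype(int)

arr_delay = pd.Series([-5, 0, 14, 15, 90, 240])  # toy arrival delays in minutes
labels = delay_labels(arr_delay)
print(labels.value_counts(normalize=True))  # share of each class
```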
Comparison of Models
Interpretation
• Simple linear regression led to overfitting, giving an unrealistic accuracy of 100%. Applying regularization to the regression model addresses this overfitting; we used L2 regularization, i.e. Ridge regression.
• The SVM model is poorly suited to this problem: it took nearly 3 hours to run and still gave subpar accuracy. Kernel SVMs are computationally expensive and scale poorly to large datasets such as this one; scikit-learn's documentation notes that their fit time grows at least quadratically with the number of samples.
• Random forest also gives good accuracy: 98%.
• The ANN (artificial neural network) also gives 98% accuracy.
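A side-by-side comparison is easier to read when every model is scored the same way on both splits. The harness below is a sketch on synthetic data, not the flight dataset; it reports R² (the default score of scikit-learn regressors, presumably what "accuracy" refers to for the regression models, though the slides do not say) and test MSE. A large gap between train and test R² is the usual signature of overfitting. The model list and hyperparameters are assumptions; SVM and ANN are left out to keep it fast.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge (alpha=1.0)": Ridge(alpha=1.0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)
    results[name] = {
        "train_r2": r2_score(y_train, pred_train),
        "test_r2": r2_score(y_test, pred_test),
        "test_mse": mean_squared_error(y_test, pred_test),
    }

for name, r in results.items():
    print(f"{name:20s} train R2={r['train_r2']:.3f} "
          f"test R2={r['test_r2']:.3f} test MSE={r['test_mse']:.1f}")
```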
Dynamic data
Dynamic data output using linear regression and Ridge regression

casestudy_important.pptx
