What's Missing? Methods in missing data analysis
Location: Boston Data Festival, September 23rd 2016
2016 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
www.QuantUniversity.com
sri@quantuniversity.com
Slides and code
• Will be posted on the QuantUniversity Meetup page.
• If you are not a member, sign up here:
https://www.meetup.com/QuantUniversity-Meetup/
- Analytics Advisory services
- Custom training programs
- Architecture assessments, advice and audits
Sri Krishnamurthy
Founder and CEO
• Founder of QuantUniversity LLC. and www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior experience at MathWorks, Citigroup and Endeca, and with 25+ financial services and energy customers
• Regular Columnist for the Wilmott Magazine
• Author of forthcoming book "Financial Modeling: A case study approach" published by Wiley
• Chartered Financial Analyst and Certified Analytics Professional
• Teaches Analytics in the Babson College MBA program and at Northeastern University, Boston
Quantitative Analytics and Big Data Analytics Onboarding
• Trained more than 500 students in
Quantitative methods, Data
Science and Big Data Technologies
using MATLAB, Python and R
• Launching the Analytics Certificate
Program in 2016
(MATLAB version also available)
www.analyticscertificate.com/SparkWorkshop
Early bird
pricing ending
today!
• Definition
• Assumptions
• Workflow
What does a missing data problem look like?
Missing data
• Dealing with missing data has always been a challenge in data analysis.
• We need methods for missing data analysis that:
▫ Minimize bias
▫ Make maximum use of the available information, and
▫ Produce good estimates of uncertainty (e.g., p-values, confidence intervals)
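[Figure: a data matrix of records (Rec No) by variables (up to Variable n) illustrating unit non-response (an entire record missing), item non-response (individual values missing), missing data, and an unobserved/latent variable (a variable that is never observed)]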
Assumptions (MCAR, MAR, MNAR)
• When values are missing completely at random (MCAR), the probability of
missingness is unrelated to the values of any variable
• Data are missing at random (MAR) if missingness is unrelated to the value that is missing, but may be related to the observed values of other variables
• E.g., a question about typical number of hours spent browsing the internet
might be missing more often for married than unmarried participants; BUT
among the married subset, missingness is completely random—Not related
to how many hours the person browses
• Data are missing not at random (MNAR) if missingness is related to the value
that is missing, and often to the values of other variables as well
• E.g. missing values are more prevalent among those who typically browse
more than among those who browse less
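To make the three mechanisms concrete, here is a minimal simulation sketch built on the browsing example above. The population, the group means (8 vs 12 hours) and the missingness probabilities are all hypothetical choices for illustration, not data from the talk. It deletes values under each mechanism and compares the mean of what remains with the true mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible
N = 10_000

# Hypothetical population (not data from the talk): marital status and
# weekly hours spent browsing. Married people are assumed to browse less.
people = []
for _ in range(N):
    married = rng.random() < 0.5
    hours = rng.gauss(8.0 if married else 12.0, 3.0)
    people.append((married, hours))

true_mean = statistics.fmean(hours for _, hours in people)


def observed_mean(prob_missing):
    """Mean of the hours values that remain after applying a missingness
    mechanism; prob_missing(married, hours) is P(hours is missing)."""
    kept = [h for m, h in people if rng.random() >= prob_missing(m, h)]
    return statistics.fmean(kept)


mechanisms = {
    # MCAR: the same 30% chance for everyone, unrelated to any variable.
    "MCAR": lambda married, hours: 0.3,
    # MAR: depends only on an observed variable (marital status).
    "MAR": lambda married, hours: 0.5 if married else 0.1,
    # MNAR: depends on the (unobserved) hours value itself.
    "MNAR": lambda married, hours: 0.6 if hours > 10 else 0.1,
}

results = {name: observed_mean(p) for name, p in mechanisms.items()}

print(f"true mean: {true_mean:.2f}")
for name, mean in results.items():
    print(f"{name:>4}: {mean:.2f}")
```

Under MCAR the observed mean stays close to the true mean. Under MAR the overall mean is biased (married people drop out more often), but within each marital group the remaining values are still a random sample, so methods that condition on marital status can correct it. Under MNAR the bias comes from the missing values themselves and cannot be removed using the observed data alone.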
Methods of handling missing data
• Deletion methods: delete cases or variables that have missing values
▫ Listwise deletion
▫ Pairwise deletion
▫ Variable deletion
• Imputation methods: substitute values for the missing ones
▫ Single imputation
- Mean imputation
- Conditional mean imputation
- Case mean imputation
- Regression imputation
- Last observation carried forward
- Worst case imputation
- Best case imputation
- EM imputation
▫ Multiple imputation
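Listwise deletion
• Deletes every case (row) that has a missing value on any variable used in the analysis
• A reasonable choice when the proportion of missing data is small (less than about 15%) and the data are MCAR
• Advantages:
▫ It can be used for any type of statistical analysis.
▫ No special computations are required.
▫ Parameter estimates are unbiased when the data are MCAR.
▫ Standard errors are appropriate for the reduced sample, although larger than they would be with the complete data.
• Disadvantages:
▫ May remove a considerable fraction of the data, reducing statistical power
▫ Estimates can be biased when the data are not MCAR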
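A minimal sketch of listwise deletion in plain Python, assuming records are dictionaries with `None` marking a missing value; the records themselves are hypothetical.

```python
# Hypothetical records (not from the talk); None marks a missing value.
records = [
    {"age": 34, "income": 52.0, "score": 7},
    {"age": None, "income": 48.5, "score": 6},
    {"age": 29, "income": None, "score": 8},
    {"age": 41, "income": 61.0, "score": 5},
    {"age": 38, "income": None, "score": None},
    {"age": 45, "income": 66.0, "score": 4},
]


def listwise_delete(rows):
    """Keep only complete cases: rows with no missing value in any field."""
    return [row for row in rows if all(v is not None for v in row.values())]


complete = listwise_delete(records)
print(f"kept {len(complete)} of {len(records)} rows")  # kept 3 of 6 rows
```

Only 4 of the 18 values are missing, yet half of the rows are discarded, which illustrates the main disadvantage listed above.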
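Pairwise deletion
• Pairwise deletion drops cases with missing values on an analysis-by-analysis basis: each statistic (e.g., each correlation) uses every case that is observed on the variables it involves
• Advantages:
▫ Uses all available non-missing data
• Disadvantages:
▫ Different statistics rest on different subsets of cases, so estimated standard errors and test statistics can be biased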
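The sketch below applies pairwise deletion to Pearson correlations, using the same kind of hypothetical records as the listwise example. Each pair of variables keeps every row where both are observed, and the function reports how many rows each correlation used.

```python
from itertools import combinations
from math import sqrt

# Hypothetical records (not from the talk); None marks a missing value.
records = [
    {"age": 34, "income": 52.0, "score": 7},
    {"age": None, "income": 48.5, "score": 6},
    {"age": 29, "income": None, "score": 8},
    {"age": 41, "income": 61.0, "score": 5},
    {"age": 38, "income": None, "score": None},
    {"age": 45, "income": 66.0, "score": 4},
]


def pairwise_correlations(rows):
    """Pearson correlation for every pair of variables, each computed from
    the rows where *both* variables are observed (pairwise deletion).
    Returns {(var_a, var_b): (r, n_used)}."""
    names = list(rows[0])
    out = {}
    for a, b in combinations(names, 2):
        pairs = [(r[a], r[b]) for r in rows
                 if r[a] is not None and r[b] is not None]
        n = len(pairs)
        mean_a = sum(x for x, _ in pairs) / n
        mean_b = sum(y for _, y in pairs) / n
        s_ab = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
        s_aa = sum((x - mean_a) ** 2 for x, _ in pairs)
        s_bb = sum((y - mean_b) ** 2 for _, y in pairs)
        out[(a, b)] = (s_ab / sqrt(s_aa * s_bb), n)
    return out


for pair, (r, n) in pairwise_correlations(records).items():
    print(pair, f"r={r:.3f}", f"n={n}")
```

The age/income correlation uses 3 rows while the other two use 4, so the correlations are not computed on a common sample; this is the source of the distorted standard errors mentioned above.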
Variable deletion
• Variable deletion drops entire variables (columns) that have missing values
• Advantages:
▫ Makes sense when a variable has many missing values and is of relatively low importance
• Disadvantages:
▫ Loss of all the information carried by that variable
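A minimal sketch of variable deletion, assuming a variable is dropped when its share of missing values exceeds a cut-off. The 50% cut-off and the records are hypothetical; in practice the threshold depends on the analysis and on how important the variable is.

```python
# Hypothetical records (not from the talk); "bonus" is missing for most rows.
records = [
    {"age": 34, "income": 52.0, "bonus": None},
    {"age": 29, "income": 48.5, "bonus": None},
    {"age": 41, "income": None, "bonus": 3.0},
    {"age": 38, "income": 57.5, "bonus": None},
    {"age": 45, "income": 66.0, "bonus": None},
]


def variable_delete(rows, max_missing_fraction=0.5):
    """Drop every variable whose share of missing values exceeds
    max_missing_fraction (an assumed cut-off, not a standard value)."""
    names = list(rows[0])
    n = len(rows)
    kept = [c for c in names
            if sum(row[c] is None for row in rows) / n <= max_missing_fraction]
    return [{c: row[c] for c in kept} for row in rows], kept


reduced, kept = variable_delete(records)
print("kept variables:", kept)  # kept variables: ['age', 'income']
```

Note that the remaining missing income value still has to be handled by one of the other methods.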
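Mean imputation
• Replace missing values with the mean of that variable
Case  Var1  Var2  Var3
1     9     8     8
2     7.44  7     6
3     8     5     6
4     7     4     5
5     9     5     7
6     8     8     9
7     6     7     6
8     5     9     7
9     7     8     ?
10    8     8     7
• Case 2's missing Var1 is replaced by the mean of the nine observed Var1 values: 67 / 9 ≈ 7.44 (case 9's Var3, marked ?, is also missing)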
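A minimal sketch of mean imputation for Var1 in the table above, with `None` standing in for case 2's missing value.

```python
from statistics import fmean

# Var1 from the table above; None marks the missing value for case 2.
var1 = [9, None, 8, 7, 9, 8, 6, 5, 7, 8]


def mean_impute(values):
    """Replace each None with the mean of the observed values."""
    mean = fmean(v for v in values if v is not None)
    return [mean if v is None else v for v in values]


imputed = mean_impute(var1)
print(round(imputed[1], 2))  # 7.44
```

Every filled value sits exactly at the mean, so mean imputation understates the variable's variance and weakens its correlations with other variables.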
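Conditional mean imputation
• Replace missing values with the mean of the variable within a relevant subgroup
Case  Var1  Sex  Var2  Var3
1     9     F    8     8
2     8.25  F    7     6
3     8     F    5     6
4     7     F    4     5
5     9     F    5     7
6     8     M    8     9
7     6     M    7     6
8     5     M    9     7
9     7     M    8     ?
10    8     M    8     7
• Case 2 is female, so its missing Var1 is replaced by the mean Var1 of the other females (cases 1, 3, 4 and 5): (9 + 8 + 7 + 9) / 4 = 8.25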
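A minimal sketch of conditional mean imputation on the table above, using Sex as the grouping variable.

```python
from statistics import fmean

# Var1 and Sex from the table above; case 2's Var1 is missing.
sex = ["F", "F", "F", "F", "F", "M", "M", "M", "M", "M"]
var1 = [9, None, 8, 7, 9, 8, 6, 5, 7, 8]


def conditional_mean_impute(values, groups):
    """Replace each None with the mean of the observed values in its group.
    (fmean raises StatisticsError if a group has no observed values.)"""
    group_means = {
        g: fmean(v for v, gg in zip(values, groups) if gg == g and v is not None)
        for g in set(groups)
    }
    return [group_means[g] if v is None else v for v, g in zip(values, groups)]


imputed = conditional_mean_impute(var1, sex)
print(imputed[1])  # 8.25
```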
Case mean imputation
• Replace a missing value using the other variables for the same case, here the mean of that case's observed values
Case  Var1  Var2  Var3
1     9     8     8
2     6.50  7     6
3     8     5     6
4     7     4     5
5     9     5     7
6     8     8     9
7     6     7     6
8     5     9     7
9     7     8     ?
10    8     8     7
• Case 2's missing Var1 is replaced by the mean of its Var2 and Var3 values: (7 + 6) / 2 = 6.50
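A minimal sketch of case mean imputation on the rows of the table above. Averaging across variables assumes they are measured on a comparable scale (for example, items of the same questionnaire).

```python
from statistics import fmean

# Rows of (Var1, Var2, Var3) from the table above; None marks missing values.
rows = [
    (9, 8, 8), (None, 7, 6), (8, 5, 6), (7, 4, 5), (9, 5, 7),
    (8, 8, 9), (6, 7, 6), (5, 9, 7), (7, 8, None), (8, 8, 7),
]


def case_mean_impute(row):
    """Replace each None with the mean of that case's observed values."""
    mean = fmean(v for v in row if v is not None)
    return tuple(mean if v is None else v for v in row)


imputed = [case_mean_impute(r) for r in rows]
print(imputed[1])  # (6.5, 7, 6)
```

The same rule also fills case 9's Var3 with (7 + 8) / 2 = 7.5.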
Regression imputation
• Replace missing values by "predicting" them from a regression equation fitted on the complete cases (cases with no missing values)
Case  Var1  Var2  Var3
1     9     8     8
2     6.32  7     6
3     8     5     6
4     7     4     5
5     9     5     7
6     8     8     9
7     6     7     6
8     5     9     7
9     7     8     ?
10    8     8     7
VAR1′ = 4.621 - (0.734 × VAR2) + (1.139 × VAR3)
• The equation is fitted on the eight complete cases (all except cases 2 and 9). For case 2: 4.621 - 0.734 × 7 + 1.139 × 6 ≈ 6.32
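A minimal sketch of regression imputation on the table above: ordinary least squares of Var1 on Var2 and Var3 is fitted on the complete cases and then used to predict case 2's Var1. Solving the centred 2×2 normal equations by hand is only a choice to keep the example dependency-free; any OLS routine would do.

```python
from statistics import fmean

# Rows of (Var1, Var2, Var3) from the table above; None marks missing values.
rows = [
    (9, 8, 8), (None, 7, 6), (8, 5, 6), (7, 4, 5), (9, 5, 7),
    (8, 8, 9), (6, 7, 6), (5, 9, 7), (7, 8, None), (8, 8, 7),
]


def fit_ols_two_predictors(data):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 for (y, x1, x2) rows,
    solving the 2x2 normal equations on mean-centred data."""
    y, x1, x2 = zip(*data)
    my, m1, m2 = fmean(y), fmean(x1), fmean(x2)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Fit on complete cases only (cases 2 and 9 are excluded).
complete = [r for r in rows if None not in r]
b0, b1, b2 = fit_ols_two_predictors(complete)

# Predict Var1 wherever it is missing but both predictors are observed.
imputed = [
    (b0 + b1 * v2 + b2 * v3, v2, v3)
    if v1 is None and v2 is not None and v3 is not None
    else (v1, v2, v3)
    for v1, v2, v3 in rows
]
print(round(b0, 3), round(b1, 3), round(b2, 3))  # 4.621 -0.734 1.139
print(round(imputed[1][0], 2))  # 6.32
```

Like mean imputation, the predicted values fall exactly on the regression line, so this single-imputation approach still understates variability.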
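Last observation carried forward
• Imputes a missing value with the value of the same outcome at the most recent time it was observed
• Variants: e.g., use the average of earlier observations, such as the average of T1 and T2
Case  T1  T2  T3
1     9   8   8
2     ?   7   6
3     8   5   6
4     7   4   5
5     9   5   7
6     8   8   9
7     6   7   6
8     5   9   7
9     7   8   8
10    8   8   7
• Case 9's T3 (8) matches its T2 value, as carrying forward would produce; case 2's T1 has no earlier observation, so it cannot be filled this way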
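A minimal sketch of last observation carried forward and of the "average of earlier observations" variant. It assumes that the 8 shown for case 9 at T3 is the carried-forward value, so that cell is entered as missing (`None`) here.

```python
# T1..T3 from the table above. Assumption: the 8 shown for case 9 at T3 is
# the carried-forward value, so it is entered here as missing (None).
rows = [
    [9, 8, 8], [None, 7, 6], [8, 5, 6], [7, 4, 5], [9, 5, 7],
    [8, 8, 9], [6, 7, 6], [5, 9, 7], [7, 8, None], [8, 8, 7],
]


def locf(series):
    """Fill each None with the most recent earlier observation. A gap at the
    start (e.g. case 2's T1) has nothing to carry, so it stays None."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last if v is None else v)
    return filled


def mean_of_earlier(series, t):
    """Variant: the average of all observed values before time index t."""
    earlier = [v for v in series[:t] if v is not None]
    return sum(earlier) / len(earlier)


print(locf(rows[8]))                # [7, 8, 8]
print(locf(rows[1]))                # [None, 7, 6]
print(mean_of_earlier(rows[8], 2))  # 7.5
```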
Interpolation
• Use interpolation between neighboring observations to fill in missing values
• Useful for longitudinal datasets
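A minimal sketch of linear interpolation for one subject's longitudinal series. Linear interpolation is only one possible choice (splines or time-weighted schemes are alternatives), and the measurements are hypothetical.

```python
def linear_interpolate(series):
    """Fill interior runs of None by straight-line interpolation between the
    nearest observed neighbours; leading/trailing gaps are left as None."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        for k in range(left + 1, right):
            step = (series[right] - series[left]) * (k - left) / (right - left)
            filled[k] = series[left] + step
    return filled


# Hypothetical longitudinal measurements for one subject (not from the talk).
visits = [None, 10, None, None, 16, 15, None, 13]
print(linear_interpolate(visits))
# [None, 10, 12.0, 14.0, 16, 15, 14.0, 13]
```

Unlike carrying the last value forward, interpolation uses the observations on both sides of a gap, so it cannot fill a gap at the start or end of a series.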
Worst case and Best case imputation
• Worst case imputation replaces a missing value of a categorical outcome with the worst-case outcome
• Best case imputation replaces a missing value of a categorical outcome with the best-case outcome
• Comparing the two shows how sensitive the conclusions are to the missing values
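A minimal sketch of worst-case and best-case imputation for a binary outcome, assuming 1 is the favorable outcome ("success") and 0 the unfavorable one; the outcomes are hypothetical. The two imputations bound the success rate.

```python
def worst_best_case(outcomes, good=1, bad=0):
    """Return the success rate with every missing outcome set to the worst
    case (bad) and to the best case (good)."""
    n = len(outcomes)
    worst = sum(bad if v is None else v for v in outcomes) / n
    best = sum(good if v is None else v for v in outcomes) / n
    return worst, best


# Hypothetical binary outcomes (1 = success, 0 = failure, None = missing).
outcomes = [1, 0, None, 1, None, 1, 0]
worst, best = worst_best_case(outcomes)
print(f"success rate between {worst:.3f} and {best:.3f}")
```

If a conclusion holds at both ends of this range, the missing outcomes cannot overturn it.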
Expectation-Maximization
• Substitutes missing values using maximum-likelihood (ML) estimation
• In the E-step, the expected values of the missing data are calculated from the observed data and the current parameter estimates
• In the M-step, the procedure imputes the expected values from the E-step and then maximizes the likelihood function to obtain new parameter estimates
• The two steps are repeated until the estimates converge
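A minimal EM sketch, assuming the data follow a bivariate normal distribution. It uses Var2 (fully observed) as x and Var1 (case 2 missing) as y from the earlier tables. The E-step fills in the expected y and y² for the missing case, and the M-step re-estimates the means and covariances from those expected statistics. Because only y is ever missing here, the ML estimate of the mean of y also has a closed form, which the sketch uses as a check.

```python
from statistics import fmean

# x = Var2 (fully observed), y = Var1 (case 2 missing) from the tables above.
x = [8, 7, 5, 4, 5, 8, 7, 9, 8, 8]
y = [9, None, 8, 7, 9, 8, 6, 5, 7, 8]


def em_bivariate_normal(x, y, tol=1e-10, max_iter=1000):
    """EM estimates of (mu_x, mu_y, s_xx, s_xy, s_yy) for a bivariate normal
    when some y values are missing and x is fully observed."""
    obs = [(a, b) for a, b in zip(x, y) if b is not None]
    # Start from complete-case estimates.
    mu_x = fmean(a for a, _ in obs)
    mu_y = fmean(b for _, b in obs)
    s_xx = fmean((a - mu_x) ** 2 for a, _ in obs)
    s_xy = fmean((a - mu_x) * (b - mu_y) for a, b in obs)
    s_yy = fmean((b - mu_y) ** 2 for _, b in obs)
    for _ in range(max_iter):
        # E-step: expected y and y^2 for missing cases, given x and current parameters.
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        ey, eyy = [], []
        for a, b in zip(x, y):
            if b is None:
                m = mu_y + beta * (a - mu_x)
                ey.append(m)
                eyy.append(m * m + resid_var)
            else:
                ey.append(b)
                eyy.append(b * b)
        # M-step: ML estimates from the expected sufficient statistics.
        new_mu_y = fmean(ey)
        mu_x = fmean(x)
        s_xx = fmean(a * a for a in x) - mu_x ** 2
        s_xy = fmean(a * e for a, e in zip(x, ey)) - mu_x * new_mu_y
        s_yy = fmean(eyy) - new_mu_y ** 2
        converged = abs(new_mu_y - mu_y) < tol
        mu_y = new_mu_y
        if converged:
            break
    return mu_x, mu_y, s_xx, s_xy, s_yy


# Check: with only y missing, the ML estimate of mu_y equals the complete-case
# regression of y on x evaluated at the mean of all x values.
cc = [(a, b) for a, b in zip(x, y) if b is not None]
cx, cy = fmean(a for a, _ in cc), fmean(b for _, b in cc)
slope = sum((a - cx) * (b - cy) for a, b in cc) / sum((a - cx) ** 2 for a, _ in cc)
closed_form_mu_y = cy + slope * (fmean(x) - cx)

mu_x, mu_y, *_ = em_bivariate_normal(x, y)
print(f"EM: {mu_y:.6f}  closed form: {closed_form_mu_y:.6f}")
```

With more variables or arbitrary missingness patterns there is usually no closed form, which is where the iterative E/M loop earns its keep.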
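Multiple imputation
• Multiple imputation is increasingly regarded as the "gold standard" approach to handling missing values
• Each missing value is imputed several (m) times, giving m completed datasets; each is analyzed separately and the results are pooled
• Computationally more demanding than single imputation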
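The pooling step can be sketched with Rubin's rules: average the m point estimates, then combine the average within-imputation variance with the between-imputation variance. The estimates below are hypothetical, and generating the imputations themselves (for example, by drawing from a predictive distribution for each missing value) is not shown.

```python
from statistics import fmean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    estimates: the point estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = fmean(estimates)          # pooled point estimate
    u_bar = fmean(variances)          # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b   # total variance
    return q_bar, total


# Hypothetical results from m = 3 imputed datasets (not from the talk).
q, t = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q, round(t, 4))  # 2.0 1.8333
```

The between-imputation term is what lets multiple imputation reflect the extra uncertainty caused by the missing data, which single imputation ignores.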
Summary
We have covered:
• Introduction: missing data definition, assumptions and workflow
• Deletion methods
▫ Listwise deletion
▫ Pairwise deletion
▫ Variable deletion
• Imputation methods
▫ Single imputation
- Mean imputation
- Conditional mean imputation
- Case mean imputation
- Regression imputation
- Last observation carried forward
- Interpolation
- Worst case imputation
- Best case imputation
- EM imputation
▫ Multiple imputation
Answer this question on the Eventbrite site
promo code section and take an additional
25% off! Only today!
What package in Python do you need to
use DataFrame functionality?
Thank you!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
srikrishnamurthy
Contact
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and
shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 

Dernier (20)

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

Missing data handling

  • 1. What’s Missing? Methods in missing data analysis • Location: Boston Data Festival, September 23rd 2016 • Presented by: Sri Krishnamurthy, CFA, CAP • www.QuantUniversity.com • sri@quantuniversity.com • 2016 Copyright QuantUniversity LLC.
  • 2. Slides and code • Slides and code will be posted on the QuantUniversity Meetup page. • If you are not a member, sign up here: https://www.meetup.com/QuantUniversity-Meetup/
  • 3. - Analytics Advisory services - Custom training programs - Architecture assessments, advice and audits
  • 4. Sri Krishnamurthy, Founder and CEO • Founder of QuantUniversity LLC. and www.analyticscertificate.com • Advisory and consultancy for financial analytics • Prior experience at MathWorks, Citigroup and Endeca, and with 25+ financial services and energy customers • Regular columnist for Wilmott Magazine • Author of the forthcoming book “Financial Modeling: A case study approach”, published by Wiley • Chartered Financial Analyst and Certified Analytics Professional • Teaches analytics in the Babson College MBA program and at Northeastern University, Boston
  • 5. Quantitative Analytics and Big Data Analytics Onboarding • Trained more than 500 students in quantitative methods, data science and big data technologies using MATLAB, Python and R • Launching the Analytics Certificate Program in 2016
  • 6. (MATLAB version also available)
  • 10. What does a missing data problem look like?
  • 11. Missing data • Dealing with missing data has always been a challenge in data analysis. • We need methods for missing data analysis that: ▫ Minimize bias ▫ Make maximum use of the available information ▫ Produce good estimates of uncertainty (e.g., p-values, confidence intervals) • [Figure: a records × variables data matrix illustrating unit non-response, item non-response, missing data, and an unobserved/latent variable]
  • 12. Assumptions (MCAR, MAR, MNAR) • When values are missing completely at random (MCAR), the probability of missingness is unrelated to the values of any variable, observed or missing • Data are missing at random (MAR) if missingness is unrelated to the value that is missing, but is related to the values of other, observed variables • E.g., a question about the typical number of hours spent browsing the internet might be missing more often for married than for unmarried participants; but within the married subset, missingness is completely random, i.e., not related to how many hours the person browses • Data are missing not at random (MNAR, also written NMAR) if missingness is related to the value that is missing, and often to the values of other variables as well • E.g., missing values are more prevalent among those who typically browse more than among those who browse less
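The three mechanisms can be compared with a small simulation of the browsing example. This is a minimal NumPy sketch under stated assumptions: the data are simulated, married participants are assumed to browse fewer hours on average (8 vs 12), and the missingness rates and the helper names `simulate`, `naive_mean` and `group_adjusted_mean` are hypothetical. It shows that the complete-case mean is fine under MCAR, biased under MAR but correctable by conditioning on the observed variable, and biased under MNAR even after that adjustment.

```python
import numpy as np


def simulate(n=200_000, seed=42):
    """Simulate hypothetical browsing hours under MCAR, MAR and MNAR missingness."""
    rng = np.random.default_rng(seed)
    married = rng.random(n) < 0.5
    # Assumption: married participants browse fewer hours on average (8 vs 12)
    hours = rng.normal(np.where(married, 8.0, 12.0), 2.0)
    masks = {
        # MCAR: every value has the same 20% chance of being missing
        "MCAR": rng.random(n) < 0.2,
        # MAR: missingness depends only on the observed variable `married`
        "MAR": rng.random(n) < np.where(married, 0.5, 0.1),
        # MNAR: missingness depends on the unobserved hours value itself
        "MNAR": rng.random(n) < np.where(hours > 10, 0.5, 0.1),
    }
    return married, hours, masks


def naive_mean(hours, missing):
    """Complete-case mean: average of the observed values only."""
    return hours[~missing].mean()


def group_adjusted_mean(hours, missing, group):
    """Average the observed per-group means, weighted by each group's full-sample share."""
    total = 0.0
    for g in (True, False):
        in_group = group == g
        total += in_group.mean() * hours[in_group & ~missing].mean()
    return total


married, hours, masks = simulate()
true_mean = hours.mean()
for name, missing in masks.items():
    print(f"{name}: full-data mean={true_mean:.2f}, "
          f"complete-case mean={naive_mean(hours, missing):.2f}, "
          f"adjusted for marital status={group_adjusted_mean(hours, missing, married):.2f}")
```

Under MAR the bias disappears once the analysis conditions on marital status; under MNAR no observed variable explains the missingness, so the adjusted mean stays too low.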
  • 14. Methods of handling missing data • Deletion methods: delete cases or variables that have missing values ▫ Listwise deletion ▫ Pairwise deletion ▫ Variable deletion • Imputation methods: substitute values for the missing ones ▫ Single imputation ▪ Mean imputation ▪ Conditional mean imputation ▪ Case mean imputation ▪ Regression imputation ▪ Last observation carried forward ▪ Worst case imputation ▪ Best case imputation ▪ EM imputation ▫ Multiple imputation
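  • 15. Listwise deletion • Listwise (complete-case) deletion drops every case that has a missing value on any variable in the analysis • Works well when the proportion of missing data is small (less than about 15%) and the data are MCAR • Advantages: ▫ It can be used for any type of statistical analysis ▫ No special computations are required ▫ Parameter estimates are unbiased if the data are MCAR ▫ Standard errors correctly reflect the reduced sample size (under MCAR) • Disadvantages: ▫ May discard a considerable fraction of the data, reducing statistical power ▫ Estimates can be biased if the data are not MCAR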
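A minimal pandas sketch of listwise deletion on the 10-case table from the imputation slides. Assumption: case 2’s Var1 and case 9’s Var3 are the missing cells (they are the entries imputed or marked “?” on those slides); the name `df` is illustrative.

```python
import numpy as np
import pandas as pd

# The 10-case example table; NaN marks a missing value
df = pd.DataFrame({
    "Var1": [9, np.nan, 8, 7, 9, 8, 6, 5, 7, 8],
    "Var2": [8, 7, 5, 4, 5, 8, 7, 9, 8, 8],
    "Var3": [8, 6, 6, 5, 7, 9, 6, 7, np.nan, 7],
}, index=pd.RangeIndex(1, 11, name="Case"))

# Listwise (complete-case) deletion: drop every case with any missing value
complete = df.dropna()
print(f"Kept {len(complete)} of {len(df)} cases "
      f"({1 - len(complete) / len(df):.0%} of the data discarded)")
print(complete.mean())
```

Even with only two missing cells, a fifth of the cases are lost, which illustrates the power cost listed under disadvantages.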
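  • 16. Pairwise deletion • Pairwise deletion drops cases with missing values on an analysis-by-analysis basis: each statistic (e.g., each correlation) uses all cases observed on the variables it involves • Advantages: ▫ Uses all available non-missing data • Disadvantages: ▫ Different statistics rest on different subsets of cases and sample sizes, so estimated standard errors and test statistics can be biased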
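pandas’ `DataFrame.corr` already excludes missing values pair by pair, so it is a convenient way to see pairwise deletion at work. A minimal sketch, assuming the same 10-case table with case 2’s Var1 and case 9’s Var3 missing; the count matrix shows that each correlation rests on a different number of cases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Var1": [9, np.nan, 8, 7, 9, 8, 6, 5, 7, 8],
    "Var2": [8, 7, 5, 4, 5, 8, 7, 9, 8, 8],
    "Var3": [8, 6, 6, 5, 7, 9, 6, 7, np.nan, 7],
}, index=pd.RangeIndex(1, 11, name="Case"))

# Pairwise deletion: each correlation uses the cases observed on both variables of the pair
corr = df.corr()

# Number of cases behind each pairwise estimate (1 = observed, 0 = missing)
observed = df.notna().astype(int)
pair_counts = observed.T @ observed

print(corr.round(3))
print(pair_counts)
```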
  • 17. Variable deletion • Variable deletion drops an entire variable from the analysis, for all cases, when it has many missing values • Advantages: ▫ Makes sense when a variable has many missing values and is of relatively little importance to the analysis • Disadvantages: ▫ Loss of the information the variable carries
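A minimal sketch of variable deletion with a missing-share cut-off. The extra column `Var4` and the 50% threshold `MAX_MISSING` are hypothetical; in practice the cut-off, and whether the variable is expendable, is an analyst’s judgment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Var1": [9, np.nan, 8, 7, 9, 8, 6, 5, 7, 8],
    "Var2": [8, 7, 5, 4, 5, 8, 7, 9, 8, 8],
    # Hypothetical extra variable that is mostly missing
    "Var4": [np.nan, 3, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 4, np.nan],
})

MAX_MISSING = 0.5  # assumed cut-off: drop variables missing in more than half the cases

missing_share = df.isna().mean()  # fraction of missing values per column
kept = df.loc[:, missing_share <= MAX_MISSING]
print(missing_share)
print("Dropped:", sorted(set(df.columns) - set(kept.columns)))
```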
  • 18. Mean imputation • Replace missing values with the mean of that variable. Case 2’s missing Var1 becomes the mean of the nine observed Var1 values, 67/9 ≈ 7.44:
      Case  Var1  Var2  Var3
      1     9     8     8
      2     7.44  7     6
      3     8     5     6
      4     7     4     5
      5     9     5     7
      6     8     8     9
      7     6     7     6
      8     5     9     7
      9     7     8     ?
      10    8     8     7
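A minimal pandas sketch of mean imputation for Var1, assuming the table above with case 2’s Var1 and case 9’s Var3 missing; only Var1 is filled, as on the slide.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Var1": [9, np.nan, 8, 7, 9, 8, 6, 5, 7, 8],
    "Var2": [8, 7, 5, 4, 5, 8, 7, 9, 8, 8],
    "Var3": [8, 6, 6, 5, 7, 9, 6, 7, np.nan, 7],
}, index=pd.RangeIndex(1, 11, name="Case"))

# Mean of the observed Var1 values (pandas skips NaN by default)
var1_mean = df["Var1"].mean()
print(round(var1_mean, 2))  # 7.44, the value shown for case 2

# Fill only Var1; case 9's Var3 stays missing
df["Var1"] = df["Var1"].fillna(var1_mean)
```

Filling with the mean leaves the variable’s mean unchanged but shrinks its variance, which is one reason single imputation tends to understate uncertainty.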
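  • 19. Conditional mean imputation • Replace missing values with the mean of the variable within a relevant subgroup. Case 2 is female, so its Var1 becomes the mean of the observed female values (cases 1, 3, 4, 5): (9 + 8 + 7 + 9)/4 = 8.25:
      Case  Var1  Sex  Var2  Var3
      1     9     F    8     8
      2     8.25  F    7     6
      3     8     F    5     6
      4     7     F    4     5
      5     9     F    5     7
      6     8     M    8     9
      7     6     M    7     6
      8     5     M    9     7
      9     7     M    8     ?
      10    8     M    8     7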
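With pandas, conditional mean imputation is a `groupby` followed by `transform("mean")`, which broadcasts each subgroup’s mean back onto its rows. A minimal sketch using the table above, with Sex as the grouping variable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Var1": [9, np.nan, 8, 7, 9, 8, 6, 5, 7, 8],
    "Sex": ["F"] * 5 + ["M"] * 5,
    "Var2": [8, 7, 5, 4, 5, 8, 7, 9, 8, 8],
    "Var3": [8, 6, 6, 5, 7, 9, 6, 7, np.nan, 7],
}, index=pd.RangeIndex(1, 11, name="Case"))

# Per-row mean of Var1 within the row's Sex group (NaN excluded from each group mean)
group_means = df.groupby("Sex")["Var1"].transform("mean")

# Fill each missing Var1 with its own subgroup's mean
df["Var1"] = df["Var1"].fillna(group_means)
print(df.loc[2])
```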
  • 20. Case mean imputation • Replace a missing value with the mean of the same case’s values on the other variables. For case 2: (7 + 6)/2 = 6.50:
      Case  Var1  Var2  Var3
      1     9     8     8
      2     6.50  7     6
      3     8     5     6
      4     7     4     5
      5     9     5     7
      6     8     8     9
      7     6     7     6
      8     5     9     7
      9     7     8     ?
      10    8     8     7
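Case mean imputation works across a row rather than down a column. A minimal pandas sketch, assuming the same table; only Var1 is filled, to mirror the slide, and the approach presumes the variables are on comparable scales.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Var1": [9, np.nan, 8, 7, 9, 8, 6, 5, 7, 8],
    "Var2": [8, 7, 5, 4, 5, 8, 7, 9, 8, 8],
    "Var3": [8, 6, 6, 5, 7, 9, 6, 7, np.nan, 7],
}, index=pd.RangeIndex(1, 11, name="Case"))

# Mean of each case's observed values across the variables (NaN skipped per row)
case_means = df.mean(axis=1)

# Fill Var1 with the case's own mean; case 9's Var3 is left missing as on the slide
df["Var1"] = df["Var1"].fillna(case_means)
print(df.loc[2])
```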
  • 21. Regression imputation • Replace missing values with predictions from a regression equation fitted to the complete cases (here, the eight cases observed on all three variables):
      Var1′ = 4.621 − 0.734 × Var2 + 1.139 × Var3
    For case 2: 4.621 − 0.734 × 7 + 1.139 × 6 ≈ 6.32
      Case  Var1  Var2  Var3
      1     9     8     8
      2     6.32  7     6
      3     8     5     6
      4     7     4     5
      5     9     5     7
      6     8     8     9
      7     6     7     6
      8     5     9     7
      9     7     8     ?
      10    8     8     7
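A minimal NumPy sketch of regression imputation, assuming the table above with case 2’s Var1 and case 9’s Var3 missing. Ordinary least squares on the eight complete cases reproduces the slide’s coefficients (about 4.621, −0.734 and 1.139), and the fitted equation then predicts case 2’s Var1.

```python
import numpy as np

var1 = np.array([9, np.nan, 8, 7, 9, 8, 6, 5, 7, 8])
var2 = np.array([8, 7, 5, 4, 5, 8, 7, 9, 8, 8], dtype=float)
var3 = np.array([8, 6, 6, 5, 7, 9, 6, 7, np.nan, 7])

# Fit Var1 ~ Var2 + Var3 by least squares on the complete cases only
complete = ~(np.isnan(var1) | np.isnan(var2) | np.isnan(var3))
X = np.column_stack([np.ones(complete.sum()), var2[complete], var3[complete]])
coef, *_ = np.linalg.lstsq(X, var1[complete], rcond=None)
print(np.round(coef, 3))  # intercept, Var2 and Var3 coefficients

# Predict Var1 where it is missing but both predictors are observed
to_fill = np.isnan(var1) & ~np.isnan(var2) & ~np.isnan(var3)
var1_imputed = var1.copy()
var1_imputed[to_fill] = coef[0] + coef[1] * var2[to_fill] + coef[2] * var3[to_fill]
print(np.round(var1_imputed, 2))
```

Imputed values lie exactly on the regression line, so, like mean imputation, this approach understates the variability of the filled-in variable.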
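  • 22. Last observation carried forward • Replaces a missing value with the most recent observed value of the same outcome for that case (here case 9’s missing T3 takes its T2 value, 8; case 2’s T1 has no earlier observation, so it stays missing) • Variant: use the average of the earlier observations, e.g., of T1 and T2
      Case  T1  T2  T3
      1     9   8   8
      2     ?   7   6
      3     8   5   6
      4     7   4   5
      5     9   5   7
      6     8   8   9
      7     6   7   6
      8     5   9   7
      9     7   8   8
      10    8   8   7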
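A minimal pandas sketch of LOCF on a wide longitudinal layout (one row per case, one column per time point), assuming case 9’s T3 was the missing entry as in the earlier tables. Forward-filling along `axis=1` carries each case’s last observed value across time; the averaging variant is shown as one reading of “Average of T1 and T2”.

```python
import numpy as np
import pandas as pd

# Wide layout: one row per case, one column per time point
waves = pd.DataFrame({
    "T1": [9, np.nan, 8, 7, 9, 8, 6, 5, 7, 8],
    "T2": [8, 7, 5, 4, 5, 8, 7, 9, 8, 8],
    "T3": [8, 6, 6, 5, 7, 9, 6, 7, np.nan, 7],
}, index=pd.RangeIndex(1, 11, name="Case"), dtype=float)

# LOCF: carry each case's last observed value forward across time points
locf = waves.ffill(axis=1)

# Variant: fill a missing T3 with the average of the case's T1 and T2
avg_fill = waves.copy()
avg_fill["T3"] = avg_fill["T3"].fillna(waves[["T1", "T2"]].mean(axis=1))

print(locf.loc[[2, 9]])
print(avg_fill.loc[9])
```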
  • 23. Interpolation • Use interpolation between a case’s observed values (e.g., linear interpolation over time) to fill in missing values • Useful for longitudinal datasets
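A minimal pandas sketch of interpolation for one subject’s series; the monthly readings and dates are hypothetical. `method="linear"` treats the points as equally spaced, while `method="time"` weights by the actual gaps between dates, which matters when visits are irregular.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly measurements for one subject, with gaps
readings = pd.Series([10.0, np.nan, 14.0, np.nan, np.nan, 20.0],
                     index=pd.date_range("2016-01-01", periods=6, freq="MS"))

# Linear interpolation between neighboring observed values (equal spacing assumed)
linear = readings.interpolate(method="linear")

# Time-weighted interpolation uses the actual number of days between dates
by_time = readings.interpolate(method="time")

print(pd.DataFrame({"observed": readings, "linear": linear, "time": by_time}))
```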
  • 24. Worst case and best case imputation • Worst case imputation replaces a missing value with the worst-case value of a categorical outcome (e.g., failure for a success/failure outcome) • Best case imputation replaces it with the best-case value (e.g., success) • Together they bracket the result that complete data would have given
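A minimal sketch of worst case and best case imputation for a binary outcome; the outcome list and the helper `success_rate` are hypothetical. Counting every missing outcome as a failure, and then as a success, gives lower and upper bounds for the full-data success rate.

```python
# Hypothetical binary outcome per participant: 1 = success, 0 = failure, None = missing
outcomes = [1, 0, 1, None, 1, None, 0, 1]


def success_rate(values):
    """Share of successes in a list of 0/1 outcomes."""
    return sum(values) / len(values)


observed = [v for v in outcomes if v is not None]
worst = [0 if v is None else v for v in outcomes]  # every missing outcome counted as failure
best = [1 if v is None else v for v in outcomes]   # every missing outcome counted as success

print(f"complete-case: {success_rate(observed):.3f}")
print(f"worst case:    {success_rate(worst):.3f}")
print(f"best case:     {success_rate(best):.3f}")
```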
  • 25. Expectation-Maximization (EM) imputation • Imputes missing values using maximum likelihood (ML) estimation • In the E-step, the expected values of the missing data are computed from the observed data and the current parameter estimates • In the M-step, the procedure plugs in the expected values from the E-step and maximizes the likelihood function to obtain new parameter estimates • The two steps alternate until the estimates converge
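A minimal NumPy sketch of EM for a deliberately simple case: a bivariate normal model in which x is fully observed and y has values missing at random given x. The function `em_bivariate_normal` and the simulated data are hypothetical. The E-step fills in E[y | x] and E[y² | x] under the current parameters; the M-step re-estimates the mean vector and covariance matrix from those expected sufficient statistics.

```python
import numpy as np


def em_bivariate_normal(x, y, max_iter=500, tol=1e-10):
    """EM estimates of the mean vector and covariance of (x, y) under a bivariate normal model.

    Assumes x is fully observed and missing y values (NaN) are MAR given x.
    Returns (mu, cov, y_filled); y_filled holds E[y | x] from the last E-step for missing entries.
    """
    miss = np.isnan(y)
    # Start from the complete-case estimates
    mu = np.array([x.mean(), y[~miss].mean()])
    cov = np.cov(x[~miss], y[~miss])
    for _ in range(max_iter):
        # E-step: conditional expectations of y and y^2 for the missing entries
        beta = cov[0, 1] / cov[0, 0]
        resid_var = cov[1, 1] - beta * cov[0, 1]
        y_hat = np.where(miss, mu[1] + beta * (x - mu[0]), y)
        y2_hat = np.where(miss, y_hat ** 2 + resid_var, y ** 2)
        # M-step: ML estimates from the expected sufficient statistics
        new_mu = np.array([x.mean(), y_hat.mean()])
        sxx = np.mean(x ** 2) - new_mu[0] ** 2
        sxy = np.mean(x * y_hat) - new_mu[0] * new_mu[1]
        syy = np.mean(y2_hat) - new_mu[1] ** 2
        new_cov = np.array([[sxx, sxy], [sxy, syy]])
        done = (np.allclose(new_mu, mu, rtol=0, atol=tol)
                and np.allclose(new_cov, cov, rtol=0, atol=tol))
        mu, cov = new_mu, new_cov
        if done:
            break
    return mu, cov, y_hat


# Simulated example: y is more likely to be missing when x is large (MAR given x)
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 500)
y = 2.0 + 0.8 * x + rng.normal(0.0, 0.5, 500)
y[rng.random(500) < np.where(x > 0.5, 0.6, 0.1)] = np.nan

mu, cov, y_filled = em_bivariate_normal(x, y)
print("EM mean:", mu.round(3))
print("complete-case mean of y:", round(np.nanmean(y), 3))
```

For this missingness pattern the EM estimate of the mean of y should agree with the complete-case regression estimate ȳ_obs + b(x̄ − x̄_obs), which makes a convenient sanity check; the plain complete-case mean is pulled down because large-x cases are under-represented.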
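  • 26. Multiple imputation • Multiple imputation creates several completed datasets, each with different plausible imputed values, analyzes each one, and pools the results, so that standard errors reflect the uncertainty due to imputation • It is increasingly regarded as the “gold standard” approach to handling missing values • Computationally more demanding than single imputation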
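The pooling step of multiple imputation can be sketched with Rubin’s rules, assuming the m completed datasets have already been created and analyzed; `pool_rubin` and the numbers below are hypothetical. The total variance adds the between-imputation spread to the average within-imputation variance: T = W + (1 + 1/m)B.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates: the m point estimates; variances: their m squared standard errors.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b
    return q_bar, total


# Hypothetical results from analyzing m = 3 imputed datasets
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]
q, t = pool_rubin(estimates, variances)
print(f"pooled estimate={q:.3f}, total variance={t:.3f}, SE={t ** 0.5:.3f}")
```

The between-imputation term is what single imputation leaves out, which is why single-imputation standard errors tend to be too small.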
  • 27. Summary • We have covered: ▫ Missing data introduction ▪ Missing data definition, assumptions and workflow ▫ Deletion methods ▪ Listwise deletion ▪ Pairwise deletion ▪ Variable deletion ▫ Imputation methods ▪ Single imputation ▪ Mean imputation ▪ Conditional mean imputation ▪ Case mean imputation ▪ Regression imputation ▪ Last observation carried forward ▪ Worst case imputation ▪ Interpolation ▪ Best case imputation ▪ EM imputation ▪ Multiple imputation ▫ References
  • 29. www.analyticscertificate.com/SparkWorkshop • Early bird pricing ending today! • Answer this question in the promo code section on the Eventbrite site to take an additional 25% off (today only): What package in Python do you need to use DataFrame functionality?
  • 30. Thank you! • Sri Krishnamurthy, CFA, CAP, Founder and CEO • Contact: srikrishnamurthy • Information, data and drawings embodied in this presentation are strictly the property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.