Complex sample surveys Using lavaan.survey Example lavaan.survey analysis Conclusions
lavaan.survey: An R package for complex
survey analysis of structural equation models
Daniel Oberski
Department of Methodology and Statistics
lavaan.survey doberski@uvt.nl
What is "complex" about a sample survey?
1. Clustering
2. Stratification
3. Weighting
Why account for complex sampling in
structural equation modeling?
• Clustering, stratification, and weighting all affect the
standard errors;
• Weighting also affects the point estimates.
(Skinner, Holt & Smith 1989; Pfeffermann 1993; Muthén & Satorra 1995; Asparouhov
2005; Asparouhov & Muthén 2005; Stapleton 2006; Asparouhov & Muthén 2007;
Fuller 2009; Bollen, Tueller & Oberski 2013)
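To make the point about weighting concrete, here is a minimal base R sketch with made-up numbers. The vectors y and w are hypothetical and are not taken from the PISA data; the sketch only shows that a point estimate (here a mean) shifts once unequal sampling weights are used.

```r
# Hypothetical toy sample: four observed scores and their sampling weights.
# Units with larger weights represent more population members.
y <- c(10, 12, 30, 32)
w <- c(1, 1, 4, 4)

naive_mean    <- mean(y)              # ignores the weights
weighted_mean <- weighted.mean(y, w)  # sum(w * y) / sum(w)

print(c(naive = naive_mean, weighted = weighted_mean))
```

The same logic carries over to the covariances that an SEM is fitted to, which is why the weighted and unweighted parameter estimates can differ.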
lavaan.survey
Input:
1. A lavaan fit object (SEM analysis)
2. A survey design object (complex sample design)
Output:
A lavaan fit object (SEM analysis),
accounting for the complex sample design
Example call:
lavaan.survey(lavaan.fit, survey.design)
How to fit a structural equation model accounting for
complex sampling using lavaan.survey in 3 steps:
1. Fit the SEM using lavaan as you normally would
→ lavaan fit object;
2. Define the complex sampling design to the survey
package
→ svydesign or svrepdesign object;
3. Call lavaan.survey with the two objects obtained in (1)
and (2)
→ corrected lavaan fit object.
Afterwards, the usual lavaan apparatus is available.
Example complex sample SEM analysis from the literature
Ferla, Valcke & Cai (2009). Academic self-efficacy and academic
self-concept: Reconsidering structural relationships. Learning and
Individual Differences, 19 (4).
Structural equation model of Belgian PISA data
PISA data in the lavaan.survey package
load("pisa.be.2003.rda")
> head(pisa.be.2003, 5)[,1:20]
  PV1MATH1 PV1MATH2 PV1MATH3 PV1MATH4 ST31Q01 ST31Q02 ST31Q03 ST31Q04
1    1.160     1.26     1.19    1.190       2       1       2       2
2    1.293     1.27     1.22    1.247       1       1       1       1
3    1.383     1.44     1.37    1.347       2       1       1       1
4    0.965     1.02     1.10    0.979       2       2       2       2
5    1.375     1.44     1.31    1.471       3       1       2       2
  ST31Q05 ST31Q06 ST31Q07 ST31Q08 ST32Q02 ST32Q04 ST32Q06 ST32Q07
1       2       2       2       3       2       3       3       3
2       1       2       1       4       2       3       3       4
3       1       1       2       3       4       2       1       1
4       2       1       2       2       3       3       2       2
5       1       3       3       3       3       2       3       3
  ST32Q09  ESCS male school.type
1       3 1.221    1           1
2       4 1.899    2           1
3       2 0.884    2           1
4       3 1.305    2           1
5       3 0.958    2           1
Belgium is a complex country.
In addition to the observed variables, the raw data also
contain 80 replicate weights generated by
"balanced-repeated replication" (BRR):
> pisa.be.2003[sample(1:8796), 21:101]
     W_FSTUWT W_FSTR1 W_FSTR2 W_FSTR3 ... W_FSTR80
2988     1.12    1.64    0.57    1.70 ...    0.528
8666    11.90   17.84    5.95    5.95 ...    5.948
290     16.89   25.33    8.44    8.44 ...   25.333
3505    14.80    8.25    7.90   20.67 ...    8.206
4289    22.70   32.40   10.87   10.80 ...   11.755
2352    13.09   20.62    6.26   18.79 ...    6.301
....
Step 1: Fitting the model with lavaan
library(lavaan)
model.pisa <- "
  # Measurement model: latent variables and their indicators
  math            =~ PV1MATH1 + PV1MATH2 + PV1MATH3 + PV1MATH4
  neg.efficacy    =~ ST31Q01 + ST31Q02 + ST31Q03 + ST31Q04 +
                     ST31Q05 + ST31Q06 + ST31Q07 + ST31Q08
  neg.selfconcept =~ ST32Q02 + ST32Q04 + ST32Q06 + ST32Q07 + ST32Q09
  # Structural model: regressions among latent and observed variables
  neg.selfconcept ~ neg.efficacy + ESCS + male
  neg.efficacy    ~ neg.selfconcept + school.type + ESCS + male
  math            ~ neg.selfconcept + neg.efficacy + school.type + ESCS + male
"
# Robust ML (estimator = "MLM") to take nonnormality into account
fit <- lavaan(model.pisa, data=pisa.be.2003, auto.var=TRUE, std.lv=TRUE,
              meanstructure=TRUE, int.ov.free=TRUE, estimator="MLM")
Step 2: Letting survey know about the replicate weight design
library(survey)
# Replicate-weight design: W_FSTUWT is the full-sample weight; the regular
# expression selects the replicate weight columns W_FSTR1, ..., W_FSTR80
des.rep <- svrepdesign(ids=~1, weights=~W_FSTUWT,
                       data=pisa.be.2003,
                       repweights="W_FSTR[0-9]+",
                       type="Fay", rho=0.5)
More information on defining complex sampling objects:
Lumley (2010). Complex Surveys: A Guide to Analysis Using R. New
York: Wiley.
Thomas Lumley's website with tutorials:
http://r-survey.r-forge.r-project.org/survey/index.html
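The repweights argument in the svrepdesign call above is given as a regular expression that is matched against the column names of the data. The following base R sketch shows which columns such a pattern would pick up. The vector cols is a hypothetical stand-in that mimics the naming scheme of pisa.be.2003; it is not the real data set.

```r
# Hypothetical column names following the PISA naming scheme:
# a few analysis variables, the full-sample weight, and 80 replicate weights.
cols <- c("PV1MATH1", "ESCS", "male", "W_FSTUWT", paste0("W_FSTR", 1:80))

# Columns matched by the pattern used for the replicate weights
rep_cols <- grep("W_FSTR[0-9]+", cols, value = TRUE)

length(rep_cols)           # number of replicate weight columns matched
"W_FSTUWT" %in% rep_cols   # the full-sample weight is not matched
```

Checking the match in this way before building the design object is a cheap guard against silently picking up too few or too many replicate weight columns.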
Step 3: Complex survey SEM
library(lavaan.survey)
# Combine the lavaan fit (step 1) with the survey design (step 2)
fit.surv <- lavaan.survey(lavaan.fit=fit, survey.design=des.rep)
Point and standard error estimates using robust ML with and
without correction for the sampling design using BRR replicate
weights.
                                     Estimate          s.e.
                                   Naive     PML   Naive    PML   creff
neg.selfconcept ~ neg.efficacy    -0.021  -0.050   0.032  0.046   1.415
neg.efficacy ~ neg.selfconcept     0.568   0.609   0.046  0.065   1.421
neg.efficacy ~ school.type         0.530   0.518   0.022  0.022   1.009
math ~ neg.selfconcept            -0.179  -0.177   0.015  0.021   1.362
math ~ neg.efficacy               -0.239  -0.237   0.015  0.018   1.216
math ~ school.type                -0.606  -0.596   0.019  0.035   1.858
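The slides do not define the creff column. One plausible reading, treated here as an assumption, is that it is the ratio of the design-corrected (PML) standard error to the naive standard error. The base R sketch below recomputes that ratio from the rounded values in the table; because the inputs are rounded to three decimals, the result can only agree approximately with the reported column.

```r
# Standard errors copied from the table above (rounded to three decimals)
naive_se <- c(0.032, 0.046, 0.022, 0.015, 0.015, 0.019)
pml_se   <- c(0.046, 0.065, 0.022, 0.021, 0.018, 0.035)
creff_reported <- c(1.415, 1.421, 1.009, 1.362, 1.216, 1.858)

# Assumed definition: corrected s.e. divided by naive s.e.
se_ratio <- pml_se / naive_se

print(round(cbind(se_ratio, creff_reported), 3))
```

Under this reading, a value near 1 (as for neg.efficacy ~ school.type) means the sampling design barely changes the standard error, while the value for math ~ school.type indicates a standard error almost twice the naive one.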
Conclusions
• When there are clusters, strata, and/or weights, you
should usually adjust the SEM analysis for them;
• Especially important for correct standard errors and
model fit statistics.
• lavaan.survey leverages the power of the lavaan and
survey packages to let you do this.
• Future: categorical data.
doberski@uvt.nl
http://daob.nl
Oberski, D.L. (2014). lavaan.survey: An R Package for Complex Survey
Analysis of Structural Equation Models. Journal of Statistical Software,
57 (1). http://www.jstatsoft.org/v57/i01
Appendix
Specifying regression relations in lavaan
Dependent variable ~ ("regressed on") independent variables
selfconcept ~ efficacy + ESCS + male
efficacy ~ selfconcept + school.type + ESCS + male
math ~ selfconcept + efficacy + school.type + ESCS + male
PISA data: Math self-concept (ST32Q02,4,6,7,9)
Measurement model for self-concept
Latent variable =~ ("measured by") observed variables
neg.selfconcept =~ ST32Q02 + ST32Q04 +
                   ST32Q06 + ST32Q07 +
                   ST32Q09
PISA data: Math self-efficacy (ST31Q01–ST31Q08)
Measurement model for self-efficacy
neg.efficacy =~ ST31Q01 + ST31Q02 + ST31Q03 +
ST31Q04 + ST31Q05 + ST31Q06 +
ST31Q07 + ST31Q08
[1/3] PISA sampling design: clustering
• Target population in each country: 15-year-old students
attending educational institutions in grade 7 or higher;
• A minimum of 150 schools is selected per country;
• Within each participating school, usually 35 students are
randomly selected with equal probability;
• Total sample size of at least 4,500 students.
• Other complications (with sampling):
  • Getting a population frame
  • School nonresponse
  • Student nonresponse
  • Small schools (< 35 students): representation vs. costs
  • Countries with few schools
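The two-stage structure described above (schools first, then students within schools) can be sketched in base R. Everything in this sketch is hypothetical: the sampling frame, the school sizes, and in particular the use of simple random sampling of schools at stage 1, which is chosen only to keep the illustration short and is not a description of the actual PISA procedure.

```r
set.seed(1)

# Hypothetical sampling frame: 1000 schools with 10 to 200 eligible students
n_frame     <- 1000
school_size <- sample(10:200, n_frame, replace = TRUE)

# Stage 1: draw 150 schools (simple random sampling, an assumption)
sampled_schools <- sample(seq_len(n_frame), 150)

# Stage 2: within each sampled school draw 35 students with equal
# probability, or take every student if the school has fewer than 35
take <- pmin(school_size[sampled_schools], 35)
students <- lapply(seq_along(sampled_schools), function(k) {
  school <- sampled_schools[k]
  data.frame(school  = school,
             student = sample(seq_len(school_size[school]), take[k]))
})
two_stage <- do.call(rbind, students)

nrow(two_stage)                    # realized student sample size
length(unique(two_stage$school))   # number of clusters (schools)
```

Students in the resulting data frame are nested in schools, so observations from the same school are not independent; this is the clustering that the standard errors need to account for.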
[2/3] PISA sampling design: stratification
• Before sampling, schools are stratified in the sampling frame;
• Schools are classified into "like" groups in order to:
  • improve the efficiency of the sample design;
  • allow disproportionate sample allocations to specific groups of
    schools, e.g. states, provinces, or other regions;
  • ensure all parts of the population are included in the sample;
  • ensure representation of specific groups in the sample.
PISA sampling: stratification in Belgium
• Region × Public/private education → 23 strata
• Then sampling within stratum with probability
proportional to:
  • (Flanders) ISCED; Retention Rate; Vocational/Special
    Education; Percentage of Girls;
  • (French Community) National/International School;
    Retention Rate; Vocational-Special Education/Other;
  • (German Community) Public/Private.
[3/3] Weighting
$w_{ij} = w_{1i}\, w_{2ij}\, f_{1i}\, f_{2ij}\, t_{1i}$
$w_{1i}$: Base weight for school $i$;
$w_{2ij}$: Base weight for student $j$ in school $i$;
$f_{2ij}$, $f_{1i}$: Nonresponse factors for student and school
nonparticipation;
$t_{1i}$: Weight trimming factor.
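As a small worked example of the product formula above, the following base R sketch multiplies the five components for three students. All numbers are invented for illustration; the variable names simply mirror the symbols on the slide.

```r
# Hypothetical weight components for three students
# (students 1 and 2 attend the same school, student 3 another one)
w1 <- c(20, 20, 35)        # school base weight
w2 <- c(4, 4, 2.5)         # student base weight within the school
f1 <- c(1.10, 1.10, 1.00)  # school nonresponse factor
f2 <- c(1.05, 1.05, 1.20)  # student nonresponse factor
t1 <- c(1, 1, 0.9)         # weight trimming factor

# Final student weight: the product of all components
w_final <- w1 * w2 * f1 * f2 * t1
print(w_final)
```

Students from the same school share the school-level components, so differences in their final weights can only come from the student-level factors.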
Weighting
$w_{ij} = w^{\mathrm{brr}}_{i}\, w_{1i}\, w_{2ij}\, f_{1i}\, f_{2ij}\, t_{1i}$
$w_{1i}$: Base weight for school $i$;
$w_{2ij}$: Base weight for student $j$ in school $i$;
$f_{2ij}$, $f_{1i}$: Nonresponse factors for student and school
nonparticipation;
$t_{1i}$: Weight trimming factor.
$w^{\mathrm{brr}}_{ij}$: "Balanced-repeated replication" weight (80 replications
per case) created by the WesVar software
Balanced-repeated replication (BRR), Fay's ρ = 0.5
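The figures for this slide are not recoverable, so here is a minimal base R sketch of how a replicate-based variance is commonly computed under Fay's method: the statistic is re-estimated once per set of replicate weights, and the squared deviations from the full-sample estimate are summed and scaled by 1 / (G (1 - ρ)²), where G is the number of replicates. The function name fay_brr_var and the replicate estimates are hypothetical, and software may center the deviations differently (for example on the mean of the replicates).

```r
# Fay's BRR variance of an estimate, given the full-sample estimate and the
# estimates obtained with each of the G sets of replicate weights
fay_brr_var <- function(theta_full, theta_reps, rho = 0.5) {
  G <- length(theta_reps)
  sum((theta_reps - theta_full)^2) / (G * (1 - rho)^2)
}

# Hypothetical example: 80 replicate estimates scattered around 1.0
theta_full <- 1.0
theta_reps <- theta_full + rep(c(0.1, -0.1), times = 40)

se_brr <- sqrt(fay_brr_var(theta_full, theta_reps, rho = 0.5))
print(se_brr)
```

With 80 replicates and ρ = 0.5 the scaling factor is 1 / (80 × 0.25) = 1/20, so the variance is one twentieth of the sum of squared deviations.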
Structural parameter estimates and standard errors
taking nonnormality into account
                   Estimate  Std.err  Z-value  P(>|z|)
Regressions:
  neg.selfconcept ~
    neg.efficacy     -0.021    0.032   -0.657    0.511
    ESCS             -0.062    0.018   -3.381    0.001
    male             -0.389    0.027  -14.607    0.000
  neg.efficacy ~
    neg.selfcncpt     0.568    0.046   12.354    0.000
    school.type       0.530    0.022   24.331    0.000
    ESCS             -0.252    0.016  -16.199    0.000
    male             -0.360    0.031  -11.668    0.000
  math ~
    neg.selfcncpt    -0.179    0.015  -11.644    0.000
    neg.efficacy     -0.239    0.015  -16.413    0.000
    school.type      -0.606    0.019  -32.154    0.000
    ESCS              0.312    0.014   22.511    0.000
    male              0.009    0.024    0.375    0.708
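For readers unfamiliar with the last two columns: the Z-value is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a standard normal distribution. The base R sketch below checks this for three rows of the table. Because the printed estimates and standard errors are rounded, the z-values recomputed from them differ slightly from the reported ones, so the p-values are computed from the reported z-values.

```r
# Three rows copied from the table above:
# neg.selfconcept ~ neg.efficacy, neg.selfconcept ~ ESCS, math ~ male
est        <- c(-0.021, -0.062, 0.009)
se         <- c( 0.032,  0.018, 0.024)
z_reported <- c(-0.657, -3.381, 0.375)

# z recomputed from rounded inputs (approximate only)
z_from_rounded <- est / se

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z_reported))

print(round(cbind(z_from_rounded, z_reported, p_value), 3))
```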
Model fit taking nonnormality into account
                                      Used     Total
Number of observations                7785      8796
Estimator                               ML    Robust
Minimum Function Test Statistic   8088.256  7275.544
Degrees of freedom                     158       158
P-value (Chi-square)                 0.000     0.000
Scaling correction factor                      1.112
  for the Satorra-Bentler correction
Comparative Fit Index (CFI)          0.921     0.921
Tucker-Lewis Index (TLI)             0.907     0.907
RMSEA                                0.080     0.076
P-value RMSEA <= 0.05                0.000     0.000
SRMR                                 0.056     0.056
Complex survey SEM: fit measures
Number of observations                7785
Estimator                               ML    Robust
Minimum Function Test Statistic   8187.514  5642.873
Degrees of freedom                     158       158
P-value (Chi-square)                 0.000     0.000
Scaling correction factor                      1.451
  for the Satorra-Bentler correction
Comparative Fit Index (CFI)          0.921     0.894
Tucker-Lewis Index (TLI)             0.906     0.875
RMSEA                                0.081     0.067
P-value RMSEA <= 0.05                0.000     0.000
SRMR                                 0.058     0.058
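The relation between the two columns can be checked by hand. The sketch below assumes the usual Satorra-Bentler relation (robust statistic = ML statistic divided by the scaling correction factor) and the textbook formula RMSEA = sqrt(max(χ² − df, 0) / (df · N)). Both are assumptions about how the output was produced, and lavaan's exact computation may differ in detail; the inputs are the rounded values printed above, so agreement is approximate.

```r
# Values copied from the complex survey fit output above
chisq_ml     <- 8187.514
chisq_robust <- 5642.873
scaling      <- 1.451
df           <- 158
n            <- 7785

# Satorra-Bentler scaled statistic (assumed relation)
chisq_scaled <- chisq_ml / scaling

# Textbook RMSEA formula (assumed; software variants exist)
rmsea <- function(chisq, df, n) sqrt(max(chisq - df, 0) / (df * n))

print(round(c(chisq_scaled = chisq_scaled,
              rmsea_ml     = rmsea(chisq_ml, df, n),
              rmsea_robust = rmsea(chisq_robust, df, n)), 3))
```

The scaling correction factor is larger here than in the fit that only accounts for nonnormality (1.451 versus 1.112), which is what drives the lower robust test statistic once the sampling design is taken into account.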
lavaan.survey doberski@uvt.nl

Contenu connexe

Tendances

Tendances (20)

Data Visualization
Data VisualizationData Visualization
Data Visualization
 
3 data visualization
3 data visualization3 data visualization
3 data visualization
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
 
Ridge regression, lasso and elastic net
Ridge regression, lasso and elastic netRidge regression, lasso and elastic net
Ridge regression, lasso and elastic net
 
Data & Analytics Club - Data Visualization Workshop
Data & Analytics Club - Data Visualization WorkshopData & Analytics Club - Data Visualization Workshop
Data & Analytics Club - Data Visualization Workshop
 
Data Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data AnalysisData Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data Analysis
 
Scaling and Normalization
Scaling and NormalizationScaling and Normalization
Scaling and Normalization
 
Lasso regression
Lasso regressionLasso regression
Lasso regression
 
Data Manipulation Using R (& dplyr)
Data Manipulation Using R (& dplyr)Data Manipulation Using R (& dplyr)
Data Manipulation Using R (& dplyr)
 
Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning
 
Danish Business Authority: Explainability and causality in relation to ML Ops
Danish Business Authority: Explainability and causality in relation to ML OpsDanish Business Authority: Explainability and causality in relation to ML Ops
Danish Business Authority: Explainability and causality in relation to ML Ops
 
Data analysis
Data analysisData analysis
Data analysis
 
The How and Why of Feature Engineering
The How and Why of Feature EngineeringThe How and Why of Feature Engineering
The How and Why of Feature Engineering
 
Training Series - Intro to Neo4j
Training Series - Intro to Neo4jTraining Series - Intro to Neo4j
Training Series - Intro to Neo4j
 
An Overview of Simple Linear Regression
An Overview of Simple Linear RegressionAn Overview of Simple Linear Regression
An Overview of Simple Linear Regression
 
Smarter Fraud Detection With Graph Data Science
Smarter Fraud Detection With Graph Data ScienceSmarter Fraud Detection With Graph Data Science
Smarter Fraud Detection With Graph Data Science
 
Understanding Feature Space in Machine Learning
Understanding Feature Space in Machine LearningUnderstanding Feature Space in Machine Learning
Understanding Feature Space in Machine Learning
 
Data Visualization: Impact, Intrigue, Value Add for APLIC 2014
Data Visualization: Impact, Intrigue, Value Add for APLIC 2014Data Visualization: Impact, Intrigue, Value Add for APLIC 2014
Data Visualization: Impact, Intrigue, Value Add for APLIC 2014
 
Time Series Analysis Ravi
Time Series Analysis RaviTime Series Analysis Ravi
Time Series Analysis Ravi
 
Data visualisation & analytics with Tableau
Data visualisation & analytics with Tableau Data visualisation & analytics with Tableau
Data visualisation & analytics with Tableau
 

En vedette

Multidirectional survey measurement errors: the latent class MTMM model
Multidirectional survey measurement errors: the latent class MTMM modelMultidirectional survey measurement errors: the latent class MTMM model
Multidirectional survey measurement errors: the latent class MTMM model
Daniel Oberski
 

En vedette (9)

ESRA2015 course: Latent Class Analysis for Survey Research
ESRA2015 course: Latent Class Analysis for Survey ResearchESRA2015 course: Latent Class Analysis for Survey Research
ESRA2015 course: Latent Class Analysis for Survey Research
 
Complex sampling in latent variable models
Complex sampling in latent variable modelsComplex sampling in latent variable models
Complex sampling in latent variable models
 
How good are administrative register data and what can we do about it?
How good are administrative register data and what can we do about it?How good are administrative register data and what can we do about it?
How good are administrative register data and what can we do about it?
 
A measure to evaluate latent variable model fit by sensitivity analysis
A measure to evaluate latent variable model fit by sensitivity analysisA measure to evaluate latent variable model fit by sensitivity analysis
A measure to evaluate latent variable model fit by sensitivity analysis
 
Multidirectional survey measurement errors: the latent class MTMM model
Multidirectional survey measurement errors: the latent class MTMM modelMultidirectional survey measurement errors: the latent class MTMM model
Multidirectional survey measurement errors: the latent class MTMM model
 
Predicting the quality of a survey question from its design characteristics: SQP
Predicting the quality of a survey question from its design characteristics: SQPPredicting the quality of a survey question from its design characteristics: SQP
Predicting the quality of a survey question from its design characteristics: SQP
 
Predicting the quality of a survey question from its design characteristics
Predicting the quality of a survey question from its design characteristicsPredicting the quality of a survey question from its design characteristics
Predicting the quality of a survey question from its design characteristics
 
An introduction to structural equation models in R using the Lavaan package
An introduction to structural equation models in R using the Lavaan packageAn introduction to structural equation models in R using the Lavaan package
An introduction to structural equation models in R using the Lavaan package
 
Detecting local dependence in latent class models
Detecting local dependence in latent class modelsDetecting local dependence in latent class models
Detecting local dependence in latent class models
 

Similaire à lavaan.survey: An R package for complex survey analysis of structural equation models

SQCFramework: SPARQL Query containment Benchmark Generation Framework
SQCFramework: SPARQL Query containment  Benchmark Generation Framework SQCFramework: SPARQL Query containment  Benchmark Generation Framework
SQCFramework: SPARQL Query containment Benchmark Generation Framework
Muhammad Saleem
 
A WHIRLWIND TOUR OF ACADEMIC TECHNIQUES FOR REAL-WORLD SECURITY RESEARCHERS
A WHIRLWIND TOUR OF ACADEMIC TECHNIQUES FOR REAL-WORLD SECURITY RESEARCHERSA WHIRLWIND TOUR OF ACADEMIC TECHNIQUES FOR REAL-WORLD SECURITY RESEARCHERS
A WHIRLWIND TOUR OF ACADEMIC TECHNIQUES FOR REAL-WORLD SECURITY RESEARCHERS
Silvio Cesare
 
Declarative data analysis
Declarative data analysisDeclarative data analysis
Declarative data analysis
South West Data Meetup
 

Similaire à lavaan.survey: An R package for complex survey analysis of structural equation models (20)

SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
SQCFramework: SPARQL Query Containment Benchmarks Generation FrameworkSQCFramework: SPARQL Query Containment Benchmarks Generation Framework
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
 
[poster] A Compare-Aggregate Model with Latent Clustering for Answer Selection
[poster] A Compare-Aggregate Model with Latent Clustering for Answer Selection[poster] A Compare-Aggregate Model with Latent Clustering for Answer Selection
[poster] A Compare-Aggregate Model with Latent Clustering for Answer Selection
 
Data mining 8 estimasi linear regression
Data mining 8   estimasi linear regressionData mining 8   estimasi linear regression
Data mining 8 estimasi linear regression
 
On the effectiveness of feature set augmentation using clusters of word embed...
On the effectiveness of feature set augmentation using clusters of word embed...On the effectiveness of feature set augmentation using clusters of word embed...
On the effectiveness of feature set augmentation using clusters of word embed...
 
Essay on-data-analysis
Essay on-data-analysisEssay on-data-analysis
Essay on-data-analysis
 
SQCFramework: SPARQL Query containment Benchmark Generation Framework
SQCFramework: SPARQL Query containment  Benchmark Generation Framework SQCFramework: SPARQL Query containment  Benchmark Generation Framework
SQCFramework: SPARQL Query containment Benchmark Generation Framework
 
A WHIRLWIND TOUR OF ACADEMIC TECHNIQUES FOR REAL-WORLD SECURITY RESEARCHERS
A WHIRLWIND TOUR OF ACADEMIC TECHNIQUES FOR REAL-WORLD SECURITY RESEARCHERSA WHIRLWIND TOUR OF ACADEMIC TECHNIQUES FOR REAL-WORLD SECURITY RESEARCHERS
A WHIRLWIND TOUR OF ACADEMIC TECHNIQUES FOR REAL-WORLD SECURITY RESEARCHERS
 
Multimodal Residual Learning for Visual Question-Answering
Multimodal Residual Learning for Visual Question-AnsweringMultimodal Residual Learning for Visual Question-Answering
Multimodal Residual Learning for Visual Question-Answering
 
2020: Prototype-based classifiers and relevance learning: medical application...
2020: Prototype-based classifiers and relevance learning: medical application...2020: Prototype-based classifiers and relevance learning: medical application...
2020: Prototype-based classifiers and relevance learning: medical application...
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
 
Machine Learning - Simple Linear Regression
Machine Learning - Simple Linear RegressionMachine Learning - Simple Linear Regression
Machine Learning - Simple Linear Regression
 
ECCV WS 2012 (Frank)
ECCV WS 2012 (Frank)ECCV WS 2012 (Frank)
ECCV WS 2012 (Frank)
 
Fast Distributed Online Classification
Fast Distributed Online ClassificationFast Distributed Online Classification
Fast Distributed Online Classification
 
[M3A4] Data Analysis and Interpretation Specialization
[M3A4] Data Analysis and Interpretation Specialization[M3A4] Data Analysis and Interpretation Specialization
[M3A4] Data Analysis and Interpretation Specialization
 
analysis part 02.pptx
analysis part 02.pptxanalysis part 02.pptx
analysis part 02.pptx
 
Declarative data analysis
Declarative data analysisDeclarative data analysis
Declarative data analysis
 
Barga Data Science lecture 5
Barga Data Science lecture 5Barga Data Science lecture 5
Barga Data Science lecture 5
 
MM - KBAC: Using mixed models to adjust for population structure in a rare-va...
MM - KBAC: Using mixed models to adjust for population structure in a rare-va...MM - KBAC: Using mixed models to adjust for population structure in a rare-va...
MM - KBAC: Using mixed models to adjust for population structure in a rare-va...
 
Topic Set Size Design with Variance Estimates from Two-Way ANOVA
Topic Set Size Design with Variance Estimates from Two-Way ANOVATopic Set Size Design with Variance Estimates from Two-Way ANOVA
Topic Set Size Design with Variance Estimates from Two-Way ANOVA
 
Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"
 

Dernier

Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Sérgio Sacani
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
ssuser79fe74
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
Sérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
RohitNehra6
 

Dernier (20)

PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Creating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening DesignsCreating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening Designs
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 

lavaan.survey: An R package for complex survey analysis of structural equation models

  • 1. Complex sample surveys Using lavaan.survey Example lavaan.survey analysis Conclusions lavaan.survey: An R package for complex survey analysis of structural equation models Daniel Oberski Department of methodology and statistics lavaan.survey doberski@uvt.nl
  • 2. Complex sample surveys Using lavaan.survey Example lavaan.survey analysis Conclusions lavaan.survey doberski@uvt.nl
  • 3. Complex sample surveys Using lavaan.survey Example lavaan.survey analysis Conclusions What is "complex" about a sample survey? lavaan.survey doberski@uvt.nl
  • 4. Complex sample surveys Using lavaan.survey Example lavaan.survey analysis Conclusions 1. Clustering 3. Weighting 2. Stratification lavaan.survey doberski@uvt.nl
  • 5. Complex sample surveys Using lavaan.survey Example lavaan.survey analysis Conclusions Why account for complex sampling in structural equation modeling? • Clustering, stratification, and weighting all affect the standard errors. • Weighting affects the point estimates; (Skinner, Holt & Smith 1989; Pfefferman 1993; Muthén & Satorra 1995; Asparouhov 2005; Asparouhov, & Muthén 2005; Stapleton 2006; Asparouhov & Muthén 2007; Fuller 2009; Bollen, Tueller, & Oberski 2013) lavaan.survey doberski@uvt.nl
  • 6. Complex sample surveys Using lavaan.survey Example lavaan.survey analysis Conclusions lavaan.survey Input: 1. A lavaan fit object (SEM analysis) 2. A survey design object (complex sample design) Output: A lavaan fit object (SEM analysis), accounting for the complex sample design lavaan.survey doberski@uvt.nl
  • 7. Complex sample surveys Using lavaan.survey Example lavaan.survey analysis Conclusions lavaan.survey Input: 1. A lavaan fit object (SEM analysis) 2. A survey design object (complex sample design) Output: A lavaan fit object (SEM analysis), accounting for the complex sample design Example call: lavaan.survey(lavaan.fit, survey.design) lavaan.survey doberski@uvt.nl
  • 8. How to fit a structural equation model accounting for complex sampling using lavaan.survey in 3 steps:
      1. Fit the SEM using lavaan as you normally would → lavaan fit object;
      2. Define the complex sampling design with the survey package → svydesign or svrepdesign object;
      3. Call lavaan.survey with the two objects obtained in (1) and (2) → corrected lavaan fit object.
      Afterwards, the usual lavaan apparatus is available.
  • 9. Example complex sample SEM analysis from the literature: Ferla, Valcke & Cai (2009). Academic self-efficacy and academic self-concept: Reconsidering structural relationships. Learning and Individual Differences, 19(4).
  • 10. Structural equation model of Belgian PISA data
  • 11. PISA data in the lavaan.survey package
      > load("pisa.be.2003.rda")
      > head(pisa.be.2003, 5)[,1:20]
        PV1MATH1 PV1MATH2 PV1MATH3 PV1MATH4 ST31Q01 ST31Q02 ST31Q03 ST31Q04
      1    1.160     1.26     1.19    1.190       2       1       2       2
      2    1.293     1.27     1.22    1.247       1       1       1       1
      3    1.383     1.44     1.37    1.347       2       1       1       1
      4    0.965     1.02     1.10    0.979       2       2       2       2
      5    1.375     1.44     1.31    1.471       3       1       2       2
        ST31Q05 ST31Q06 ST31Q07 ST31Q08 ST32Q02 ST32Q04 ST32Q06 ST32Q07
      1       2       2       2       3       2       3       3       3
      2       1       2       1       4       2       3       3       4
      3       1       1       2       3       4       2       1       1
      4       2       1       2       2       3       3       2       2
      5       1       3       3       3       3       2       3       3
        ST32Q09  ESCS male school.type
      1       3 1.221    1           1
      2       4 1.899    2           1
      3       2 0.884    2           1
      4       3 1.305    2           1
      5       3 0.958    2           1
  • 12. Belgium is a complex country.
  • 13. In addition to the observed variables, the raw data also contain 80 replicate weights generated by "balanced-repeated replication" (BRR):
      > pisa.be.2003[sample(1:8796), 21:101]
           W_FSTUWT W_FSTR1 W_FSTR2 W_FSTR3 ... W_FSTR80
      2988     1.12    1.64    0.57    1.70 ...    0.528
      8666    11.90   17.84    5.95    5.95 ...    5.948
       290    16.89   25.33    8.44    8.44 ...   25.333
      3505    14.80    8.25    7.90   20.67 ...    8.206
      4289    22.70   32.40   10.87   10.80 ...   11.755
      2352    13.09   20.62    6.26   18.79 ...    6.301
      ....
  • 14. Step 1: Fitting the model with lavaan
      model.pisa <- "
        math =~ PV1MATH1 + PV1MATH2 + PV1MATH3 + PV1MATH4
        neg.efficacy =~ ST31Q01 + ST31Q02 + ST31Q03 + ST31Q04 +
                        ST31Q05 + ST31Q06 + ST31Q07 + ST31Q08
        neg.selfconcept =~ ST32Q02 + ST32Q04 + ST32Q06 + ST32Q07 + ST32Q09
        neg.selfconcept ~ neg.efficacy + ESCS + male
        neg.efficacy ~ neg.selfconcept + school.type + ESCS + male
        math ~ neg.selfconcept + neg.efficacy + school.type + ESCS + male
      "
      fit <- lavaan(model.pisa, data=pisa.be.2003, auto.var=TRUE, std.lv=TRUE,
                    meanstructure=TRUE, int.ov.free=TRUE, estimator="MLM")
  • 15. Step 2: Letting survey know about the replicate weight design
      des.rep <- svrepdesign(ids=~1, weights=~W_FSTUWT, data=pisa.be.2003,
                             repweights="W_FSTR[0-9]+", type="Fay", rho=0.5)
      More information on defining complex sampling objects:
      Lumley (2010). Complex Surveys: A Guide to Analysis Using R. Hoboken, NJ: Wiley.
      Thomas Lumley's website with tutorials: http://r-survey.r-forge.r-project.org/survey/index.html
  • 16. Complex survey SEM
      fit.surv <- lavaan.survey(lavaan.fit=fit, survey.design=des.rep)
  • 17. Point and standard error estimates using robust ML with and without correction for the sampling design using BRR replicate weights.
                                         Estimate          s.e.
                                      Naive     PML   Naive     PML   creff
      neg.selfconcept ~ neg.efficacy  -0.021  -0.050   0.032   0.046   1.415
      neg.efficacy ~ neg.selfconcept   0.568   0.609   0.046   0.065   1.421
      neg.efficacy ~ school.type       0.530   0.518   0.022   0.022   1.009
      math ~ neg.selfconcept          -0.179  -0.177   0.015   0.021   1.362
      math ~ neg.efficacy             -0.239  -0.237   0.015   0.018   1.216
      math ~ school.type              -0.606  -0.596   0.019   0.035   1.858
  • 18. Conclusions
  • 19. Conclusions
      • When there are clusters, strata, and/or weights, you should usually adjust the SEM analysis for them;
      • This is especially important for correct standard errors and model fit statistics;
      • lavaan.survey leverages the power of the lavaan and survey packages to let you do this;
      • Future: categorical data.
  • 20. doberski@uvt.nl http://daob.nl
      Oberski, D.L. (2014). lavaan.survey: An R Package for Complex Survey Analysis of Structural Equation Models. Journal of Statistical Software, 57(1). http://www.jstatsoft.org/v57/i01
  • 22. Appendix* Specifying regression relations in lavaan: dependent variable "regressed on" independent variables
      selfconcept ~ efficacy + ESCS + male
      efficacy ~ selfconcept + school.type + ESCS + male
      math ~ selfconcept + efficacy + school.type + ESCS + male
  • 23. PISA data: math self-concept (ST32Q02, ST32Q04, ST32Q06, ST32Q07, ST32Q09)
  • 27. Measurement model for self-concept: latent variable "measured by" observed variables
      neg.selfconcept =~ ST32Q02 + ST32Q04 + ST32Q06 + ST32Q07 + ST32Q09
  • 28. PISA data: math self-efficacy (ST31Q01 to ST31Q08)
  • 29. Measurement model for self-efficacy
      neg.efficacy =~ ST31Q01 + ST31Q02 + ST31Q03 + ST31Q04 + ST31Q05 + ST31Q06 + ST31Q07 + ST31Q08
  • 30. [1/3] PISA sampling design: clustering
      • Target population in each country: 15-year-old students attending educational institutions in grade 7 or higher;
      • A minimum of 150 schools selected per country;
      • Within each participating school, students (usually 35) randomly selected with equal probability;
      • Total sample size of at least 4,500 students.
      • Other complications with sampling:
          • Getting a population frame
          • School nonresponse
          • Student nonresponse
          • Small schools (< 35 students): representation vs. costs
          • Countries with few schools
  • 31. [2/3] PISA sampling design: stratification
      • Before sampling, schools are stratified in the sampling frame;
      • Schools are classified into "like" groups, in order to:
          • improve the efficiency of the sample design;
          • apply disproportionate sample allocations to specific groups of schools, e.g. states, provinces, or other regions;
          • ensure all parts of the population are included in the sample;
          • ensure representation of specific groups in the sample.
  • 32. PISA sampling: stratification in Belgium
      • Region × Public/private education → 23 strata
      • Then sampling within stratum with probability proportional to:
          • (Flanders) ISCED; Retention Rate; Vocational/Special Education; Percentage of Girls;
          • (French Community) National/International School; Retention Rate; Vocational-Special Education/Other;
          • (German Community) Public/Private.
  • 33. [3/3] Weighting
      $w_{ij} = w_{1i}\, w_{2ij}\, f_{1i}\, f_{2ij}\, t_{1i}$
      $w_{1i}$: base weight for school $i$;
      $w_{2ij}$: base weight for student $j$ in school $i$;
      $f_{2ij}$, $f_{1i}$: nonresponse factors for student and school nonparticipation, respectively;
      $t_{1i}$: weight trimming factor.
  • 34. Weighting
      $w_{ij} = w^{\mathrm{brr}}_{ij}\, w_{1i}\, w_{2ij}\, f_{1i}\, f_{2ij}\, t_{1i}$
      $w_{1i}$: base weight for school $i$;
      $w_{2ij}$: base weight for student $j$ in school $i$;
      $f_{2ij}$, $f_{1i}$: nonresponse factors for student and school nonparticipation, respectively;
      $t_{1i}$: weight trimming factor;
      $w^{\mathrm{brr}}_{ij}$: "balanced-repeated replication" weight (80 replications per case) created by the WesVar software.
  • 35. Balanced-repeated replication (BRR), Fay's ρ = 0.5
  • 38. Structural parameter estimates and standard errors taking nonnormality into account
                         Estimate  Std.err  Z-value  P(>|z|)
      Regressions:
        neg.selfconcept ~
          neg.efficacy     -0.021    0.032   -0.657    0.511
          ESCS             -0.062    0.018   -3.381    0.001
          male             -0.389    0.027  -14.607    0.000
        neg.efficacy ~
          neg.selfcncpt     0.568    0.046   12.354    0.000
          school.type       0.530    0.022   24.331    0.000
          ESCS             -0.252    0.016  -16.199    0.000
          male             -0.360    0.031  -11.668    0.000
        math ~
          neg.selfcncpt    -0.179    0.015  -11.644    0.000
          neg.efficacy     -0.239    0.015  -16.413    0.000
          school.type      -0.606    0.019  -32.154    0.000
          ESCS              0.312    0.014   22.511    0.000
          male              0.009    0.024    0.375    0.708
  • 39. Model fit taking nonnormality into account
                                                    Used       Total
        Number of observations                      7785        8796
        Estimator                                     ML      Robust
        Minimum Function Test Statistic         8088.256    7275.544
        Degrees of freedom                           158         158
        P-value (Chi-square)                       0.000       0.000
        Scaling correction factor                              1.112
          for the Satorra-Bentler correction
        Comparative Fit Index (CFI)                0.921       0.921
        Tucker-Lewis Index (TLI)                   0.907       0.907
        RMSEA                                      0.080       0.076
        P-value RMSEA <= 0.05                      0.000       0.000
        SRMR                                       0.056       0.056
  • 40. Complex survey SEM: fit measures
        Number of observations                      7785
        Estimator                                     ML      Robust
        Minimum Function Test Statistic         8187.514    5642.873
        Degrees of freedom                           158         158
        P-value (Chi-square)                       0.000       0.000
        Scaling correction factor                              1.451
          for the Satorra-Bentler correction
        Comparative Fit Index (CFI)                0.921       0.894
        Tucker-Lewis Index (TLI)                   0.906       0.875
        RMSEA                                      0.081       0.067
        P-value RMSEA <= 0.05                      0.000       0.000
        SRMR                                       0.058       0.058
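The appendix slides name balanced-repeated replication with Fay's ρ = 0.5 but do not show how the replicate weights turn into a standard error. The base-R sketch below illustrates the usual Fay variance estimator for a simple weighted mean, assuming deviations are taken around the full-sample estimate (some software centres on the mean of the replicate estimates instead). The helper `fay_brr_se` and the toy data are hypothetical illustrations; they are not part of the survey or lavaan.survey packages and not taken from the PISA file.

```r
# Sketch of Fay's BRR variance estimator for a weighted mean (base R only).
# fay_brr_se() is a hypothetical helper, not a function from survey or lavaan.survey.
fay_brr_se <- function(y, w_full, w_rep, rho = 0.5) {
  # Full-sample estimate using the final weights (W_FSTUWT in the PISA data).
  theta_full <- weighted.mean(y, w_full)
  # One estimate per replicate-weight column (W_FSTR1 ... W_FSTR80 in the PISA data).
  theta_rep <- apply(w_rep, 2, function(w) weighted.mean(y, w))
  G <- ncol(w_rep)
  # Fay's method: sum of squared deviations scaled by 1 / (G * (1 - rho)^2).
  sqrt(sum((theta_rep - theta_full)^2) / (G * (1 - rho)^2))
}

# Made-up toy data: 4 cases and 2 replicates whose weights are
# 1.5 or 0.5 times the full-sample weight, as Fay's rho = 0.5 implies.
y      <- c(1, 2, 3, 4)
w_full <- rep(1, 4)
w_rep  <- cbind(c(1.5, 0.5, 1.5, 0.5),
                c(0.5, 1.5, 1.5, 0.5))
se_toy <- fay_brr_se(y, w_full, w_rep)
print(se_toy)
```

With G = 80 replicates and ρ = 0.5, as in the design on the slides, the scaling factor 1 / (G (1 − ρ)²) equals 1/20. lavaan.survey works with the whole set of model parameters rather than a single mean, so this only conveys the idea behind the replicate-weight standard errors.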