1. Categorical Factor Analysis Using R
Koji Kosugi
Yamadai.R
2012/10/05
Kosugi,E.Koji (Yamadai.R) Categorical Factor Analysis by using R 2012/10/05 1/9
2. Why we use...
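I want to run a factor analysis, but I was told that a 3-point scale is not acceptable.
I collected data on a 5-point scale, but the responses were skewed.
I ran a factor analysis, but items kept dropping out one after another...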
3. FA vs categorical FA
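Factor analysis is a multivariate technique that extracts the factors common to a large set of questionnaire items.
The computation proceeds as follows.
1 Compute the correlation matrix from the data.
2 Eigendecompose the correlation matrix.
3 Decide the number of factors from the eigenvalues, and obtain the factor loadings from the eigenvectors.
The correlation matrix here consists of Pearson product-moment correlation coefficients, and computing these requires data measured at the interval-scale level or higher.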
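The three steps above can be sketched in base R. This is a minimal illustration on simulated data, not the estimation used in the console output of this deck: the item names x1 to x6, the loadings used to generate the data, and the eigenvalue-greater-than-one rule are all assumptions made for the example, and fa() with fm="gls" estimates loadings iteratively rather than reading them directly off the eigenvectors.

```r
# Simulate 300 respondents answering 6 items driven by 2 latent factors.
# The data, generating loadings, and item names are illustrative assumptions.
set.seed(1)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
items <- cbind(
  x1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  x2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  x3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  x4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  x5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  x6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Step 1: correlation matrix (cor() computes Pearson correlations by default).
R <- cor(items)

# Step 2: eigendecomposition of the correlation matrix.
eig <- eigen(R)

# Step 3a: choose the number of factors from the eigenvalues
# (eigenvalue > 1 is only one of several heuristics).
n_factors <- sum(eig$values > 1)

# Step 3b: unrotated loadings = eigenvectors scaled by sqrt(eigenvalue).
loadings <- eig$vectors[, 1:n_factors, drop = FALSE] %*%
  diag(sqrt(eig$values[1:n_factors]), nrow = n_factors)
rownames(loadings) <- colnames(items)

print(round(eig$values, 3))
print(round(loadings, 3))
```

Because everything downstream depends on R, swapping the Pearson matrix for a polychoric one changes the eigenvalues and loadings without changing any of the later steps.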
4. One of the reasons
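A 3-point scale cannot be treated as interval-level data (statistically, a scale needs 7 or more points).
Skewed data = respondents cannot discriminate among the categories at either the upper or the lower end.
The correlation coefficients underlying the analysis come out small = the data are skewed, so the variance is small.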
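The effect of coarse, skewed categories on the Pearson correlation can be checked with a small simulation. This is a sketch under assumed settings: the latent correlation of 0.6, the sample size, and both sets of thresholds are made up for illustration and are not taken from the data analysed in this deck.

```r
set.seed(123)
n   <- 5000
rho <- 0.6  # latent correlation (illustrative assumption)

# Two latent continuous variables with correlation rho.
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# Balanced 3-point scale: thresholds split the responses roughly evenly.
cut_even <- function(z) as.integer(cut(z, c(-Inf, -0.43, 0.43, Inf)))
# Skewed 3-point scale: most responses pile up in the top category.
cut_skew <- function(z) as.integer(cut(z, c(-Inf, -2.0, -1.3, Inf)))

r_latent <- cor(z1, z2)
r_even   <- cor(cut_even(z1), cut_even(z2))
r_skew   <- cor(cut_skew(z1), cut_skew(z2))

# Pearson correlations: latent scores vs. the two categorised versions.
print(round(c(latent = r_latent, even = r_even, skewed = r_skew), 3))

# Variance of a single categorised item under each thresholding.
print(round(c(even = var(cut_even(z1)), skewed = var(cut_skew(z1))), 3))
```

Under these settings the categorised correlations fall below the latent one, and the skewed scale loses the most, alongside a much smaller item variance.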
12. > fa.parallel(polychoric.cor$rho,n.obs=355)
Parallel analysis suggests that the number of factors = 3 and the number of components =
> fa.result.peason <- fa(peason.cor,n.obs=355,fm="gls",nfactors=3,rotate="promax")
> fa.result.polych <- fa(polychoric.cor$rho,n.obs=355,fm="gls",nfactors=3,rotate="promax")
> print(fa.result.peason,digit=3,sort=T)
Factor Analysis using method = gls
Call: fa(r = peason.cor, nfactors = 3, n.obs = 355, rotate = "promax",
fm = "gls")
Standardized loadings (pattern matrix) based upon correlation matrix
item GLS2 GLS1 GLS3 h2 u2
V8 8 0.695 0.073 -0.076 0.468 0.532
V7 7 0.583 0.032 0.028 0.377 0.623
V6 6 0.529 -0.062 0.121 0.333 0.667
V1 1 0.056 0.886 -0.091 0.721 0.279
V3 3 -0.006 0.453 0.083 0.261 0.739
V4 4 -0.015 0.017 0.696 0.490 0.510
V5 5 0.023 -0.145 0.692 0.377 0.623
V2 2 -0.063 0.289 0.313 0.273 0.727
GLS2 GLS1 GLS3
SS loadings 1.140 1.094 1.065
Proportion Var 0.143 0.137 0.133
Cumulative Var 0.143 0.279 0.412
Proportion Explained 0.346 0.332 0.323
Cumulative Proportion 0.346 0.677 1.000
With factor correlations of
GLS2 GLS1 GLS3
GLS2 1.000 0.409 0.566
GLS1 0.409 1.000 0.688
GLS3 0.566 0.688 1.000
Test of the hypothesis that 3 factors are sufficient.
The degrees of freedom for the null model are 28 and the objective function was 1.469 wit
The degrees of freedom for the model are 7 and the objective function was 0.03
The root mean square of the residuals (RMSR) is 0.014
The df corrected root mean square of the residuals is 0.04
The number of observations was 355 with Chi Square = 10.559 with prob < 0.159
Tucker Lewis Index of factoring reliability = 0.9706
RMSEA index = 0.0388 and the 90 % confidence intervals are NA 0.0814
BIC = -30.546
Fit based upon off diagonal values = 0.995
Measures of factor score adequacy
GLS2 GLS1 GLS3
Correlation of scores with factors 0.830 0.885 0.851
Multiple R square of scores with factors 0.689 0.783 0.723
Minimum correlation of possible factor scores 0.378 0.565 0.447
> print(fa.result.polych,digit=3,sort=T)
Factor Analysis using method = gls
Call: fa(r = polychoric.cor$rho, nfactors = 3, n.obs = 355, rotate = "promax",
fm = "gls")
Standardized loadings (pattern matrix) based upon correlation matrix
item GLS3 GLS1 GLS2 h2 u2
V5 5 0.806 -0.179 0.029 0.497 0.503
V4 4 0.709 0.024 0.020 0.543 0.457
V2 2 0.383 0.312 -0.069 0.376 0.624
V1 1 -0.138 0.976 0.069 0.826 0.174
V3 3 0.145 0.470 -0.028 0.326 0.674
V7 7 -0.019 0.052 0.657 0.447 0.553
V8 8 -0.038 0.097 0.650 0.452 0.548
V6 6 0.143 -0.083 0.555 0.365 0.635
GLS3 GLS1 GLS2
SS loadings 1.319 1.289 1.226
Proportion Var 0.165 0.161 0.153
Cumulative Var 0.165 0.326 0.479
Proportion Explained 0.344 0.336 0.320
Cumulative Proportion 0.344 0.680 1.000
With factor correlations of
GLS3 GLS1 GLS2
GLS3 1.000 0.716 0.522
GLS1 0.716 1.000 0.392
GLS2 0.522 0.392 1.000
Test of the hypothesis that 3 factors are sufficient.
The degrees of freedom for the null model are 28 and the objective function was 1.986 wit
The degrees of freedom for the model are 7 and the objective function was 0.055
The root mean square of the residuals (RMSR) is 0.017
The df corrected root mean square of the residuals is 0.048
The number of observations was 355 with Chi Square = 19.207 with prob < 0.00756
Tucker Lewis Index of factoring reliability = 0.9265
RMSEA index = 0.0711 and the 90 % confidence intervals are 0.0335 0.1085
BIC = -21.897
Fit based upon off diagonal values = 0.995
Measures of factor score adequacy
GLS3 GLS1 GLS2
Correlation of scores with factors 0.882 0.928 0.839
Multiple R square of scores with factors 0.779 0.862 0.704
Minimum correlation of possible factor scores 0.557 0.724 0.408
> #
> # sample <- subset(sample,select=c("V11","V13","V20","V5","V4","V17","V12","V15"))
> # write.table(sample,"cEFAsample.csv",sep=",",row.name=F,col.name=F,na="*")
>
>
> # mixed pattern
> sample.cat <- data.frame(lapply(sample[1:3],factor),sample[4:8])
> summary(sample.cat)
V1 V2 V3 V4 V5 V6
1: 2 1: 3 1 : 2 Min. :1.000 Min. :1.000 Min. :1.000
2: 26 2: 13 2 : 18 1st Qu.:3.000 1st Qu.:4.000 1st Qu.:3.000
3: 61 3: 31 3 : 60 Median :4.000 Median :4.000 Median :3.000
4:178 4:197 4 :206 Mean :3.853 Mean :3.955 Mean :3.138
5: 88 5:111 5 : 67 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:4.000
NA's: 2 Max. :5.000 Max. :5.000 Max. :5.000
NA's :1
V7 V8
Min. :1.00 Min. :1.000
1st Qu.:2.00 1st Qu.:3.000
Median :3.00 Median :4.000
Mean :2.78 Mean :3.442
3rd Qu.:4.00 3rd Qu.:4.000
Max. :5.00 Max. :5.000
NA's :1
> hetcor.cor <- hetcor(sample.cat)
> hetcor.cor$correlations
V1 V2 V3 V4 V5 V6 V7
V1 1.0000000 0.4766232 0.4902862 0.4305458 0.2853987 0.2076320 0.3015123
V2 0.4766232 1.0000000 0.3740222 0.3757560 0.3093574 0.1839428 0.1175596
V3 0.4902862 0.3740222 1.0000000 0.2752806 0.2491626 0.1686548 0.1583849
V4 0.4305458 0.3757560 0.2752806 1.0000000 0.4202661 0.2636989 0.2758351
V5 0.2853987 0.3093574 0.2491626 0.4202661 1.0000000 0.2279503 0.2550014
V6 0.2076320 0.1839428 0.1686548 0.2636989 0.2279503 1.0000000 0.3414939
V7 0.3015123 0.1175596 0.1583849 0.2758351 0.2550014 0.3414939 1.0000000
V8 0.2663878 0.2378540 0.1553257 0.2324400 0.2175612 0.3937855 0.4146257
16. V7 7 0.020 0.597 0.034 0.391 0.609
V6 6 -0.031 0.520 0.109 0.328 0.672
V5 5 -0.131 -0.004 0.735 0.420 0.580
V4 4 0.078 0.014 0.613 0.459 0.541
GLS1 GLS2 GLS3
SS loadings 1.364 1.144 0.988
Proportion Var 0.171 0.143 0.123
Cumulative Var 0.171 0.314 0.437
Proportion Explained 0.390 0.327 0.283
Cumulative Proportion 0.390 0.717 1.000
With factor correlations of
GLS1 GLS2 GLS3
GLS1 1.000 0.409 0.704
GLS2 0.409 1.000 0.552
GLS3 0.704 0.552 1.000
Test of the hypothesis that 3 factors are sufficient.
The degrees of freedom for the null model are 28 and the objective function was 1.705 wit
The degrees of freedom for the model are 7 and the objective function was 0.046
The root mean square of the residuals (RMSR) is 0.016
The df corrected root mean square of the residuals is 0.046
The number of observations was 355 with Chi Square = 16.046 with prob < 0.0247
Tucker Lewis Index of factoring reliability = 0.9361
RMSEA index = 0.0613 and the 90 % confidence intervals are 0.0203 0.0998
BIC = -25.059
Fit based upon off diagonal values = 0.994
Measures of factor score adequacy
GLS1 GLS2 GLS3
Correlation of scores with factors 0.892 0.829 0.849
Multiple R square of scores with factors 0.795 0.688 0.721
Minimum correlation of possible factor scores 0.590 0.375 0.441
>
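The fit statistics printed for the three solutions (Pearson, polychoric, and the mixed hetcor matrix) can be cross-checked from the reported chi-square values alone. The sketch below copies the chi-square, degrees of freedom, and number of observations from the output above. The BIC formula used here, chi-square minus df times log(n), is an assumption that happens to reproduce the printed values to rounding, not a statement about how the package computes it internally.

```r
# Values copied from the three fa() printouts; n.obs and df are as reported.
n_obs <- 355
df    <- 7
chisq <- c(pearson = 10.559, polychoric = 19.207, hetcor = 16.046)

# Upper-tail p-value for the test that 3 factors are sufficient.
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

# BIC as chi-square minus df * log(n) (assumed formula, see the note above).
bic <- chisq - df * log(n_obs)

print(signif(p_value, 3))
print(round(bic, 3))
```

Laying the three solutions side by side this way makes it easier to compare them on the same footing than scanning the separate printouts.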