YamadaiR(Categorical Factor Analysis)

Categorical Factor Analysis Using R

Koji Kosugi (小杉考司)

Yamadai.R (やまだいあ～る)

2012/10/05

Kosugi, E. Koji (Yamadai.R), Categorical Factor Analysis by using R, 2012/10/05

Why we use...

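I want to run a factor analysis, but I was told a 3-point scale will not do
I collected data on a 5-point scale, but the responses were skewed
I ran a factor analysis, but the items kept dropping out one after another...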


FA vs categorical FA

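Factor analysis is a multivariate technique that extracts the factors common to a large set of questionnaire items.
The computation proceeds as follows.
1 Build a correlation matrix from the data
2 Eigendecompose the correlation matrix
3 Decide the number of factors from the eigenvalues, and obtain the factor loadings from the eigenvectors
The correlation matrix here consists of Pearson product-moment correlation coefficients, and computing those requires data measured at interval level or higher.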
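The three steps above can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated (not the talk's data), the number of factors is chosen with the eigenvalue-greater-than-1 rule, and loadings are taken as eigenvectors scaled by the square root of their eigenvalues (principal-component extraction), which is simpler than the GLS extraction used in the session later on.

```r
# Simulated toy data: 200 respondents, 4 items driven by one common factor.
# All names and numbers here are illustrative assumptions.
set.seed(1)
n <- 200
f <- rnorm(n)
items <- sapply(c(0.8, 0.7, 0.6, 0.5),
                function(l) l * f + sqrt(1 - l^2) * rnorm(n))
colnames(items) <- paste0("item", 1:4)

# Step 1: correlation matrix from the data
R <- cor(items)

# Step 2: eigendecomposition of the correlation matrix
e <- eigen(R, symmetric = TRUE)

# Step 3a: number of factors, here by the eigenvalue > 1 rule (an assumption)
n.factors <- sum(e$values > 1)

# Step 3b: loadings = eigenvectors scaled by sqrt(eigenvalue)
keep <- seq_len(n.factors)
loadings <- e$vectors[, keep, drop = FALSE] %*%
  diag(sqrt(e$values[keep]), nrow = n.factors)
rownames(loadings) <- colnames(items)

print(round(e$values, 3))
print(round(loadings, 3))
```

The eigenvalues of a correlation matrix sum to the number of items, which is why "eigenvalue greater than 1" is a common, if rough, cutoff.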


One of the reasons

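A 3-point scale cannot be regarded as interval level (statistically, 7 or more points are needed)
Skewed data = the categories at the upper or lower end are not being discriminated
The correlation coefficients underlying the analysis are small = the skew leaves little variance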


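The problem goes away if the correlation coefficient is computed in a way suited to ordinal or nominal measurement. For example, according to Kano & Miura (狩野・三浦, 2002), there are three options for analyzing ordinal data:
1 Treat the data as continuous
2 Use the polychoric correlation coefficient or the polyserial correlation coefficient
3 Use a method based on the multinomial distribution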


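What are ordinal-level correlation coefficients?

The polychoric correlation (多分相関係数 in Japanese) is the correlation coefficient between two ordinal variables.
The polyserial correlation (多分系列相関係数 or 重双相関係数 in Japanese) is the correlation coefficient between an ordinal variable and a continuous variable.
The tetrachoric correlation (四分相関係数 in Japanese) is the correlation coefficient between two binary variables; "tetra" refers to the 2 × 2 table. It is a special case of the polychoric correlation.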


images of latent continuity

Figure : image of latent continuity and expression

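Assume that behind the observed variable x there is a latent variable ξ that is normally distributed. The relationship between x and ξ can be written as follows.

\[
x =
\begin{cases}
1 & \xi < a_1 \\
2 & a_1 \le \xi < a_2 \\
3 & a_2 \le \xi < a_3 \\
\vdots & \vdots \\
s & a_{s-1} \le \xi
\end{cases}
\tag{1}
\]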
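Equation (1) can be tried out directly in base R. The sketch below is illustrative only: the threshold values in `a` and the helper name `to_category` are assumptions made for the example, not values from the talk.

```r
# Hypothetical thresholds a1 < a2 < a3 < a4 for a 5-category item (s = 5)
a <- c(-2.0, -1.0, 0.0, 1.0)

# Map latent values xi to categories 1..s following equation (1):
# x = 1 if xi < a1, x = k if a[k-1] <= xi < a[k], x = s if a[s-1] <= xi.
# findInterval() counts how many thresholds are <= xi, so adding 1 gives x.
to_category <- function(xi, a) findInterval(xi, a) + 1L

# A normally distributed latent variable and its categorical expression
set.seed(1)
xi <- rnorm(1000)
x <- to_category(xi, a)
print(table(x))

# Boundary behaviour: a value equal to a threshold falls in the upper category
print(to_category(c(-2.5, -2.0, -0.5, 0.0, 3.0), a))
```

Moving the thresholds left or right is what produces skewed response distributions from the same underlying normal variable.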

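Correlation coefficients for ordinal scales
We assume the two variables are correlated at the level of unobserved latent variables, and that this correlation is expressed through categories.
What we need to estimate, then, is the latent-level correlation coefficient ρ and the thresholds that separate the categories of the variables X and Y.
The thresholds can also be approximated from the marginal frequencies of the cross-tabulation (2step-ML).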
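As a rough illustration of approximating thresholds from marginal frequencies, the sketch below takes the response counts for item V1 that `table(sample$V1)` reports in the R session of this deck (2, 26, 61, 178, 88) and converts the cumulative proportions into standard normal quantiles. Treating the latent variable as standard normal is the assumption stated earlier; the variable names are made up for the example.

```r
# Response counts for V1, copied from the table(sample$V1) output in this deck
counts <- c("1" = 2, "2" = 26, "3" = 61, "4" = 178, "5" = 88)

# Cumulative proportion of responses at or below each category
cum.prop <- cumsum(counts) / sum(counts)

# Thresholds a1..a4: standard normal quantiles of the cumulative proportions.
# The last cumulative proportion is 1 (threshold +Inf), so it is dropped.
thresholds <- qnorm(cum.prop[-length(cum.prop)])
names(thresholds) <- paste0("a", seq_along(thresholds))

print(round(thresholds, 3))
```

Because most V1 responses sit in categories 4 and 5, three of the four thresholds land below zero, which is how the skew shows up on the latent scale.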

↓

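The picture is that distortions such as ceiling and floor effects are properly adjusted by the thresholds.
As a result, the categorical correlation coefficient generally comes out larger than Pearson's correlation coefficient, which forces an equal-interval assumption on the data.
Larger correlations in turn make factors easier to extract.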
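The claim that Pearson's coefficient shrinks on skewed categorical data can be checked with a small simulation. The sketch below is a base-R illustration, not the estimator used by `psych` or `polycor`: the latent correlation, cut points, sample size, and the helper names `pbvn` and `loglik` are all assumptions made for the example. It draws correlated normal pairs, cuts them into five skewed categories, and then runs a simple two-step estimate (thresholds from the margins, then maximum likelihood for ρ with the thresholds held fixed).

```r
set.seed(123)
n   <- 5000
rho <- 0.5                                     # latent correlation (assumed)
z1  <- rnorm(n)
z2  <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # latent pair, correlation near rho

# Skewed cut points: responses pile up in the upper categories (ceiling effect)
cuts <- c(-2.0, -1.5, -1.0, 0.0)
x <- findInterval(z1, cuts) + 1L
y <- findInterval(z2, cuts) + 1L
pearson.categ <- cor(x, y)                     # Pearson on the 1..5 codes

# Step 1: thresholds from the marginal proportions of the cross-tabulation
tab   <- table(factor(x, levels = 1:5), factor(y, levels = 1:5))
a.hat <- unname(qnorm(cumsum(rowSums(tab)) / n)[1:4])
b.hat <- unname(qnorm(cumsum(colSums(tab)) / n)[1:4])

# P(Z1 <= a, Z2 <= b) for a standard bivariate normal with correlation r
pbvn <- function(a, b, r) {
  if (a == -Inf || b == -Inf) return(0)
  integrate(function(z) dnorm(z) * pnorm((b - r * z) / sqrt(1 - r^2)),
            lower = -Inf, upper = a, rel.tol = 1e-6)$value
}

# Step 2: log-likelihood of the table as a function of r, thresholds fixed
loglik <- function(r) {
  A <- c(-Inf, a.hat, Inf)
  B <- c(-Inf, b.hat, Inf)
  ll <- 0
  for (i in 1:5) for (j in 1:5) {
    p <- pbvn(A[i + 1], B[j + 1], r) - pbvn(A[i], B[j + 1], r) -
         pbvn(A[i + 1], B[j], r) + pbvn(A[i], B[j], r)
    ll <- ll + tab[i, j] * log(max(p, 1e-12))  # guard against log(0)
  }
  ll
}
rho.hat <- optimize(loglik, interval = c(-0.95, 0.95), maximum = TRUE)$maximum

print(round(c(latent = cor(z1, z2), pearson.on.codes = pearson.categ,
              two.step = rho.hat), 3))
```

In this setup the Pearson coefficient on the category codes falls below the latent correlation, while the two-step estimate lands close to the value used to generate the data.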

Follow me with R code...

The code follows.


> library(psych)
> library(polycor)
> # sample statistics
> sample <- read.csv("cEFAsample.csv",head=F,na.strings="*")
> head(sample)
  V1 V2 V3 V4 V5 V6 V7 V8
1  1  1  1  1  4  1  1  1
2  3  4  4  1  4  4  1  1
3  3  4  4  3  4  3  3  4
4  2  4  5  2  2  4  1  4
5  2  2  2  3  4  2  2  3
6  3  3  5  3  3  2  2  3
> summary(sample)
       V1              V2              V3              V4
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000
 1st Qu.:3.500   1st Qu.:4.000   1st Qu.:4.000   1st Qu.:3.000
 Median :4.000   Median :4.000   Median :4.000   Median :4.000
 Mean   :3.913   Mean   :4.127   Mean   :3.901   Mean   :3.853
 3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:4.000
 Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000
                                 NA's   :2       NA's   :1
       V5              V6              V7              V8
 Min.   :1.000   Min.   :1.000   Min.   :1.00    Min.   :1.000
 1st Qu.:4.000   1st Qu.:3.000   1st Qu.:2.00    1st Qu.:3.000
 Median :4.000   Median :3.000   Median :3.00    Median :4.000
 Mean   :3.955   Mean   :3.138   Mean   :2.78    Mean   :3.442
 3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:4.00    3rd Qu.:4.000
 Max.   :5.000   Max.   :5.000   Max.   :5.00    Max.   :5.000
                                 NA's   :1
> table(sample$V1)
  1   2   3   4   5
  2  26  61 178  88
> describe(sample)
   var   n mean   sd median trimmed  mad min max range  skew kurtosis   se
V1   1 355 3.91 0.87      4    4.00 0.00   1   5     4 -0.70     0.22 0.05
V2   2 355 4.13 0.78      4    4.22 0.00   1   5     4 -1.11     2.08 0.04
V3   3 353 3.90 0.78      4    3.95 0.00   1   5     4 -0.76     0.96 0.04
V4   4 354 3.85 0.90      4    3.94 0.00   1   5     4 -0.82     0.66 0.05
V5   5 355 3.95 0.87      4    4.04 1.48   1   5     4 -0.71     0.24 0.05
V6   6 355 3.14 0.95      3    3.16 1.48   1   5     4 -0.22    -0.12 0.05
V7   7 354 2.78 1.01      3    2.79 1.48   1   5     4  0.14    -0.70 0.05
V8   8 355 3.44 1.00      4    3.47 1.48   1   5     4 -0.42    -0.28 0.05


> # Pearson cor
> peason.cor <- cor(sample,use="complete.obs")
> print(peason.cor,digit=2)

V1 V2 V3 V4 V5 V6 V7 V8
V1 1.00 0.380 0.43 0.40 0.26 0.19 0.285 0.26
V2 0.38 1.000 0.28 0.34 0.27 0.16 0.099 0.21
V3 0.43 0.277 1.00 0.26 0.21 0.15 0.150 0.16
V4 0.40 0.339 0.26 1.00 0.42 0.26 0.276 0.23
V5 0.26 0.265 0.21 0.42 1.00 0.23 0.255 0.22
V6 0.19 0.157 0.15 0.26 0.23 1.00 0.341 0.39
V7 0.29 0.099 0.15 0.28 0.26 0.34 1.000 0.41
V8 0.26 0.212 0.16 0.23 0.22 0.39 0.415 1.00

> # polychoric cor
> polychoric.cor <- polychoric(sample)

> print(polychoric.cor$rho)

V1 V2 V3 V4 V5 V6 V7
V1 1.0000000 0.4693292 0.4993862 0.4702445 0.3260640 0.2015360 0.3172379
V2 0.4693292 1.0000000 0.3661174 0.4283065 0.3544777 0.1925806 0.1164603
V3 0.4993862 0.3661174 1.0000000 0.3131351 0.2971062 0.1704954 0.1565841
V4 0.4702445 0.4283065 0.3131351 1.0000000 0.5128292 0.2805638 0.3020316
V5 0.3260640 0.3544777 0.2971062 0.5128292 1.0000000 0.2612329 0.2785856
V6 0.2015360 0.1925806 0.1704954 0.2805638 0.2612329 1.0000000 0.3832876
V7 0.3172379 0.1164603 0.1565841 0.3020316 0.2785856 0.3832876 1.0000000
V8 0.2939444 0.2544516 0.1885443 0.2562720 0.2513339 0.4138156 0.4444297
V8
V1 0.2939444
V2 0.2544516
V3 0.1885443
V4 0.2562720
V5 0.2513339
V6 0.4138156
V7 0.4444297
V8 1.0000000

> #
> # compare, Pearson vs polychoric
> #
>
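The comparison announced in the comment above can be made concrete with the numbers already printed in this session. The sketch below copies the V1 row of both matrices by hand (Pearson values as printed, hence rounded) and puts them side by side; the object names are made up for the example. In the live session the same idea would amount to subtracting `peason.cor` from `polychoric.cor$rho`.

```r
# Correlations of V1 with V2..V8, copied from the two matrices printed above
vars <- paste0("V", 2:8)
pearson.v1    <- c(0.380, 0.43, 0.40, 0.26, 0.19, 0.285, 0.26)
polychoric.v1 <- c(0.4693292, 0.4993862, 0.4702445, 0.3260640,
                   0.2015360, 0.3172379, 0.2939444)

# Side-by-side table with the difference (polychoric minus Pearson)
comparison <- data.frame(pearson    = pearson.v1,
                         polychoric = polychoric.v1,
                         diff       = polychoric.v1 - pearson.v1,
                         row.names  = vars)
print(round(comparison, 3))
```

For this row every polychoric value is larger than its Pearson counterpart, in line with the expectation stated in the slides.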
> # FA
> fa.parallel(peason.cor,n.obs=355)

Parallel analysis suggests that the number of factors = 3 and the number of components =


> fa.parallel(polychoric.cor$rho,n.obs=355)
> fa.result.peason <- fa(peason.cor,n.obs=355,fm="gls",nfactors=3,rotate="promax")
> fa.result.polych <- fa(polychoric.cor$rho,n.obs=355,fm="gls",nfactors=3,rotate="promax")
> print(fa.result.peason,digit=3,sort=T)
Factor Analysis using method = gls
Call: fa(r = peason.cor, nfactors = 3, n.obs = 355, rotate = "promax",
fm = "gls")
Standardized loadings (pattern matrix) based upon correlation matrix
item GLS2 GLS1 GLS3 h2 u2
V8 8 0.695 0.073 -0.076 0.468 0.532
V7 7 0.583 0.032 0.028 0.377 0.623
V6 6 0.529 -0.062 0.121 0.333 0.667
V1 1 0.056 0.886 -0.091 0.721 0.279
V3 3 -0.006 0.453 0.083 0.261 0.739
V4 4 -0.015 0.017 0.696 0.490 0.510
V5 5 0.023 -0.145 0.692 0.377 0.623
V2 2 -0.063 0.289 0.313 0.273 0.727

GLS2 GLS1 GLS3
SS loadings 1.140 1.094 1.065
Proportion Var 0.143 0.137 0.133
Cumulative Var 0.143 0.279 0.412
Proportion Explained 0.346 0.332 0.323
Cumulative Proportion 0.346 0.677 1.000

With factor correlations of
GLS2 GLS1 GLS3
GLS2 1.000 0.409 0.566
GLS1 0.409 1.000 0.688
GLS3 0.566 0.688 1.000

Test of the hypothesis that 3 factors are sufficient.

The degrees of freedom for the null model are 28 and the objective function was 1.469 wit
The degrees of freedom for the model are 7 and the objective function was 0.03

The root mean square of the residuals (RMSR) is 0.014
The df corrected root mean square of the residuals is 0.04
The number of observations was 355 with Chi Square = 10.559 with prob < 0.159

Tucker Lewis Index of factoring reliability = 0.9706
RMSEA index = 0.0388 and the 90 % confidence intervals are NA 0.0814
BIC = -30.546


Fit based upon off diagonal values = 0.995
Measures of factor score adequacy
GLS2 GLS1 GLS3
Correlation of scores with factors 0.830 0.885 0.851
Multiple R square of scores with factors 0.689 0.783 0.723
Minimum correlation of possible factor scores 0.378 0.565 0.447
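Two figures in the output above can be checked by hand, which helps when reading it. The sketch below is a worked check using numbers copied from that output, under two assumptions: that with an oblique (promax) rotation the communality h2 of an item equals p' Φ p, where p is the item's row of pattern loadings and Φ is the factor correlation matrix, and that the reported BIC equals the chi-square minus df times log(n). Both assumptions reproduce the printed values to three decimals here.

```r
# Pattern loadings of V8 in the Pearson-based solution (columns GLS2, GLS1, GLS3)
p <- c(GLS2 = 0.695, GLS1 = 0.073, GLS3 = -0.076)

# Factor correlations, copied from the same output
Phi <- matrix(c(1.000, 0.409, 0.566,
                0.409, 1.000, 0.688,
                0.566, 0.688, 1.000),
              nrow = 3, byrow = TRUE,
              dimnames = list(names(p), names(p)))

# Communality and uniqueness of V8 under correlated factors
h2 <- as.numeric(t(p) %*% Phi %*% p)
u2 <- 1 - h2
print(round(c(h2 = h2, u2 = u2), 3))   # output above reports 0.468 and 0.532

# BIC from the chi-square, model degrees of freedom and sample size
chisq <- 10.559
df    <- 7
n.obs <- 355
bic <- chisq - df * log(n.obs)
print(round(bic, 3))                   # output above reports -30.546
```

With uncorrelated factors Φ would be the identity matrix and h2 would reduce to the plain sum of squared loadings.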

> print(fa.result.polych,digit=3,sort=T)

Call: fa(r = polychoric.cor$rho, nfactors = 3, n.obs = 355, rotate = "promax",
fm = "gls")
V5 5 0.806 -0.179 0.029 0.497 0.503
V4 4 0.709 0.024 0.020 0.543 0.457
V2 2 0.383 0.312 -0.069 0.376 0.624
V1 1 -0.138 0.976 0.069 0.826 0.174
V3 3 0.145 0.470 -0.028 0.326 0.674
V7 7 -0.019 0.052 0.657 0.447 0.553
V8 8 -0.038 0.097 0.650 0.452 0.548
V6 6 0.143 -0.083 0.555 0.365 0.635

GLS3 GLS1 GLS2
SS loadings 1.319 1.289 1.226

GLS3 GLS1 GLS2
GLS3 1.000 0.716 0.522
GLS1 0.716 1.000 0.392
GLS2 0.522 0.392 1.000






RMSEA index = 0.0711 and the 90 % confidence intervals are 0.0335 0.1085
BIC = -21.897
GLS3 GLS1 GLS2

> #
> # sample <- subset(sample,select=c("V11","V13","V20","V5","V4","V17","V12","V15"))
> # write.table(sample,"cEFAsample.csv",sep=",",row.name=F,col.name=F,na="*")
>
>
> # mixed pattern
> sample.cat <- data.frame(lapply(sample[1:3],factor),sample[4:8])
> summary(sample.cat)

V1 V2 V3 V4 V5 V6
1: 2 1: 3 1 : 2 Min. :1.000 Min. :1.000 Min. :1.000
2: 26 2: 13 2 : 18 1st Qu.:3.000 1st Qu.:4.000 1st Qu.:3.000
3: 61 3: 31 3 : 60 Median :4.000 Median :4.000 Median :3.000
4:178 4:197 4 :206 Mean :3.853 Mean :3.955 Mean :3.138
5: 88 5:111 5 : 67 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:4.000
NA's: 2 Max. :5.000 Max. :5.000 Max. :5.000
NA's :1
V7 V8
Min. :1.00 Min. :1.000
1st Qu.:2.00 1st Qu.:3.000
Median :3.00 Median :4.000
Mean :2.78 Mean :3.442
3rd Qu.:4.00 3rd Qu.:4.000
Max. :5.00 Max. :5.000
NA's :1

> hetcor.cor <- hetcor(sample.cat)
> hetcor.cor$correlations

V1 V2 V3 V4 V5 V6 V7
V1 1.0000000 0.4766232 0.4902862 0.4305458 0.2853987 0.2076320 0.3015123
V2 0.4766232 1.0000000 0.3740222 0.3757560 0.3093574 0.1839428 0.1175596
V3 0.4902862 0.3740222 1.0000000 0.2752806 0.2491626 0.1686548 0.1583849
V4 0.4305458 0.3757560 0.2752806 1.0000000 0.4202661 0.2636989 0.2758351
V5 0.2853987 0.3093574 0.2491626 0.4202661 1.0000000 0.2279503 0.2550014
V6 0.2076320 0.1839428 0.1686548 0.2636989 0.2279503 1.0000000 0.3414939
V7 0.3015123 0.1175596 0.1583849 0.2758351 0.2550014 0.3414939 1.0000000
V8 0.2663878 0.2378540 0.1553257 0.2324400 0.2175612 0.3937855 0.4146257


V8
V1 0.2663878
V2 0.2378540
V3 0.1553257
V4 0.2324400
V5 0.2175612
V6 0.3937855
V7 0.4146257
V8 1.0000000

> hetcor.cor$type

[,1] [,2] [,3] [,4] [,5]
[1,] "" "Polychoric" "Polychoric" "Polyserial" "Polyserial"
[2,] "Polychoric" "" "Polychoric" "Polyserial" "Polyserial"
[3,] "Polychoric" "Polychoric" "" "Polyserial" "Polyserial"
[4,] "Polyserial" "Polyserial" "Polyserial" "" "Pearson"
[5,] "Polyserial" "Polyserial" "Polyserial" "Pearson" ""
[6,] "Polyserial" "Polyserial" "Polyserial" "Pearson" "Pearson"
[,6] [,7] [,8]
[1,] "Polyserial" "Polyserial" "Polyserial"
[4,] "Pearson" "Pearson" "Pearson"
[5,] "Pearson" "Pearson" "Pearson"
[6,] "" "Pearson" "Pearson"
[7,] "Pearson" "" "Pearson"
[8,] "Pearson" "Pearson" ""
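The pattern in the type matrix above (factor with factor gives "Polychoric", factor with numeric gives "Polyserial", numeric with numeric gives "Pearson") can be mimicked in base R. The sketch below only reproduces that labeling rule on a made-up four-column data frame built the same way as `sample.cat`; it is not the internal logic of `hetcor`, and `raw`, `toy` and `cor_type` are hypothetical names.

```r
# Toy data: four numeric columns (values are arbitrary)
raw <- data.frame(V1 = c(1, 2, 3, 2), V2 = c(3, 3, 1, 2),
                  V3 = c(1, 4, 2, 5), V4 = c(2, 2, 5, 3))

# Same construction as sample.cat: first columns become factors, rest stay numeric
toy <- data.frame(lapply(raw[1:2], factor), raw[3:4])
print(sapply(toy, class))

# Labeling rule read off the hetcor.cor$type output above
cor_type <- function(x, y) {
  if (is.factor(x) && is.factor(y)) "Polychoric"
  else if (is.factor(x) || is.factor(y)) "Polyserial"
  else "Pearson"
}

# Fill a type matrix with an empty diagonal, as in the printed output
vars <- names(toy)
type <- matrix("", length(vars), length(vars), dimnames = list(vars, vars))
for (i in vars) for (j in vars) {
  if (i != j) type[i, j] <- cor_type(toy[[i]], toy[[j]])
}
print(type)
```

The point of the mixed data frame is that the column class alone decides which coefficient each pair receives.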

> fa.parallel(hetcor.cor$correlations,n.obs=355)


> fa.result.hetcor <- fa(hetcor.cor$correlations,n.obs=355,fm="gls",nfactors=3,rotate="promax")
> print(fa.result.hetcor,digit=3,sort=T)

Call: fa(r = hetcor.cor$correlations, nfactors = 3, n.obs = 355, rotate = "promax",
fm = "gls")
V1 1 0.868 0.082 -0.101 0.695 0.305
V3 3 0.599 -0.029 0.017 0.359 0.641
V2 2 0.459 -0.058 0.235 0.384 0.616
V8 8 0.077 0.686 -0.075 0.460 0.540


V7 7 0.020 0.597 0.034 0.391 0.609
V6 6 -0.031 0.520 0.109 0.328 0.672
V5 5 -0.131 -0.004 0.735 0.420 0.580
V4 4 0.078 0.014 0.613 0.459 0.541

GLS1 GLS2 GLS3
SS loadings 1.364 1.144 0.988

GLS1 GLS2 GLS3
GLS1 1.000 0.409 0.704
GLS2 0.409 1.000 0.552
GLS3 0.704 0.552 1.000




RMSEA index = 0.0613 and the 90 % confidence intervals are 0.0203 0.0998
BIC = -25.059
GLS1 GLS2 GLS3

>

