4. Purpose
Detect and correct data errors
Detect and treat missing data
Detect and handle insufficiently sampled variables
Conduct transformations and standardizations
Detect and handle outliers
5. First concern
Accuracy of data file
Descriptive statistics
Graphic representations
Honest correlations
Missing data
Pattern or amount
Random or not
Outliers
7. Why is missing data a problem?
Systematic problem
Biased sampling
Demographic variables
Inappropriate measuring procedure
Behavioral items
Insufficient amount for analysis
Small sample
Misleading research results
Biased data in, _______ out
8. Probability distribution of missingness
Consider the probability of missingness
Are certain groups more likely to have missing values?
Are female respondents less likely to report age?
Are certain responses more likely to be missing?
Are respondents with high SPA less likely to report anxiety?
Certain analysis methods assume a certain probability distribution of missingness
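The first question above (are certain groups more likely to have missing values?) can be checked by comparing missing-data rates across groups. The following is a minimal sketch using only the Python standard library; the records, the field names sex and age, and the helper missing_rate_by_group are hypothetical and serve only as illustration.

```python
# Hypothetical survey records; None marks a missing value.
records = [
    {"sex": "F", "age": None},
    {"sex": "F", "age": 34},
    {"sex": "F", "age": None},
    {"sex": "F", "age": 29},
    {"sex": "M", "age": 41},
    {"sex": "M", "age": 38},
    {"sex": "M", "age": None},
    {"sex": "M", "age": 25},
]


def missing_rate_by_group(rows, group_key, value_key):
    """Return {group: proportion of rows whose value_key is missing}."""
    totals, missing = {}, {}
    for row in rows:
        group = row[group_key]
        totals[group] = totals.get(group, 0) + 1
        if row[value_key] is None:
            missing[group] = missing.get(group, 0) + 1
    return {g: missing.get(g, 0) / totals[g] for g in totals}


rates = missing_rate_by_group(records, "sex", "age")
print(rates)  # {'F': 0.5, 'M': 0.25}
```

A clear difference in rates between groups suggests that missingness depends on the grouping variable, which argues against MCAR. With real data the difference would still need a formal test.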
9. Missing completely at random (MCAR)
Missing data are independent of any other measured variable (y2) and of the variable itself (y1)
E.g., SES = y2; depression = y1.
If participants dropped out across the full range of SES levels, then missingness on depression would be independent of SES
Little’s MCAR test in MVA indicates whether the data are MCAR (want a nonsignificant result)
10. Missing at random (MAR)
Missing data may be dependent on another measured variable (y2), but are independent of the variable itself (y1).
E.g., SES = y2; depression = y1.
If only participants with high SES dropped out, then missingness on depression would be dependent on SES.
MAR can be inferred if Little’s test is significant but missingness is predictable from other variables (other than the variable itself); this is tested by the Separate Variance Test.
MNAR is indicated if this test reveals missingness related to the DV
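The logic of a separate-variance test can be sketched by hand: split cases by whether depression (y1) is missing, then compare the two groups on SES (y2) with a t test that does not pool variances (Welch's form). This is only an illustration of the idea, not the output of any particular statistics package; the data and the helper welch_t are hypothetical.

```python
import math
from statistics import mean, variance

# Hypothetical cases: (SES score, depression score or None if missing).
cases = [
    (2, 14), (3, 11), (3, 15), (4, 12), (5, 10),
    (6, None), (7, None), (5, 9), (8, None), (7, None),
]

# Split SES scores by whether depression is missing for that case.
ses_missing = [ses for ses, dep in cases if dep is None]
ses_present = [ses for ses, dep in cases if dep is not None]


def welch_t(a, b):
    """Separate-variance t statistic and Welch-Satterthwaite df."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t(ses_missing, ses_present)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large t here means that cases missing on depression differ in SES from cases that are present, i.e. missingness is predictable from SES. If SciPy is available, scipy.stats.ttest_ind with equal_var=False computes the same statistic and adds a p value.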
11. Treatment for missing data
Deleting cases or variables
Descriptive statistics
Estimating missing data
Using missing data correlation matrix
Treating missing data as data
Repeating analyses with and without missing data
Choosing among methods for dealing with missing data
Pattern or amount
12. Deletion or preservation?
Deletion
<5%
MCAR/MAR
Preservation
MNAR
Small sample
Replacement
Mean (grand or group)
Regression (predict missing value by other IVs)
Expectation maximization (EM; forms a missing-data correlation matrix by assuming a distribution)
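The first two replacement options can be sketched in a few lines. The example below fills in a missing outcome y three ways: with the grand mean, with the group mean, and with the prediction from a one-predictor regression fitted on complete cases. The data, field names, and the helper impute are hypothetical, and EM is not shown because it needs an iterative estimation routine.

```python
from statistics import mean

# Hypothetical cases: group label, predictor x, outcome y (None = missing).
cases = [
    {"group": "A", "x": 1.0, "y": 2.0},
    {"group": "A", "x": 2.0, "y": 4.0},
    {"group": "A", "x": 3.0, "y": None},
    {"group": "B", "x": 4.0, "y": 8.0},
    {"group": "B", "x": 5.0, "y": 10.0},
    {"group": "B", "x": 6.0, "y": None},
]
observed = [c for c in cases if c["y"] is not None]

# Grand mean of y over all complete cases.
grand_mean = mean(c["y"] for c in observed)

# Mean of y within each group, again from complete cases only.
group_means = {
    g: mean(c["y"] for c in observed if c["group"] == g)
    for g in {c["group"] for c in observed}
}

# Least-squares line y = a + b * x fitted on complete cases.
mx = mean(c["x"] for c in observed)
sxy = sum((c["x"] - mx) * (c["y"] - grand_mean) for c in observed)
sxx = sum((c["x"] - mx) ** 2 for c in observed)
b = sxy / sxx
a = grand_mean - b * mx


def impute(strategy):
    """Return y with missing values replaced using the given strategy."""
    filled = []
    for c in cases:
        if c["y"] is not None:
            filled.append(c["y"])
        elif strategy == "grand":
            filled.append(grand_mean)
        elif strategy == "group":
            filled.append(group_means[c["group"]])
        else:  # "regression"
            filled.append(a + b * c["x"])
    return filled


for strategy in ("grand", "group", "regression"):
    print(strategy, impute(strategy))
```

Mean replacement pulls the filled-in cases toward the center and so shrinks the variance of y, while regression replacement places them exactly on the fitted line, which tends to inflate the correlation between x and y.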
14. Why are outliers a problem?
Systematic problem
Biased sampling
Wrong population
Statistical problem
↑error variance
↓statistical power
↑Type I and Type II error
↓normality
Misleading research results
Biased data in, _______ out
28. Data transformations
Direction  Skewness                Treatment
+          Moderate                New X = SQRT(X)
+          Substantial             New X = LG10(X)
+          Substantial with zero   New X = LG10(X + C)
+          Severe                  New X = 1/X
+          L-shaped with zero      New X = 1/(X + C)
-          Moderate                New X = SQRT(K - X)
-          Substantial             New X = LG10(K - X)
-          J-shaped                New X = 1/(K - X)
C = a constant added to each score so that the smallest score is 1.
K = a constant from which each score is subtracted so that the smallest score is 1; usually equal to the largest score + 1.
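The transformations can be sketched as one small function. This is an illustrative version under stated assumptions: the function name transform and the example scores are hypothetical, C is added only when a non-positive score is present (so that the smallest score becomes 1), and K is set to the largest score + 1.

```python
import math

# The three transformation families: square root, log10, inverse.
FUNCS = {
    "sqrt": math.sqrt,
    "log10": math.log10,
    "inverse": lambda v: 1.0 / v,
}


def transform(scores, kind, direction="+"):
    """Transform scores; direction is the sign of the skewness."""
    if direction == "+":
        # Assumption: shift by C only if a score is zero or negative,
        # choosing C so that the smallest shifted score equals 1.
        low = min(scores)
        c = 1 - low if low <= 0 else 0
        shifted = [x + c for x in scores]
    else:
        # Reflect around K = largest score + 1, so the smallest
        # reflected score equals 1 and the skew becomes positive.
        k = max(scores) + 1
        shifted = [k - x for x in scores]
    return [FUNCS[kind](v) for v in shifted]


pos = transform([0, 1, 3, 9, 99], "log10", "+")   # LG10(X + C), C = 1
neg = transform([1, 7, 9, 10, 10], "sqrt", "-")   # SQRT(K - X), K = 11
print(pos)
print(neg)
```

Note that the K - X forms reverse the order of the scores: the highest original scores become the lowest transformed ones, so the direction of any relationship with the transformed variable has to be interpreted in reverse.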