A case study that explains why data quality is much better in online surveys, with guidelines on how sampling and non-sampling errors are eliminated.
Quality of data
1. Quality of data is definitely better in the case of online surveys
2. Types of errors
There are two kinds of errors that can creep into a survey: sampling errors and non-sampling (human) errors.
3. Sampling errors
Sampling errors are those that occur when the statistical characteristics of a population are estimated from a sample of that population.
One way to lower this error is randomized sampling. Because online surveys reach a very large number of contacts, a high degree of randomness can be achieved even with low incidence rates and low completion rates; that level of randomness is not practically achievable in an offline study.
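As a rough illustration of how sampling error behaves under randomized sampling, the sketch below draws a simple random sample from a hypothetical contact list and reports the estimated proportion with its standard error, sqrt(p(1 - p) / n). The contact list, the 30% true rate and the function name are illustrative assumptions, not part of the study.

```python
import math
import random


def estimate_proportion(population, n, seed=None):
    """Draw a simple random sample of size n (without replacement) and
    return the sample proportion and its estimated standard error."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # every contact is equally likely
    p_hat = sum(sample) / n
    # Standard error of a proportion; ignores the finite-population correction.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


if __name__ == "__main__":
    # Hypothetical contact list: 1 = has the attribute of interest, 0 = does not.
    population = [1] * 30_000 + [0] * 70_000
    for n in (100, 1_000, 10_000):
        p_hat, se = estimate_proportion(population, n, seed=42)
        print(f"n={n:>6}: estimate={p_hat:.3f}, standard error={se:.4f}")
```

The standard error shrinks roughly with 1/sqrt(n), so larger random samples reduce sampling error; they do not, by themselves, fix bias from a non-random sample.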
4. Sampling errors
If required, we also apply a process known as “weighting”.
Every year, we conduct a baseline study covering 109 urban centres and 196 villages in 80 of the 88 NSSO regions, spanning 28 states and 4 UTs and covering 30,066 households and 1,21,311 individuals.
Using this baseline study, the “Juxt India Consumer Landscape”, we create a matrix of unique weights for each age-gender-location combination.
Using this matrix, we can project the data from any survey onto the nationwide population; the same weighting process also corrects for sampling error and self-selection bias.
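The deck does not give the exact weighting formula. A common approach, assumed here, is cell-based post-stratification: each age-gender-location cell receives the weight population share / sample share, so that weighted survey results match the baseline distribution. The cell labels, shares and function names below are hypothetical.

```python
from collections import Counter


def cell_weights(population_share, respondents):
    """Post-stratification weights: a cell's population share divided by its
    share of the sample.

    `population_share` maps cell -> proportion (e.g. from a baseline study);
    `respondents` lists one cell per respondent. Every sampled cell must appear
    in `population_share`; population cells with no respondents get no weight
    and would need to be merged with a neighbouring cell in practice.
    """
    counts = Counter(respondents)
    n = len(respondents)
    return {cell: population_share[cell] / (count / n)
            for cell, count in counts.items()}


def weighted_mean(values, respondents, weights):
    """Mean of `values`, each respondent counted with the weight of their cell."""
    total_weight = sum(weights[c] for c in respondents)
    return sum(v * weights[c] for v, c in zip(values, respondents)) / total_weight


if __name__ == "__main__":
    # Hypothetical baseline shares for two age-gender-location cells.
    baseline = {("18-34", "female", "urban"): 0.5, ("35+", "male", "rural"): 0.5}
    cells = [("18-34", "female", "urban")] * 3 + [("35+", "male", "rural")]
    answers = [1, 1, 1, 0]  # 1 = "yes", 0 = "no"
    w = cell_weights(baseline, cells)
    unweighted = sum(answers) / len(answers)
    print(f"unweighted: {unweighted:.2f}, weighted: {weighted_mean(answers, cells, w):.2f}")
    # unweighted: 0.75, weighted: 0.50
```

The over-represented urban cell is down-weighted (2/3) and the under-represented rural cell is up-weighted (2), pulling the estimate back toward the baseline mix.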
5. Non-sampling (human/system) errors
In an offline study, the questionnaire is administered by an interviewer who reads it out according to his or her own interpretation, which may introduce bias and errors.
In an online study, however, only the respondent's own interpretation is involved, which is why we use extremely simple English; the survey can even be run in local languages, thus removing this non-sampling error.
6. Non-sampling (human/system) errors
There can also be “bad respondents”. To “clean” this data:
We clear out junk respondents. We don't believe in ‘response cleaning’; we delete the entire case (respondent).
We remove all “straight-liners”: respondents who fill in the survey in a pattern, for example by giving the same answer to every item in a grid.
We also do “mode time cleaning”. The completion times of most responses fall between 2/3 and 4/3 of the mode time (the most commonly occurring completion time); this band can be adjusted depending on the type of questionnaire. Outliers outside the band are discarded. An example of mode time cleaning is shown on the next slide.
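The straight-liner rule above can be sketched as follows, assuming each respondent's answers to a grid question are stored as a list of scale codes. Flagging only identical answers across the whole grid, and the `min_items` threshold, are simplifying assumptions; other response patterns (such as zig-zagging) would need additional rules.

```python
def is_straight_liner(grid_answers, min_items=3):
    """Return True if a respondent gave the same answer to every grid item.

    Grids with fewer than `min_items` answers are not flagged, since identical
    answers on very short grids are not informative.
    """
    if len(grid_answers) < min_items:
        return False
    return len(set(grid_answers)) == 1


def drop_straight_liners(responses):
    """Keep only respondents who are not straight-liners.

    `responses` maps respondent ID -> list of grid answers. In line with the
    deck's approach, the whole case is deleted rather than individual answers.
    """
    return {rid: answers for rid, answers in responses.items()
            if not is_straight_liner(answers)}


if __name__ == "__main__":
    responses = {
        "r1": [4, 4, 4, 4, 4, 4, 4],  # straight-liner
        "r2": [5, 3, 4, 2, 4, 5, 3],
        "r3": [1, 2, 1, 2, 1, 2, 1],  # zig-zag pattern, not caught by this rule
    }
    print(sorted(drop_straight_liners(responses)))  # ['r2', 'r3']
```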
7. Typical scatter plot of survey response times
[Figure: completion times of individual responses (y-axis "Time", 0 to 30 minutes; x-axis running to 1,200). The mode time (most commonly occurring completion time) is 13 minutes. Most responses fall between 2/3 and 4/3 of the mode time; outliers above 4/3 or below 2/3 of the mode time are cleaned.]
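The mode time cleaning rule described earlier can be sketched like this. Completion times are assumed to be recorded in whole minutes (so the mode is well defined), ties for the mode are resolved by taking the smallest value (an assumption; the deck does not say), and the 2/3 and 4/3 multipliers are parameters because the band may be adjusted per questionnaire. With the 13-minute mode shown in the chart, the band runs from about 8.67 to 17.33 minutes.

```python
from statistics import multimode


def mode_time_clean(times, lower=2 / 3, upper=4 / 3):
    """Keep completion times inside [lower * mode, upper * mode].

    `times` are completion times in whole minutes. Returns the mode, the
    (low, high) band and the retained times; everything else is discarded.
    """
    mode = min(multimode(times))  # smallest value if several tie
    low, high = lower * mode, upper * mode
    kept = [t for t in times if low <= t <= high]
    return mode, (low, high), kept


if __name__ == "__main__":
    times = [13, 13, 13, 12, 14, 9, 8, 17, 18, 30, 3]  # hypothetical minutes
    mode, (low, high), kept = mode_time_clean(times)
    print(f"mode={mode}, band=({low:.2f}, {high:.2f}), kept={kept}")
```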
8. Normality, reliability and validity tests
There are also some tests that can be done at the client's request to ensure the statistical validity of the data. Let us look at them one by one.
9. Normality Test
The objective of a sample normality test is to check whether the sample data are approximately normally distributed.
It is important to confirm normality before subjecting the data to inferential and differential analyses, since many parametric techniques assume normally distributed data.
Let us take the example of a normality test on the age of respondents.
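The deck assesses normality graphically. As a numerical complement, not necessarily the test used in the study, the Jarque-Bera statistic combines sample skewness S and excess kurtosis K as JB = n/6 (S^2 + K^2/4). Under normality it is approximately chi-square distributed with 2 degrees of freedom, whose p-value is exactly exp(-JB/2). It is a large-sample test, so results on small samples should be read with caution; the ages below are made up.

```python
import math


def jarque_bera(data):
    """Return (skewness, excess kurtosis, JB statistic, p-value).

    Moments use the population (1/n) definitions. Assumes the data are not
    all identical (zero variance would divide by zero). The p-value uses the
    chi-square distribution with 2 degrees of freedom: P(X > x) = exp(-x / 2).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3
    jb = n / 6 * (skew ** 2 + excess_kurt ** 2 / 4)
    p_value = math.exp(-jb / 2)
    return skew, excess_kurt, jb, p_value


if __name__ == "__main__":
    ages = [22, 25, 27, 29, 30, 31, 33, 34, 35, 36, 38, 40, 42, 45, 58]  # hypothetical
    s, k, jb, p = jarque_bera(ages)
    print(f"skew={s:.2f}, excess kurtosis={k:.2f}, JB={jb:.2f}, p={p:.3f}")
    # A small p-value (e.g. below 0.05) is evidence against normality.
```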
10. Histogram – graphical method
An initial impression of the normality of the distribution can be gained by examining the histogram. From the figure above, it is evident that the collected age data are very close to a normal distribution curve.
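A histogram is simply a count of observations per bin. A minimal text version is sketched below, assuming integer ages and 5-year bins (the bin width used on the slide is not stated); the ages are hypothetical.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count integer values per bin [start, start + bin_width).

    Empty bins between the lowest and highest occupied bins are included,
    so gaps in the distribution stay visible.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    first, last = min(counts), max(counts)
    return {start: counts.get(start, 0)
            for start in range(first, last + bin_width, bin_width)}


def print_histogram(values, bin_width):
    for start, count in histogram(values, bin_width).items():
        print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * count}")


if __name__ == "__main__":
    ages = [22, 25, 27, 29, 30, 31, 33, 34, 35, 36, 38, 40, 42, 45, 58]  # hypothetical
    print_histogram(ages, bin_width=5)
```

A roughly symmetric, single-peaked shape is the first visual hint of normality; a long tail on one side suggests skew.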
11. Normal Q-Q Plot of Age
In a normal Q-Q plot, if the variable is normally distributed, the points lie very close to the straight line. In this case, the points in the upper right of the chart indicate some skewing caused by extremely large values; otherwise, the data appear to be normally distributed.
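The coordinates of a normal Q-Q plot can be computed with the Python standard library: sort the data and pair each value with the standard normal quantile at plotting position (i - 0.5) / n. That plotting-position formula is one common convention (an assumption; statistics packages differ slightly). The correlation of the pairs gives a one-number summary of how closely the points follow the line.

```python
import math
from statistics import NormalDist


def qq_points(data):
    """Return (theoretical normal quantile, observed value) pairs."""
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    observed = sorted(data)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def qq_correlation(points):
    """Pearson correlation of the Q-Q pairs; values near 1 suggest normality."""
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    ages = [22, 25, 27, 29, 30, 31, 33, 34, 35, 36, 38, 40, 42, 45, 58]  # hypothetical
    points = qq_points(ages)
    for theo, obs in points[-3:]:
        print(f"{theo:6.2f} {obs}")  # the largest ages sit in the upper right
    print(f"Q-Q correlation: {qq_correlation(points):.3f}")
```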
12. Reliability test
Reliability is the extent to which a measuring procedure yields consistent results on repeated administrations of the scale.
The objective of the reliability test is to ensure that the measurable items of each variable measure the same underlying construct.
The reliability of this instrument is examined using Cronbach's alpha coefficient.
13. Cronbach alpha (α)
Alpha is the average of all possible split-half coefficients resulting from different ways of splitting the scale items.
Its value typically ranges from 0 to 1.
A value of α below 0.6 indicates unsatisfactory internal consistency reliability (see Malhotra & Birks, 2007, p. 358).
Note: alpha tends to increase as the number of items in the scale increases.
For the choice factors scale in our sample questionnaire, Cronbach's alpha for the scale as a whole was 0.78071, indicating acceptable internal consistency reliability, so no items were deleted.
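In practice, Cronbach's alpha is usually computed from variances rather than by enumerating split halves: α = k/(k - 1) · (1 - Σ s_i² / s_T²), where k is the number of items, s_i² is the variance of item i and s_T² is the variance of respondents' total scores. The sketch below implements that formula. It also includes the standardized form α = k·r̄ / (1 + (k - 1)·r̄), with r̄ the mean inter-item correlation, which shows why alpha rises as items are added. The ratings are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of scores with
    respondents in the same order in every item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


if __name__ == "__main__":
    # Hypothetical 5-point ratings: 3 items x 5 respondents.
    items = [
        [4, 5, 3, 4, 2],
        [4, 4, 3, 5, 2],
        [3, 5, 2, 4, 3],
    ]
    print(f"alpha = {cronbach_alpha(items):.3f}")
    # Same average correlation, more items: alpha goes up.
    for k in (3, 6, 12):
        print(k, round(standardized_alpha(k, 0.3), 3))
```

Sample variances (n - 1 divisor) are used for both items and totals; because alpha depends only on their ratio, using population variances consistently would give the same result.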
14. Validity test
While a reliability test is necessary, it is not sufficient.
The objective of the validity test is to establish whether the proposed items in a study validly measure the underlying concept, that is, how accurately the measured concept corresponds to the real world.
In a test case, the concept was the respondents' perceived importance of the factors influencing their intention to study at X.
15. Sample validity test
Correlations: importance of the aspects related to content & structure of course offered

        a12_7  a12_1  a12_4  a12_2  a12_5  a12_6  a12_3
a12_7    1.00  -0.07  -0.06   0.00  -0.09  -0.17  -0.12
a12_1   -0.07   1.00  -0.05  -0.18  -0.13   0.04  -0.21
a12_4   -0.06  -0.05   1.00  -0.17  -0.12  -0.33  -0.16
a12_2    0.00  -0.18  -0.17   1.00   0.01  -0.11  -0.28
a12_5   -0.09  -0.13  -0.12   0.01   1.00  -0.25  -0.26
a12_6   -0.17   0.04  -0.33  -0.11  -0.25   1.00  -0.06
a12_3   -0.12  -0.21  -0.16  -0.28  -0.26  -0.06   1.00

a12_7: Adaptability to professional environment
a12_1: Reasonableness of the minimum qualification requirement
a12_4: Specialized programs in the offing
a12_2: Range of courses offered
a12_5: Reasonableness of the course duration
a12_6: Topicality of course content
a12_3: Flexibility in selection of course
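A correlation matrix like the one above is built from pairwise Pearson correlations between item scores. The sketch below computes one from raw scores and finds the strongest off-diagonal correlation; the helper names are illustrative. Applied to the values reported in the correlation table, the strongest pair is a12_4 with a12_6, at -0.33.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map every (item, item) pair to its correlation.

    `columns` maps an item code to its scores, with respondents in the same
    order in every list.
    """
    return {(a, b): 1.0 if a == b else pearson(columns[a], columns[b])
            for a in columns for b in columns}


def strongest_pair(matrix):
    """Return the off-diagonal item pair with the largest absolute correlation."""
    names = sorted({a for a, _ in matrix})
    return max(combinations(names, 2), key=lambda pair: abs(matrix[pair]))


# Values as reported in the correlation table.
CODES = ["a12_7", "a12_1", "a12_4", "a12_2", "a12_5", "a12_6", "a12_3"]
ROWS = [
    [1.00, -0.07, -0.06, 0.00, -0.09, -0.17, -0.12],
    [-0.07, 1.00, -0.05, -0.18, -0.13, 0.04, -0.21],
    [-0.06, -0.05, 1.00, -0.17, -0.12, -0.33, -0.16],
    [0.00, -0.18, -0.17, 1.00, 0.01, -0.11, -0.28],
    [-0.09, -0.13, -0.12, 0.01, 1.00, -0.25, -0.26],
    [-0.17, 0.04, -0.33, -0.11, -0.25, 1.00, -0.06],
    [-0.12, -0.21, -0.16, -0.28, -0.26, -0.06, 1.00],
]
reported = {(a, b): ROWS[i][j]
            for i, a in enumerate(CODES) for j, b in enumerate(CODES)}

if __name__ == "__main__":
    pair = strongest_pair(reported)
    print(pair, reported[pair])  # ('a12_4', 'a12_6') -0.33
```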
16. Validity test
The questionnaire for the test study was developed using choice factors from similar studies as a point of reference; these were then adapted to the Indian context, and the correlation between the factors was minimal.
Thus, the content validity of the questionnaire was addressed.