Quality of data

Quality of data is definitely
better in the case of online surveys

Types of errors

There are two kinds of errors that can creep in during a
survey – sampling errors and non-sampling (human)
errors.


Sampling errors
• Sampling errors are those that occur when the statistical characteristics of a population are estimated from a sample of that population.
• One way to lower this error is randomized sampling. In online surveys, the number of contacts is very high, and with low incidence rates and low completion rates, the level of randomness achieved is simply not possible in an offline study.


Sampling errors


• Also, if required, we apply a process known as “weighting”.
• Every year, we conduct a baseline study covering 109 urban centres, 196 villages and 80 out of 88 NSSO regions, spanning 30,066 households and 1,21,311 individuals across 28 states and 4 UTs. Using this baseline study, “Juxt India Consumer Landscape”, we create a matrix of unique weights for each age-gender-location combination.
• Using this matrix, we can project the data from any survey to the nationwide population, correcting for both the sampling error and the self-selection bias in the weighting process.
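The weighting step can be illustrated with a minimal post-stratification sketch. It assumes that each cell weight is the cell's share of the population (as a baseline study would supply) divided by its share of the survey sample; the source does not spell out the formula, and the cells, shares and sample counts below are invented purely for illustration.

```python
from collections import Counter

def cell_weights(population_share, sample_cells):
    """Return one weight per cell: population share / sample share.

    population_share: dict mapping a cell, e.g. (age, gender, location),
        to its share of the target population (hypothetical figures here).
    sample_cells: list with one cell per survey respondent.
    """
    counts = Counter(sample_cells)
    n = len(sample_cells)
    return {cell: population_share[cell] / (counts[cell] / n) for cell in counts}

# Hypothetical baseline shares (they sum to 1 over the cells used).
population_share = {
    ("18-24", "M", "urban"): 0.20,
    ("18-24", "F", "urban"): 0.20,
    ("25-34", "M", "rural"): 0.30,
    ("25-34", "F", "rural"): 0.30,
}

# Hypothetical online sample that over-represents young urban men.
sample_cells = (
    [("18-24", "M", "urban")] * 50
    + [("18-24", "F", "urban")] * 20
    + [("25-34", "M", "rural")] * 20
    + [("25-34", "F", "rural")] * 10
)

weights = cell_weights(population_share, sample_cells)

# After weighting, each cell's share of the sample matches the baseline.
total = sum(weights[c] for c in sample_cells)
weighted_share = {
    cell: sum(weights[c] for c in sample_cells if c == cell) / total
    for cell in population_share
}
```

Over-represented cells receive weights below 1 and under-represented cells receive weights above 1, so that weighted totals reflect the baseline population rather than the mix of people who happened to respond.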


Non-sampling (human/system) errors
In an offline study, the questionnaire is administered by a human interviewer, who reads it out according to his or her own interpretation, which may result in bias and errors. In an online study, however, the interpretation is the respondent's own, which is why we use extremely simple English. The survey can also be run in local languages, thus removing this source of non-sampling error.


Non-sampling (human/system) errors
Now, there can be “bad respondents” as well. So, to “clean” this data:
• We clear out the junk respondents. We don't believe in mere ‘response cleaning’; we delete the case/respondent itself.
• We remove all the “straight liners”, respondents who fill in the surveys in patterns.
• We also do “mode time cleaning”. The completion times for the majority of responses fall within 2/3 to 4/3 of the mode time; this band can be flexible depending on the type of questionnaire. Outliers outside this band are discarded. A sample of the mode time cleaning can be seen in the next slide.
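The two cleaning rules above can be sketched as follows. This is a minimal illustration, not the actual cleaning procedure: the record fields `answers` and `minutes` are hypothetical, completion times are assumed to be recorded in whole minutes so that a mode exists, and straight-lining is defined narrowly as giving the same answer to every rating question (the “patterns” mentioned above may be broader).

```python
from statistics import mode

def is_straight_liner(answers):
    """True if every rating answer is identical (a narrow definition)."""
    return len(answers) > 1 and len(set(answers)) == 1

def mode_time_band(minutes, low=2 / 3, high=4 / 3):
    """Return the (lower, upper) acceptance band around the mode time."""
    m = mode(minutes)
    return low * m, high * m

def clean(respondents):
    """Drop whole respondents (not single answers) that fail either check."""
    lower, upper = mode_time_band([r["minutes"] for r in respondents])
    return [
        r for r in respondents
        if not is_straight_liner(r["answers"]) and lower <= r["minutes"] <= upper
    ]

# Hypothetical respondents; completion times are in whole minutes.
respondents = [
    {"id": 1, "answers": [4, 2, 5, 3], "minutes": 13},
    {"id": 2, "answers": [3, 3, 3, 3], "minutes": 13},  # straight-liner
    {"id": 3, "answers": [5, 1, 4, 2], "minutes": 4},   # below 2/3 of mode
    {"id": 4, "answers": [2, 4, 3, 5], "minutes": 29},  # above 4/3 of mode
    {"id": 5, "answers": [1, 5, 2, 4], "minutes": 15},
    {"id": 6, "answers": [4, 4, 2, 1], "minutes": 13},
]

band = mode_time_band([r["minutes"] for r in respondents])
kept_ids = [r["id"] for r in clean(respondents)]
```

The `low` and `high` parameters are exposed because the band is described as flexible depending on the type of questionnaire.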


[Figure: Typical scatter plot of survey response times. Vertical axis: Time (0 to 30); horizontal axis: 0 to 1200. Mode time (most commonly occurring completion time): 13 minutes. Most responses occur within 2/3 and 4/3 of the mode time; outliers lying below 2/3 or above 4/3 of the mode time are cleaned.]

Normality, reliability and validity tests

There are also some tests that can be run at the client's request to ensure the statistical validity of the data. Let us look at them one by one.


Normality Test


• The objective of sample normality tests is to check that the sample is normally distributed and randomly selected.
• It is important that the normality of the sample be confirmed before subjecting it to inferential and differential analyses.
• Let us take the example of a normality test on the age of respondents.

• Histogram – graphical method

An initial impression of the normality of the distribution can be gained by examining the histogram. From the figure above, it is evident that the collected data (age) follow a curve very close to the normal distribution.

• Normal Q-Q Plot of Age

In this Normal Q-Q plot, if the variable were normally distributed, the dots would fit the line very closely. In this case, the points in the upper right of the chart indicate some skewing caused by the extremely large data values; otherwise the data seem to be normally distributed.
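The points on a normal Q-Q plot can be computed as in the sketch below, which pairs each sorted observation with the quantile expected from a normal distribution fitted to the data. The plotting position `(i + 0.5) / n` is an assumption (one common convention among several), and the ages are invented, with a few large values in the upper tail to mimic the skew described above.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # Plotting position (i + 0.5) / n for i = 0..n-1 is one common convention.
    expected = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, xs))

def qq_correlation(points):
    """Pearson correlation of expected vs observed quantiles (closer to 1 = closer to the line)."""
    ex = [p[0] for p in points]
    ob = [p[1] for p in points]
    mx, my = mean(ex), mean(ob)
    cov = sum((a - mx) * (b - my) for a, b in points)
    sx = sum((a - mx) ** 2 for a in ex) ** 0.5
    sy = sum((b - my) ** 2 for b in ob) ** 0.5
    return cov / (sx * sy)

# Hypothetical respondent ages with a few large values in the upper tail.
ages = [19, 21, 22, 23, 24, 24, 25, 26, 26, 27, 28, 29, 30, 31, 33, 35, 48, 57]

points = qq_points(ages)
r = qq_correlation(points)

# The largest observed age sits above its expected quantile, i.e. above the line.
upper_tail_gap = points[-1][1] - points[-1][0]
```

Plotting `points` with expected quantiles on one axis and observed values on the other gives the Q-Q plot; a positive `upper_tail_gap` corresponds to dots in the upper right lying off the reference line.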

Reliability test
• Reliability is the extent to which a measuring procedure yields consistent results on repeated administrations of the scale.
• The objective of the reliability test is to ensure that the measurable items of each variable measure the same underlying construct.
• The reliability of this instrument is examined through Cronbach's Alpha Coefficient.


Cronbach alpha (α)


• Cronbach's alpha is the average of all possible split-half correlation coefficients resulting from different ways of splitting the scale items.
• Its value varies from 0 to 1.
• α < 0.6 indicates unsatisfactory internal consistency reliability (see Malhotra & Birks, 2007, p. 358).
• Note: alpha tends to increase with an increase in the number of items in the scale.
• The Cronbach alpha reliability coefficient for the choice factors scale (in our sample questionnaire) as a whole was 0.78071, indicating that the scale as a whole has acceptable internal consistency reliability; no items were deleted.
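For reference, Cronbach's alpha is commonly computed from item and total-score variances rather than by averaging split-half coefficients: α = k/(k−1) × (1 − Σ item variances / variance of total score), where k is the number of items. The sketch below uses that standard formula; the ratings are invented and are not the data behind the 0.78071 figure.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                      # number of items in the scale
    items = list(zip(*rows))              # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical 5-point ratings: 6 respondents x 4 items.
ratings = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
    [4, 4, 5, 4],
]

alpha = cronbach_alpha(ratings)

# Values below 0.6 are treated as unsatisfactory (Malhotra & Birks, 2007).
acceptable = alpha >= 0.6
```

Because respondents in this invented sample rate all four items similarly, the item variances are small relative to the variance of the total score and alpha comes out high.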

Validity test
• While the reliability test is necessary, it is not sufficient.
• The objective of the validity test is to identify whether the proposed items in a study are valid for measuring the underlying concept, that is, how accurately the concept corresponds to the real world.
• In a test case, the concept referred to the respondents' perceived importance of factors influencing their intention to study at X.


Sample validity test
Importance of the aspects related to content & structure of course offered

Items
a12_7  Adaptability to professional environment
a12_1  Reasonableness of the minimum qualification requirement
a12_4  Specialized programs in the offing
a12_2  Range of courses offered
a12_5  Reasonableness of the course duration
a12_6  Topicality of course content
a12_3  Flexibility in selection of course

Correlations
       a12_7  a12_1  a12_4  a12_2  a12_5  a12_6  a12_3
a12_7   1.00  -0.07  -0.06   0.00  -0.09  -0.17  -0.12
a12_1  -0.07   1.00  -0.05  -0.18  -0.13   0.04  -0.21
a12_4  -0.06  -0.05   1.00  -0.17  -0.12  -0.33  -0.16
a12_2   0.00  -0.18  -0.17   1.00   0.01  -0.11  -0.28
a12_5  -0.09  -0.13  -0.12   0.01   1.00  -0.25  -0.26
a12_6  -0.17   0.04  -0.33  -0.11  -0.25   1.00  -0.06
a12_3  -0.12  -0.21  -0.16  -0.28  -0.26  -0.06   1.00


Validity test
• The questionnaire for the test study was developed using choice factors from similar studies as a point of reference, which were then adapted to the Indian context; the correlation between the factors was in fact minimal.
• Thus, the content validity of the questionnaire was addressed.
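The statement that correlation between the factors was minimal can be checked against the correlation matrix reported in the sample validity test. The sketch below copies those reported values and looks for the strongest off-diagonal correlation; the helper names `strongest_pair` and `is_symmetric` are introduced here for illustration only.

```python
items = ["a12_7", "a12_1", "a12_4", "a12_2", "a12_5", "a12_6", "a12_3"]

# Correlation matrix as reported in the sample validity test.
corr = [
    [ 1.00, -0.07, -0.06,  0.00, -0.09, -0.17, -0.12],
    [-0.07,  1.00, -0.05, -0.18, -0.13,  0.04, -0.21],
    [-0.06, -0.05,  1.00, -0.17, -0.12, -0.33, -0.16],
    [ 0.00, -0.18, -0.17,  1.00,  0.01, -0.11, -0.28],
    [-0.09, -0.13, -0.12,  0.01,  1.00, -0.25, -0.26],
    [-0.17,  0.04, -0.33, -0.11, -0.25,  1.00, -0.06],
    [-0.12, -0.21, -0.16, -0.28, -0.26, -0.06,  1.00],
]

def is_symmetric(matrix):
    """A correlation matrix must equal its own transpose."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))

def strongest_pair(names, matrix):
    """Return (|r|, item_a, item_b) for the largest off-diagonal correlation."""
    best = (0.0, None, None)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) > best[0]:
                best = (abs(matrix[i][j]), names[i], names[j])
    return best

max_abs_r, item_a, item_b = strongest_pair(items, corr)
```

The largest absolute correlation is 0.33, between a12_4 (specialized programs) and a12_6 (topicality of course content); every other pair is weaker.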


Thank you
www.juxtconsult.com

www.getcounted.net