2. How can we define “Data”?
Terminologies
Types of Data
Data Collection
How to Analyze & Represent Data
What is Sample & Sampling?
Terminologies in Sampling
Types of Sampling
How to Calculate Sample Size
3. The word data is the plural of datum, which
literally means "to give" or "something given".
“Data is a collection of facts, such as values
or measurements.”
“Data are measurements or observations
that are collected as a source of
information.”
It can be numbers, words, measurements,
observations or even just descriptions of
things.
4. Data Unit
A data unit is one entity (such as a
person or business) in the population being
studied, about which data are collected. A
data unit is also referred to as a unit record or
record.
Data Item
A data item is a characteristic of a
data unit which is measured or counted,
such as height, country of birth, or income.
A data item is also referred to as a variable
because the characteristic may vary
between data units, and may vary over time.
5. Observation
An observation is an occurrence of a
specific data item that is recorded about
a data unit. It may also be referred to as a
datum, which is the singular form of data.
An observation may be numeric or non-
numeric.
Dataset
A dataset is a complete collection of
all observations.
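The four terms above can be pictured as a small data structure. This is only an illustrative sketch: the unit IDs, heights and countries are hypothetical values, not data from the slides.

```python
# Each dict is one data unit (a unit record). All values are hypothetical.
dataset = [
    {"unit_id": 1, "height_cm": 172.5, "country_of_birth": "India"},
    {"unit_id": 2, "height_cm": 160.0, "country_of_birth": "Kenya"},
    {"unit_id": 3, "height_cm": 181.2, "country_of_birth": "Brazil"},
]

# A data item (variable) is one characteristic recorded for every data unit.
data_items = list(dataset[0].keys())

# An observation is the value of one data item for one data unit.
observation = dataset[1]["height_cm"]

# Observations may be numeric or non-numeric.
heights = [unit["height_cm"] for unit in dataset]
birthplaces = [unit["country_of_birth"] for unit in dataset]

print(data_items)
print(observation)
```

Here the whole list is the dataset, each dictionary is a data unit, each key is a data item, and each stored value is an observation.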
7. There are two main types of data with
respect to their characteristics:
Qualitative Data
Quantitative Data
8. Qualitative Data
“Data that is not given numerically.”
It deals with description.
It can be observed but not measured.
Qualitative → Quality
Example: Favorite Color, Place of Birth,
Favorite Food, Type of Car
9. Quantitative Data
It is given in numerical form.
It deals with numbers.
It can be measured.
Quantitative → Quantity
Example: Length, Height, Area, Volume,
Weight, Speed, Time, Temperature,
Humidity, Sound Levels, Cost, Ages, etc.
10. Quantitative data can be divided into:
Discrete Data
Continuous Data
Discrete data is counted; continuous
data is measured.
11. Discrete Data
Discrete data can only take certain
values (like whole numbers).
Example: The number of students in a class
(you can't have half a student).
Continuous Data
Continuous Data is data that can take
any value (within a range).
Example: A person's height: could be any
value (within the range of human heights),
not just certain fixed heights.
14. Univariate Data
It means "one variable" (one type of data).
Example: Travel Time (minutes): 15, 29, 8,
42, 35, 21, 18, 42, 26
The variable is Travel Time.
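With a single variable, the usual first step is to summarize it. The sketch below uses the travel times listed above and Python's standard statistics module. The choice of mean, median and mode as summaries is an assumption, since the slide does not say how the data should be summarized.

```python
import statistics

# Travel times in minutes, as listed in the example above.
travel_time_minutes = [15, 29, 8, 42, 35, 21, 18, 42, 26]

mean_time = statistics.mean(travel_time_minutes)      # arithmetic average
median_time = statistics.median(travel_time_minutes)  # middle value when sorted
mode_time = statistics.mode(travel_time_minutes)      # most frequent value

print(f"mean={mean_time:.1f}, median={median_time}, mode={mode_time}")
# prints: mean=26.2, median=26, mode=42
```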
15. Bivariate or Multivariate Data
It means "two or more variables".
With bivariate or multivariate data you have
two or more sets of related data that
you want to compare.
Example:
The two variables are
Ice Cream Sales and
Temperature.
Univariate Data                            | Bivariate or Multivariate Data
Involving a single variable                | Involving two or more variables
Does not deal with causes or relationships | Deals with causes or relationships
16. There are two main types of data with
respect to data collection techniques:
Primary Data
Secondary Data
17. Primary data means original data that
has been collected specially for the
purpose in mind. It means someone
collected the data from the original source
first hand. Data collected this way is called
Primary Data.
Example: Questionnaires, Surveys,
Experiments, Interviews.
18. Secondary data is data that has been
collected for another purpose. When we apply
statistical methods to primary data that was
collected for another purpose, we refer to
it as Secondary Data.
Example: Books, Journals, Magazines,
Newspapers, E-journals, General Websites,
Web-blogs.
20. “Data Collection is a process of obtaining
useful information for a defined purpose
from various sources.”
The issue is not: How do we collect data?
The issue is: How do we collect useful data?
21. The purpose of data collection is:
To obtain information to keep on record
To make decisions about important issues
To pass information on to others
22. “A document that defines all the details
concerning data collection, including how
much and what type of data is required and
when and how it should be collected.”
Why do we want the data?
What purpose will they serve?
Where will we collect the data?
What type of data will we collect?
Who will collect the data?
How do we collect the right data?
23. Tools used to collect data are
Mail, Telephone, In-person and Web-based Surveys
Direct or Participatory Observation
Interviews
Focus Groups
Expert Opinion
Case Studies
Literature Search
Content Analysis of Internal and External Records
The data collection tools must be strong
enough to support the findings of the evaluation.
24. “Analysis of data is a process of
inspecting, cleaning, transforming, and
modeling data with the goal of
highlighting useful information,
suggesting conclusions, and supporting
decision making.”
25. Bar Graphs
Pie Charts
Line Graphs
Scatter (x,y) Plots
Pictographs
Histograms
Frequency Distribution
Stem and Leaf Plots
Cumulative Tables and Graphs
Relative Frequency
Check Sheet
26. A Bar Graph (also called Bar Chart) is a
graphical display of data using bars of
different heights.
27. A Histogram is a graphical display of data
using bars of different heights.
It is similar to a Bar Chart, but a histogram
groups numbers into ranges.
28. A Pie Chart is a special chart that uses "pie slices" to
show relative sizes of data.
29. A Line Graph is a graph that shows information that is
connected in some way (such as change
over time).
30. A Scatter (x,y) Plot is a graph of plotted points that shows the
relationship between two sets of data.
32. Frequency:
Frequency is how often something
occurs.
By counting frequencies we can make
a Frequency Distribution table.
Example: Sam's team has
scored the following
numbers of goals in recent
football games:
33. A Stem and Leaf Plot is a special table where each data value is
split into a "leaf" (usually the last digit)
and a "stem" (the other digits).
Like in this example:
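A stem and leaf plot can also be built in a few lines of code. The sketch below is an illustration rather than the slide's own example. It borrows the list of values 12, 13, 21, ... used in the frequency distribution slide, and the helper name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all digits but the last) and leaf (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 37 -> stem 3, leaf 7
        plot[stem].append(leaf)
    return dict(plot)

values = [12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41]

for stem, leaves in stem_and_leaf(values).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# prints:
# 1 | 2 3
# 2 | 1 7
# 3 | 3 4 5 7
# 4 | 0 0 1
```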
34. Suppose you have the following list of values: 12, 13,
21, 27, 33, 34, 35, 37, 40, 40, 41. You could make a
frequency distribution table showing how many tens,
twenties, thirties, and forties you have:
Class      Frequency
10 - 19    2
20 - 29    2
30 - 39    4
40 - 49    3
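The same table can be produced by counting how many values fall into each class. This is a minimal sketch that assumes classes of equal width 10 starting at multiples of 10. The function name frequency_distribution is hypothetical.

```python
from collections import Counter

values = [12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41]
CLASS_WIDTH = 10  # tens, twenties, thirties, forties

def frequency_distribution(data, width):
    """Count how many values fall into each class of the given width."""
    counts = Counter((value // width) * width for value in data)
    return {f"{low} - {low + width - 1}": counts[low] for low in sorted(counts)}

table = frequency_distribution(values, CLASS_WIDTH)
for label, frequency in table.items():
    print(f"{label}\t{frequency}")
# prints:
# 10 - 19	2
# 20 - 29	2
# 30 - 39	4
# 40 - 49	3
```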
35. Cumulative means "how much so far". To
have cumulative totals, just add up the
values as you go.
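"Adding up as you go" is a running total. As a small sketch, the class frequencies from the frequency distribution slide (2, 2, 4, 3) can be turned into cumulative frequencies with itertools.accumulate from the standard library.

```python
from itertools import accumulate

# Class frequencies for 10-19, 20-29, 30-39 and 40-49 from the frequency table.
frequencies = [2, 2, 4, 3]

# Each entry is the total of all frequencies up to and including that class.
cumulative = list(accumulate(frequencies))

print(cumulative)
# prints: [2, 4, 8, 11]
```

The last cumulative value equals the total number of observations (11).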
Example: Jamie has earned
this much in the last 6
months:
37. “A generic tool that can be adapted for
a wide variety of purposes, the check
sheet is a structured, prepared form for
collecting and analyzing data.”
39. Census
A Census is when we collect data for every
member of the group (the whole "population").
Sample
“A Sample is when we collect data just for
selected members of the group.”
Example: There are 120 people in your local
football club.
We can ask everyone (all 120) what their age
is. That is a census.
Or we could just ask the people who are
there this afternoon. That is a sample.
40. Sampling is the process of selecting
units from a population of interest so that
by studying the sample we may fairly
generalize our results back to the
population from which they were chosen.
41. Sampling reduces expenses and time by
allowing researchers to estimate information
about a whole population without having to survey
each member of the population.
Sampling is like taking out and testing a few grains
of rice from the cooking vessel to know if the dish
is done or not.
42. Sampling Universe
Population from which we are sampling.
Sampling Unit
The unit selected during the process of
sampling.
Example: If we select households from a list of all
units in the population, the sampling unit is in this
case the household.
43. Basic Sampling Unit or Elementary Unit
The sampling unit selected at the last
stage of sampling.
In a multi-stage survey, if we first select
villages and then select households within those
selected villages, the basic sampling unit would
be the household.
Respondent
Person who responds to our
questionnaires in the field.
44. Survey Subject
Entity or person from whom we are
collecting data.
Sampling Frame
Description of the sampling universe,
usually in the form of the list of sampling
units.
Example: Villages, Households or Individuals.
45. There are two main types of Sampling Technique:
Probability Sampling
Non-Probability Sampling
46. A probability sample is one in which
every unit in the population has a chance
(greater than zero) of being selected in
the sample.
Probability Sampling can be further sub-
classified into:
Stratified Sampling
Simple Random Sampling
Systematic Sampling
Cluster Sampling
47. Simple Random Sampling (SRS)
In a simple random sample (SRS) of a
given size, all subsets of the frame of that
size are given an equal probability of
selection. Each element of the frame thus has
an equal probability of selection: the frame
is not subdivided or partitioned.
Simple random sampling is always an EPS
design (equal probability of selection), but not all
EPS designs are simple random sampling.
48. SRS may also be cumbersome and tedious when
sampling from an unusually large target
population.
Example: N college students want to get a ticket for
a basketball game, but there are not enough tickets
(X) for them, so they decide to have a fair way to
see who gets to go.
Then, everybody is given a number (1 to N), and
random numbers are generated, either
electronically or from a table of random numbers.
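The ticket draw can be sketched with random.sample, which picks items without replacement so that every subset of the requested size is equally likely. The figures of 50 students and 8 tickets, and the function name draw_ticket_winners, are hypothetical stand-ins for N and X.

```python
import random

def draw_ticket_winners(n_students, n_tickets, seed=None):
    """Pick n_tickets distinct student numbers from 1..n_students, all equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable for auditing
    student_numbers = range(1, n_students + 1)
    return sorted(rng.sample(student_numbers, n_tickets))

# Hypothetical values: N = 50 students, X = 8 tickets.
winners = draw_ticket_winners(n_students=50, n_tickets=8, seed=42)
print(winners)
```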
49. Systematic Sampling
A method of selecting sample members
from a larger population according to a
random starting point and a fixed, periodic
interval called the sampling interval.
The sampling interval (sometimes known as
the skip) is calculated as:
k = N / n
where n is the sample size, and N is the
population size.
50. Example: Suppose you want to sample 8 houses
from a street of 120 houses.
Skip = k = 120/8 =15
So, every 15th house is chosen after a random
starting point between 1 and 15.
If the random starting point is 11, then the
houses selected are 11, 26, 41, 56, 71, 86, 101, and
116.
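The house example can be reproduced with a short function. This is a sketch that assumes the population size is a whole multiple of the sample size, as it is here (120 / 8 = 15). The function name systematic_sample is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit after a random start between 1 and k."""
    k = population_size // sample_size  # sampling interval (skip)
    if start is None:
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]

houses = systematic_sample(population_size=120, sample_size=8, start=11)
print(houses)
# prints: [11, 26, 41, 56, 71, 86, 101, 116]
```

Leaving start unset draws the starting point at random, which is how the method would be used in practice.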
51. Stratified Sampling
Where the population embraces a
number of distinct categories, the frame can
be organized by these categories into
separate "strata." Each stratum is then
sampled as an independent sub-population,
out of which individual elements can be
randomly selected.
52. Example: Suppose that a company has the
following staff (Total: 180):
Male (Full-time): 90
Male (Part-time): 18
Female (Full-time): 9
Female (Part-time): 63
We are asked to take a sample of 40 staff,
stratified according to the above categories.
Male (Full-time) = 90 x (40 / 180) = 20
Male (Part-time) = 18 x (40 / 180) = 4
Female (Full-time) = 9 x (40 / 180) = 2
Female (Part-time) = 63 x (40 / 180) = 14
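The allocation above is proportional: each stratum contributes its share of the population multiplied by the sample size. The sketch below repeats that arithmetic and then draws a random sample inside each stratum. The staff identifiers and the function name stratified_sample are hypothetical.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Draw a proportionally allocated random sample from each stratum."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Stratum size x (sample size / population size).
        # round() may miss the target total when the shares are not whole numbers.
        quota = round(len(members) * sample_size / total)
        sample[name] = rng.sample(members, quota)
    return sample

staff_counts = {
    "Male (Full-time)": 90,
    "Male (Part-time)": 18,
    "Female (Full-time)": 9,
    "Female (Part-time)": 63,
}
# Hypothetical staff identifiers, one list per stratum.
strata = {
    name: [f"{name} #{i}" for i in range(1, count + 1)]
    for name, count in staff_counts.items()
}

sample = stratified_sample(strata, sample_size=40, seed=0)
allocation = {name: len(chosen) for name, chosen in sample.items()}
print(allocation)
# prints: {'Male (Full-time)': 20, 'Male (Part-time)': 4, 'Female (Full-time)': 2, 'Female (Part-time)': 14}
```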
53. Cluster Sampling
Cluster sampling is exactly what its title
implies. You randomly select clusters or
groups in a population instead of
individuals.
The objective of this method is to choose a
limited number of smaller geographic areas in
which simple or systematic random sampling
can be conducted.
54. It’s completed in 2 stages:
1st Stage: Random Selection of Clusters: The
entire population of interest is divided into
small distinct geographic areas, such as
villages, camps, etc. We then need to find an
approximate size of the population for each
“village”.
2nd Stage: Random Selection of Households
within Clusters: Households are chosen
randomly within each cluster using simple
or systematic random sampling.
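The two stages can be sketched as follows. This is a simplified illustration: it assumes clusters are selected with equal probability, whereas the mention of population sizes in the first stage suggests that selection could also be weighted by size. The village and household names and the function name two_stage_cluster_sample are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_households, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick households within each."""
    rng = random.Random(seed)
    # Stage 1: simple random selection of clusters (equal probability assumed).
    chosen_clusters = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: simple random selection of households inside each chosen cluster.
    return {
        name: rng.sample(clusters[name], min(n_households, len(clusters[name])))
        for name in chosen_clusters
    }

# Hypothetical population: 10 villages with 30 households each.
villages = {
    f"village_{v}": [f"v{v}_household_{h}" for h in range(1, 31)]
    for v in range(1, 11)
}

sample = two_stage_cluster_sample(villages, n_clusters=3, n_households=5, seed=1)
for village, households in sample.items():
    print(village, households)
```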
55. Advantages and Disadvantages
Simple Random Sampling (SRS)
Advantages:
Estimates are easy to calculate.
Simple random sampling is always an EPS design, but not all EPS designs are simple random sampling.
Disadvantages:
If the sampling frame is large, this method is impracticable.
Minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study.
Systematic Sampling
Advantages:
The sample is easy to select.
A suitable sampling frame can be identified easily.
The sample is evenly spread over the entire reference population.
Disadvantages:
The sample may be biased if hidden periodicity in the population coincides with that of the selection.
It is difficult to assess the precision of the estimate from one survey.
Stratified Sampling
Advantages:
Low cost
Greater accuracy
Better coverage
Disadvantages:
A sampling frame of the entire population has to be prepared separately for each stratum.
When examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design and potentially reducing the utility of the strata.
In some cases, stratified sampling can potentially require a larger sample than other methods would.
Cluster Sampling
Advantages:
Cuts down on the cost of preparing a sampling frame.
Can reduce travel and other administrative costs.
Disadvantages:
Sampling error is higher than for a simple random sample of the same size.
Note: Often used to evaluate vaccination coverage in EPI.
56. Non-probability sampling is any
sampling method where some elements
of the population have no chance of
selection or where the probability of
selection can't be accurately determined.
Non-probability sampling can be further sub-classified into:
Quota Sampling
Accidental Sampling
57. Quota Sampling
In quota sampling, the population is first
segmented into mutually exclusive sub-
groups, just as in stratified sampling. Then
judgment is used to select the subjects or
units from each segment based on a
specified proportion.
Example: An interviewer may be told to
sample 200 females and 300 males between
the ages of 45 and 60.
58. In quota sampling the selection of the
sample is non-random.
Interviewers might be tempted to
interview those who look most helpful.
The problem is that these samples may
be biased because not everyone gets a
chance of selection.
59. Accidental Sampling
Accidental sampling (sometimes known
as Grab, Convenience or Opportunity
sampling) is a type of non-probability
sampling which involves the sample being
drawn from that part of the population
which is close to hand.
60. Example: If the interviewer were to
conduct such a survey at a shopping center
early in the morning on a given day, the
people that he/she could interview would
be limited to those present there at that
time. Their views would not represent those
of other members of society in the area, who
might be reached if the survey were conducted
at different times of day and several times per
week. This type of sampling is most useful
for pilot testing.
61. Sample size depends upon :
Population size
Confidence Interval
Confidence Level
Increasing the sample size improves precision
and reduces the margin of error.
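One common way to combine these three inputs is Cochran's formula for a proportion with a finite population correction. The slides do not state which formula they use, so this choice is an assumption. The sketch also treats the "confidence interval" input as a margin of error (for example 0.05 for plus or minus 5%) and assumes the most conservative proportion, p = 0.5. The function name sample_size is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size(population_size, confidence_level=0.95,
                margin_of_error=0.05, proportion=0.5):
    """Cochran's formula with finite population correction (an assumed choice)."""
    # z-score for a two-sided confidence level, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Sample size for a very large (effectively infinite) population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    # Correction for a finite population of the given size.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)

print(sample_size(1000))     # prints: 278
print(sample_size(10 ** 9))  # prints: 385
```

A higher confidence level or a smaller margin of error both increase the required sample size, while population size matters less and less as the population grows.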
62. Confidence Level
The confidence level tells you how
sure you can be.
It is expressed as a percentage and
represents how often the true percentage of
the population who would pick an answer
lies within the confidence interval.
The 95% confidence level means you can
be 95% certain; the 99% confidence level
means you can be 99% certain. Most
researchers use the 95% confidence level.
63. Confidence Interval
It expresses the degree of uncertainty
associated with a sample statistic. A
confidence interval is an interval estimate
combined with a probability statement.
Interval Estimate
An interval estimate is defined by
two numbers, between which a
population parameter is said to lie.
For example, a < μ < b is an interval
estimate for the population mean μ. It
indicates that the population mean is greater
than a but less than b.
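As an illustration, an interval estimate a < μ < b for a mean can be computed from a sample. The sketch below reuses the travel times from the univariate example and a normal approximation. That approximation is an assumption made for simplicity: for a sample this small, a t-based interval would be somewhat wider. The function name mean_confidence_interval is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence_level=0.95):
    """Return (a, b) such that a < mean < b, using a normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    centre = mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    half_width = z * stdev(sample) / sqrt(len(sample))
    return centre - half_width, centre + half_width

travel_time_minutes = [15, 29, 8, 42, 35, 21, 18, 42, 26]
a, b = mean_confidence_interval(travel_time_minutes, confidence_level=0.95)
print(f"{a:.1f} < mu < {b:.1f}")
```

Raising the confidence level from 95% to 99% widens the interval, which is the trade-off between certainty and precision described above.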
66. “What is data..??” available from: http://www.mathsisfun.com/data/data.html (20 March 2013)
“Sampling” available from: http://en.wikipedia.org/wiki/Sampling_statistics (21 March 2013)
“Qualitative data analysis” available from: http://www.learnhigher.ac.uk/analysethis/main/qualitative.html (14 March 2013)
“Calculating the Sample Size” available from: http://www.ifad.org/gender/tools/hfs/anthropometry/ant_3.htm (21 March 2013)
“Sampling Strategies” available from: http://www.dissertation-statistics.com/sampling-strategies.html (21 March 2013)
“Univariate vs Bivariate Data” available from: http://regentsprep.org/REgents/math/ALGEBRA/AD1/unidat.htm (21 March 2013)
3/18/2015
Editor's Notes
“Content analysis” steps:
Transcribe data (if audio taped)
Read transcripts
Highlight quotes and note why important
Code quotes according to margin notes
Sort quotes into coded groups (themes)
Interpret patterns in quotes
Describe these patterns
Check Sheet Procedure
Decide what event or problem will be observed. Develop operational definitions.
Decide when data will be collected and for how long.
Design the form. Set it up so that data can be recorded simply by making check marks or Xs or similar symbols and so that data do not have to be recopied for analysis.
Label all spaces on the form.
Test the check sheet for a short trial period to be sure it collects the appropriate data and is easy to use.
Each time the targeted event or problem occurs, record data on the check sheet.
A census is accurate, but hard to do. A sample is not as accurate, but may be good enough, and is a lot easier.
If you are selecting districts during the first stage of cluster sampling, the sampling unit (also called primary sampling unit) at the first sampling stage is therefore the district.
Sampling frame: description of the sampling universe, usually in the form of the list of sampling units (for example, villages, households or individuals). Sometimes, it may be outdated or otherwise not accurate, and thus would not provide an accurate description of the sampling universe (census data not recent, recent population movements, etc.)
Drawbacks to using stratified sampling.
First, sampling frame of entire population has to be prepared separately for each stratum
Second, when examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design, and potentially reducing the utility of the strata.
Finally, in some cases (such as designs with a large number of strata, or those with a specified minimum sample size per group), stratified sampling can potentially require a larger sample than would other methods
1. The main difference between cluster sampling and stratified sampling is that in cluster sampling the cluster is treated as the sampling unit, so analysis is done on a population of clusters (at least in the first stage).
2. In stratified sampling, the analysis is done on elements within strata. Stratified sampling techniques are generally used when the population is heterogeneous, or dissimilar, but certain homogeneous, or similar, sub-populations (strata) can be isolated.
3. In stratified sampling, a random sample is drawn from each of the strata, whereas in cluster sampling only the selected clusters are studied.
4. The main objective of cluster sampling is to reduce costs by increasing sampling efficiency. This contrasts with stratified sampling, where the main objective is to increase precision.
Here is an example of each type of sampling, so you can see some of these differences in words:
Cluster Sampling
Suppose that the Department of Agriculture wishes to investigate the use of pesticides by farmers in England. A cluster sample could be taken by identifying the different counties in England as clusters. A sample of these counties (clusters) would then be chosen at random, so all farmers in those counties selected would be included in the sample. It can be seen here then that it is easier to visit several farmers in the same county than it is to travel to each farm in a random sample to observe the use of pesticides.
Stratified Sampling
Suppose a farmer wishes to work out the average milk yield of each cow type in his herd which consists of Ayrshire, Friesian, Galloway and Jersey cows. He could divide up his herd into the four sub-groups and take samples from these.