Statistical techniques for interpreting and reporting quantitative data i
1. Statistical Techniques for
Interpreting and Reporting
Quantitative Data - I
M. Vijayalakshmi
M.Sc., M.Phil. (Life Sciences), M.Ed., M.Phil.
(Education), NET (Education), PGDBI
Assistant Professor (Former),
Sri Ramakrishna Mission Vidyalaya College of Education
(Autonomous),
Coimbatore – 641020.
2. Meaning of 'Statistics'
• The word 'Statistics' appears to have been
derived from the Latin word “Status”, meaning
“a political state”.
• Some believe that the word has its root in the
German word 'Statistik'.
• In its earliest sense, statistics was simply the
collection of numerical data.
3. Definition
• Defined as the scientific study of handling
quantitative information.
• It embodies the methodology of collection,
Classification, Description and Interpretation
of data obtained through the conduct of
surveys and experiments.
• The essential purpose is to describe and draw
inferences about the numerical properties of
populations.
4. • Croxton and Cowden –
• “The collection, presentation, analysis and
interpretation of numerical data”
5. • Examples of Statistics are:
- The number of teachers recruited every year;
- The number of colleges functioning in
Tamilnadu;
- The number of science graduates produced in
a year;
- The number of candidates selected for IAS in a
year etc.,
6. • Statistics deals with facts and figures.
• It is the scientific method of collecting the
appropriate data, classifying and tabulating
the collected data, analyzing them with
appropriate statistical techniques and finally
drawing truthful inferences and conclusions.
7. Importance of the study of Statistics
A knowledge of statistics helps a teacher in:
• Knowing the performance of students in
different subjects
• Comparing their achievements with those of
students of other institutions
• Identifying those students who require
help in order to secure more marks
8. • Selecting them for admission to higher
courses or for jobs based on their
performance in entrance/competitive
examinations
• Developing norms for achievement and
psychological tests
• Constructing and standardizing scholastic
ability tests etc.
9. Steps involved in the Statistical Method
I. Collection of Data
II. Classification & Tabulation
III. Statistical Analysis of data
IV. Drawing of inferences
10. I. Collection of Data :
i) Identify the variables and their nature.
ii) Select the appropriate scales of
measurement.
iii) Obtain the accurate quantitative
measurements.
11. II. Classification & Tabulation :
• Transforming the raw data into a suitable frequency
distribution.
III. Statistical Analysis of data :
A) Descriptive Analysis
B) Inferential Analysis
14. IV. Drawing of Inferences :
• Avoiding Type I & Type II errors
• Levels of significance of Inferences.
15. Descriptive Analysis
• To describe the properties of the given group
taken for study, we calculate certain measures
like the averages (Mean, Median, Mode), or
measures of dispersion or measures of
association.
• These are called 'descriptive statistics'.
16. • The three major aspects of descriptive
statistics are –
1) Measures of central tendency
2) Measures of dispersion / deviation
3) Measures of association / relationship.
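As a minimal sketch of these three kinds of descriptive measures, the snippet below uses Python's standard `statistics` module. The lists `maths` and `science` and the helper `pearson_r` are hypothetical, made up for illustration; they are not data or methods from these slides.

```python
import statistics
from math import sqrt

# Hypothetical marks of eight students in two subjects (illustrative only).
maths = [45, 52, 52, 60, 67, 70, 74, 80]
science = [40, 50, 55, 58, 65, 72, 70, 78]

# 1) Measures of central tendency
mean_maths = statistics.mean(maths)
median_maths = statistics.median(maths)
mode_maths = statistics.mode(maths)

# 2) Measure of dispersion: sample standard deviation
sd_maths = statistics.stdev(maths)

# 3) Measure of association: Pearson correlation coefficient
def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(maths, science)
print(mean_maths, median_maths, mode_maths, round(sd_maths, 2), round(r, 2))
```

Other measures of dispersion (range, quartile deviation) or association (rank correlation) could be substituted in the same way.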
17. Inferential Statistics
• Analysis in inferential statistics is based on
'sampling technique'.
• To study a large population, we normally
choose from it a small random sample and
obtain the descriptive statistics (measures
pertaining to the sample) from which we try
to infer the measures pertaining to the large
population.
18. • Example
• Suppose we wish to estimate the mean performance of +2
students in Maths in Tamil Nadu.
• We may choose a small sample of 500 students from
among those who are about to appear for the public
examination,
• conduct a maths test for the group of 500 students
selected, and calculate the mean of the scores
obtained from the test.
• This is a sample 'statistic'.
• From this, we infer the mean of the population of +2
students in Tamil Nadu, which is called a 'parameter'.
• Thus inferential statistics is the technique of
estimating the population parameter from the known
sample statistic.
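The idea of estimating a parameter from a statistic can be sketched with a simulated population. Everything here is an assumption made for illustration: the population size of 100,000, the normal shape with mean 55 and standard deviation 15, and the fixed random seed. In practice the population mean is unknown, which is why a sample is drawn.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: maths scores (0-100) of 100,000 students.
population = [min(100, max(0, round(random.gauss(55, 15)))) for _ in range(100_000)]

# Parameter: the population mean (normally unknown in practice).
parameter = statistics.mean(population)

# Statistic: the mean of a simple random sample of 500 students.
sample = random.sample(population, 500)
statistic = statistics.mean(sample)

print(f"sample mean = {statistic:.2f}, population mean = {parameter:.2f}")
```

The two printed values should be close but not identical; the gap between them is the sampling error that inferential statistics tries to quantify.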
19. Types of Variables
• Any quantity or trait whose value varies
is called a 'variable'.
• Eg.:
i) Height of pupils
ii) Weight of students
iii) Achievement scores etc.
• A 'constant' is one which has a fixed value at all
times, in all places.
• Variables are of two types –
i) Continuous
ii) Discrete
20. • Continuous variable:
• A variable which can have all possible values from
−∞ to +∞ is called an "Infinite Continuous Variable".
• Variables which can have all possible values
between any two specified limits are called
"Finite Continuous Variables".
• Eg:
• Expenditure - Infinite Continuous Variable
• Achievement scores - Finite Continuous Variable.
22. • Discrete Variable :
• Variables which can take only certain specified
or allowed values, (and not any other values)
are called Discrete Variables.
24. Scales of Measurement
• There are four types of scales of measurement.
They are
i) Nominal Scale
ii) Ordinal Scale
iii) Interval Scale
iv) Ratio Scale
• Depending upon the nature of the variable, the
suitable scale of measurement should be
employed.
25. Nominal Scale
• Meant for variables which can be merely
labeled or categorized like:
• Males & Females
• Married & Single
• Rural & Urban People
• Hostellers & Day Scholars
• High & Low Socio-Economic status etc.
26. Ordinal Scale
• Meant for variables which cannot be
accurately measured but can be rated and
ordered.
• Ordinal Scale is better than Nominal Scale.
• Eg: Variables like Beauty, Singing Ability,
Selling Ability, Oratorical Skill etc, can only be
rated and ranked.
27. Interval Scale
• Meant for variables which can be measured
accurately.
• Eg: Achievement scores, Temperature
measurements in Fahrenheit or Centigrade
scale etc.
• In Interval scale the zero point is arbitrary;
a true (absolute) zero is not defined.
28. Ratio Scale
• Meant for variables which can be measured
accurately as in Interval scale; apart from
that the absolute zero value (absence of the
trait) is also meaningfully known.
• Eg: C.G.S. measures in Physics, Height, Weight,
Kelvin Scale of Thermometry etc.
• Here two values of a variable can be
expressed as a ratio.
29. Primary and Secondary Data
• Primary Data :
i) When the data is collected for the first time,
directly from the sources, then it is called
primary data.
ii) It is original in character
iii) The shape of the primary data is like the
shape of the raw material. It must be
classified, tabulated and interpreted.
30. • Methods of collecting Primary Data:
i) Direct Personal Contact
ii) Post and correspondence
iii) Schedules through enumerators
iv) Combination of the above methods
31. • Secondary Data :
• Secondary data is called second-hand data.
• Data which is already collected for some
purpose is made use of now, for a totally
different purpose.
• It is in the shape of a finished product.
32. • Sources of Secondary Data:
i) Official publications like U.N.O. reports, I.M.F.
(International Monetary Fund) reports,
Central and State Government Publications
etc.
ii) Semi-official publications like reports of city
corporation, L.I.C, Reserve Bank etc.
iii) Private Publications.
33. iv) Journals, Newspapers, Published Research
articles etc.
v) Unpublished data like registers of
companies and schools, Govt. Audit Reports,
Unpublished Research Theses etc.
34. Raw and Grouped Data
• A group of obtained individual scores is
known as 'Raw Data'.
• If the number of such scores is small, then
we can handle them as such to calculate
sample statistics like the Mean, Standard
Deviation, Correlation Co-efficient etc.
35. • However, if the group of scores is large
(usually, a group containing 30 or more
scores is referred to as a large group), then
it is very difficult to handle them as such to
compute the required sample statistics.
• In such cases we organize the scores in a
number of classes and find how many items
get placed in each of these classes.
36. • This table in which raw scores are arranged in
the form of classes and class frequencies is
called 'Frequency Distribution'.
• Data that is present in the form of a frequency
distribution is known as 'Grouped Data'.
37. Classification of data
• Classification is the grouping of related facts
into different classes.
• Facts in one class differ from those of another
class with respect to some characteristics
called a basis of classification
• Sorting facts on one basis of classification and
then on another basis is called cross-
classification
38. Types of classification
• Geographical – area-wise Ex: cities, districts
• Chronological – on the basis of time
• Qualitative – according to some attributes
• Quantitative – in terms of magnitudes
42. Forming a Frequency Distribution
• Frequency distribution is a table in which raw
scores are arranged in the form of classes and
class frequencies.
• In a frequency distribution table, there will be
a number of classes of equal size. The number
of score values which fall in a particular class
interval is known as the frequency of that
class.
44. Example
• Step I: Find the maximum and the minimum
values of the raw scores. The difference between
the two is called the Range. In this example the
maximum is 96 and the minimum is 4, so the
Range is 96 - 4 = 92.
• Usually we round the maximum and minimum
values outwards so that the range becomes a
multiple of 5.
• So, taking the maximum value as 100 and the
minimum value as 0, we have the Range
100 - 0 = 100.
45. • Step II: Determine the width ‘i’ of the class
interval. Usually it is desirable to have i = 5,
10, 15, 20, 25, 50, 100 and the multiples of
100.
46. • Step III: Determine the number of class
intervals (n), using the relation
n = Range / i
• Usually, it is desirable to have 'n' ranging
between 5 and 15. Of course, it is not a hard
and fast rule.
• Considering Steps II & III together, in
our example we can have i = 10;
hence n = 100/10 = 10.
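Steps I to III can be sketched as a small function. The name `plan_classes` and the list `scores` are hypothetical; the scores are made up so that their extremes (4 and 96) match the example. One simplification is assumed here: the extremes are rounded outwards to multiples of the chosen width `i`, rather than to multiples of 5.

```python
import math

def plan_classes(scores, width):
    """Return (rounded minimum, rounded maximum, number of classes)."""
    lowest, highest = min(scores), max(scores)
    # Round outwards so that both ends are multiples of the class width.
    low = math.floor(lowest / width) * width
    high = math.ceil(highest / width) * width
    n = (high - low) // width  # n = Range / i
    return low, high, n

# Hypothetical scores whose extremes match the example (4 and 96).
scores = [4, 17, 23, 38, 41, 55, 62, 79, 88, 96]
low, high, n = plan_classes(scores, width=10)
print(low, high, n)
```

If `n` falls outside the suggested 5 to 15, the function can simply be called again with a different width.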
47. Classification according to class intervals
• Class Limits – the lower and upper limits of a class
• Class interval (width) – difference between the upper
and lower limits
• Class frequency – number of observations
falling in the particular class
• Class mid-point – (upper limit of the class + lower
limit of the class) / 2
• Two methods of class intervals
A. Exclusive method
B. Inclusive method
48. • Step IV : Write the class intervals (C.I) either
in the Exclusive type (where the upper limit
of the class becomes the lower limit of the
succeeding class) or Inclusive type (Where
both the upper and lower limits of the class
are included in the same class interval;
naturally the upper limit of a class is one score
less than the lower limit of the succeeding
class).
49. Exclusive and Inclusive class intervals
Exclusive type    Inclusive type
0-10              0-9
10-20             10-19
20-30             20-29
30-40             30-39
40-50             40-49
50-60             50-59
60-70             60-69
70-80             70-79
80-90             80-89
90-100            90-99
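The two ways of writing class intervals, and the class mid-point, can be sketched as follows. The function names `class_intervals` and `class_midpoint` are hypothetical, and the inclusive form assumes whole-number scores, so that "one score less" means subtracting 1.

```python
def class_midpoint(lower, upper):
    """Mid-point of a class: (upper limit + lower limit) / 2."""
    return (upper + lower) / 2

def class_intervals(start, stop, width, inclusive=False):
    """List the (lower, upper) limits of classes of equal width.

    Exclusive: the upper limit equals the next class's lower limit.
    Inclusive: the upper limit is one score less (whole-number scores assumed).
    """
    step = 1 if inclusive else 0
    return [(low, low + width - step) for low in range(start, stop, width)]

exclusive = class_intervals(0, 100, 10)
inclusive = class_intervals(0, 100, 10, inclusive=True)

print(exclusive[0], exclusive[-1])  # first and last exclusive classes
print(inclusive[0], inclusive[-1])  # first and last inclusive classes
print(class_midpoint(*exclusive[0]))
```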
50. • Step V: Check the individual values, and mark
each one as a 'tally' against the C.I. in which it
falls. To make counting easy, every fifth tally
mark against any class interval is drawn as a
horizontal line across the preceding four.
• Step VI: Count the tally marks against each
and every class interval and put the number,
which is the frequency of that class.
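The tallying and counting steps can be sketched in code. The function `frequency_distribution` and the list `scores` are hypothetical (these are not the raw scores used in the slides), and exclusive classes are assumed, so a score equal to an upper limit is counted in the next class.

```python
def frequency_distribution(scores, start, stop, width):
    """Count how many scores fall in each exclusive class [lower, upper)."""
    table = {(low, low + width): 0 for low in range(start, stop, width)}
    for score in scores:
        if not start <= score < stop:
            raise ValueError(f"score {score} lies outside {start}-{stop}")
        low = start + ((score - start) // width) * width
        table[(low, low + width)] += 1
    return table

# Hypothetical raw scores (illustrative only).
scores = [4, 12, 18, 25, 25, 33, 47, 51, 58, 64, 66, 72, 85, 91, 96]
table = frequency_distribution(scores, 0, 100, 10)
for (low, high), f in table.items():
    print(f"{low:>3}-{high:<3} {'|' * f:<5} {f}")
```

The printed bars play the role of tally marks; the final column is the class frequency.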
54. • Relative Frequency Table
• Relative frequency = class frequency / sum of all frequencies
55. Cumulative Frequency Table
Rating    Frequency    Cumulative Frequency
0-2       20           20
3-5       14           34
6-8       15           49
9-11       2           51
12-14      1           52
Total     52
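The cumulative frequencies in the table above are running totals of the class frequencies. A minimal sketch using `itertools.accumulate` from the Python standard library, with the variable names chosen here for illustration:

```python
from itertools import accumulate

ratings = ["0-2", "3-5", "6-8", "9-11", "12-14"]
frequencies = [20, 14, 15, 2, 1]

# Running total of the class frequencies, from the first class onwards.
cumulative = list(accumulate(frequencies))

for rating, f, cf in zip(ratings, frequencies, cumulative):
    print(f"{rating:<6} {f:>3} {cf:>3}")
```

The last cumulative frequency always equals the total frequency (52 here), which is a quick check on the arithmetic.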
56. Relative Frequency Table
Rating    Frequency    Relative Frequency
0-2       20           38.5%
3-5       14           26.9%
6-8       15           28.8%
9-11       2            3.8%
12-14      1            1.9%
Total     52
For example, 20/52 = 38.5%, 14/52 = 26.9%, etc.
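The relative frequencies above can be reproduced by dividing each class frequency by the total. A short sketch; the dictionary layout and the rounding to one decimal place are choices made here for illustration:

```python
frequencies = {"0-2": 20, "3-5": 14, "6-8": 15, "9-11": 2, "12-14": 1}
total = sum(frequencies.values())

# Relative frequency (as a percentage) = class frequency / total * 100
relative = {rating: round(100 * f / total, 1) for rating, f in frequencies.items()}

for rating, pct in relative.items():
    print(f"{rating:<6} {pct:>5}%")
```

Because each percentage is rounded separately, the rounded values may not add up to exactly 100%.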
59. Fiducial limits
• [fə¦dü·shəl ′lim·əts] (statistics)
• The boundaries within which a
parameter is considered to be
located; a concept in fiducial
inference.
60. What is a fiducial confidence interval?
• A fiducial confidence interval is a confidence interval
based on fiducial statistical theory, which considers
unknown population parameters to be random variables.
Fiducial confidence intervals are primarily used in probit
analysis.
• For a 100(x)% fiducial confidence interval, the probability
that the population parameter falls within the interval is (x).
• This interpretation is fundamentally different from that of
standard confidence intervals. Standard confidence
intervals do not consider population parameters to be
random variables, but fixed values, and consider the
confidence interval itself to be random, because the
interval is derived from a random sample.
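For comparison, a standard (not fiducial) confidence interval for a mean can be sketched as below. The function name `confidence_interval_95`, the simulated scores and the fixed seed are all assumptions made for illustration; the formula used is the usual large-sample normal approximation, mean ± z × (standard deviation / √n).

```python
import random
import statistics
from math import sqrt

def confidence_interval_95(sample):
    """Approximate 95% confidence limits for a mean (large-sample normal theory)."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    return mean - z * standard_error, mean + z * standard_error

random.seed(1)  # fixed seed so the sketch is repeatable
# Hypothetical test scores of 100 students (illustrative only).
scores = [round(random.gauss(60, 10)) for _ in range(100)]
lower, upper = confidence_interval_95(scores)
print(f"95% confidence limits: {lower:.2f} to {upper:.2f}")
```

A different random sample would give different limits, which is the sense in which the interval, not the parameter, is treated as random.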
62. What is the difference between fiducial
limits and confidence limits/intervals?
• Confidence limits (95% or 99%) are calculated for either a mean
or a proportion. In either case the underlying distribution is
approximately Normal when the sample size is adequately large.
• Fiducial limits are typically applied to the lethal dose
required for 50% mortality (LD50) or 90% mortality (LD90). The
underlying relationship is a logistic growth or S-shaped curve.
• Even though we transform the data (% values into probits and
dose values into log dose) to fit a linear regression equation
(Y(probit) = a + b*logdose), the upper limit of conventional
confidence limits may go beyond the possible maximum (say 110%
mortality), which cannot occur on an S-shaped curve bounded at
100%. In such situations fiducial limits are more appropriate
than confidence limits.
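The probit and log-dose transformation mentioned above can be sketched as follows. The dose and mortality figures are hypothetical, and the fit is a plain least-squares line, whereas full probit analysis normally uses a weighted or maximum-likelihood fit. The sketch stops at the LD50 estimate and does not compute fiducial limits. The probit is taken in its classical form, the normal quantile of the proportion plus 5.

```python
from math import log10
from statistics import NormalDist, mean

def probit(percent):
    """Classical probit: normal quantile of the proportion, plus 5."""
    return NormalDist().inv_cdf(percent / 100) + 5

# Hypothetical dose-response data (illustrative only).
doses = [1, 2, 4, 8, 16]
mortality = [10, 30, 50, 70, 90]  # % mortality at each dose

x = [log10(d) for d in doses]
y = [probit(p) for p in mortality]

# Ordinary least-squares fit of Y(probit) = a + b * logdose.
x_bar, y_bar = mean(x), mean(y)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# LD50 is the dose at which the fitted probit equals 5 (50% mortality).
ld50 = 10 ** ((5 - a) / b)
print(f"a = {a:.3f}, b = {b:.3f}, LD50 = {ld50:.2f}")
```

Mortality values of exactly 0% or 100% cannot be transformed this way, because the normal quantile is undefined at 0 and 1.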
63. Tabulation of Data
• A table is a systematic arrangement of statistical
data in columns and rows
• One of the simplest and most revealing devices
for summarising data and presenting them in
meaningful fashion is the statistical table
• Tables are the devices, that are used to present
the data in a simple form. It is probably the first
step before the data is used for analysis or
interpretation.
64. General principles of designing tables
• The tables should be numbered, e.g. Table 1, Table 2 etc.
• A title must be given to each table, which should be brief and
self-explanatory.
• The headings of columns or rows should be clear and concise.
• The data must be presented according to size or importance,
or chronologically, alphabetically, or geographically.
• If percentages or averages are to be compared, they should be
placed as close together as possible.
• No table should be too large.
• Most people find a vertical arrangement better than a
horizontal one because it is easier to scan the data from top
to bottom than from left to right.
• Foot notes may be given, where necessary, providing
explanatory notes or additional information.
65. Parts of a Table
• Table number – top or bottom
• Title of the table – suitable
• Caption – column heading
• Stub – row heading
• Body of the table – numerical information
• Headnote – brief explanatory statement
• Footnote – explanations that help the reader
to understand the table
68. General Purpose and Special Purpose
• General Purpose
Reference or repository tables
Provide information for general use or
references
• Special Purpose
Summary or analytical or derivative tables
Provide information for particular discussion
69. Charting Data
• One of the most convincing and appealing ways in
which data may be presented is through charts
• A picture is said to be worth 10,000 words
• Data presented in an interesting form has a greater
memorizing effect
• Charts are of two kinds:
i. Diagrams and
ii. Graphs
70. Diagrams
• General Rules :
Title
Proportion between width and height
Selection of appropriate scale
Footnotes
Index
Neatness and cleanliness
Simplicity
71. Types of Diagrams
1. One – dimensional diagrams
• Ex: Bar diagrams
2. Two - dimensional diagrams
• Ex: Rectangles, squares and circles
3. Pictograms and Cartograms
72. Presentation of data
• Tabular
- Simple table
- Complex table
• Graphical, for quantitative data
- Histogram
- Frequency polygon
- Frequency curve
- Line chart
- Normal distribution curve
- Cumulative distribution curve
- Scatter diagram
• Graphical, for qualitative data
- Bar chart
- Pictogram
- Pie chart
- Map diagram
73. 1. One – dimensional diagrams
Bar diagrams
• The data presented is categorical
• Data is presented in the form of rectangular bars of
equal breadth.
• Each bar represents one variable/attribute.
• A suitable scale should be indicated, and the scale starts from
zero.
• The width of the bars and the gaps between the bars
should be equal throughout.
• The length of the bar is proportional to the magnitude/
frequency of the variable.
• The bars may be vertical or horizontal.
74. Types of Bar Diagrams
Simple Bar Diagrams
Subdivided Bar Diagrams
Multiple Bar Diagrams
Percentage Bar Diagrams
Deviation Bars
Broken Bars
78. Subdivided Bar Diagrams
Subdivision of a single bar to indicate the composition of
the total, divided into sections according to their relative
proportion.
79. Multiple Bar Diagrams
Each observation has more than one value, represented by a
group of bars. Eg: percentage of males and females in different
countries, percentage of deaths from heart diseases in old and
young age.
81. Deviation Bars
Representing net quantities – excess or deficit
Net profit, net loss, net exports or imports etc.,
Have both positive and negative values
83. Two-dimensional Diagrams
• Length as well as width of the bars is considered
• Area of the bar represents the given data
• Also known as surface diagrams or area diagrams
• Types –
• Rectangles
• Squares
• Circles
87. Pie diagram
• Consist of a circle whose area represents the
total frequency (100%) which is divided into
segments.
• Each segment represents a proportional
composition of the total frequency.
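Since the whole circle (360°) stands for the total frequency, the angle of each segment is (frequency / total) × 360. A minimal sketch; the function name `pie_angles` and the enrolment figures are hypothetical:

```python
def pie_angles(frequencies):
    """Angle in degrees of each segment: (frequency / total) * 360."""
    total = sum(frequencies.values())
    return {label: 360 * f / total for label, f in frequencies.items()}

# Hypothetical enrolment figures by stream (illustrative only).
enrolment = {"Arts": 90, "Science": 150, "Commerce": 120}
angles = pie_angles(enrolment)

for label, angle in angles.items():
    print(f"{label:<9} {angle:>6.1f} degrees")
```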
90. Pictogram Diagram
• Popular method of presenting data to those
who cannot understand orthodox charts.
• Small pictures or symbols are used to present
the data, e.g. a picture of a doctor to represent
the number of physicians.
• Fraction of the picture can be used to
represent numbers smaller than the value of
whole symbol
92. Cartogram Diagram
• Statistical maps
• Used to give quantitative information on a
geographical basis
• Represent spatial distribution
• Shown in many ways – shades of colours, dots,
placing pictograms, numerical figure in
geographical unit
99. Line Graphs
• It is a diagram showing the relationship
between two numeric variables (as in the
scatter diagram), but the points are joined
together to form a line (either a broken line
or a smooth curve).
• Used to show the trend of events with the
passage of time.
105. Band graphs
• It is a type of line graph which shows the total
for successive time periods broken up into
sub-totals for each of the component parts of
the total.
• The various component parts of the whole are
plotted one over the other and the gaps
between the successive lines are filled by
different shades, colours, etc., so that the
graph has the appearance of a series of bands.
108. Histogram
• It is very similar to the bar chart, with the
difference that the rectangles or bars are adjacent
(without gaps).
• It is used for presenting a class frequency table
(continuous data).
• Used for quantitative, continuous variables.
• It is used to present variables which have no gaps, e.g.
age, weight, height, blood pressure, blood sugar etc.
• It consists of a series of blocks. The class intervals are
given along the horizontal axis and the frequency along the
vertical axis.
110. Frequency Polygon
• Derived from a histogram by connecting the mid
points of the tops of the rectangles in the
histogram.
• The line connecting the centers of
histogram rectangles is called frequency polygon.
• We can draw polygon without rectangles so we
will get simpler form of line graph.
• When smoothed, a frequency polygon becomes a
frequency curve; a special type of frequency
curve is the Normal Distribution Curve.
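The points of a frequency polygon can be computed directly from a frequency table, without drawing the histogram first. In this sketch the function `polygon_points` and the data are hypothetical, and one common convention is assumed: an extra class of zero frequency is added at each end so that the polygon closes on the horizontal axis.

```python
def polygon_points(classes, frequencies):
    """(mid-point, frequency) pairs, closed with a zero-frequency class at each end."""
    width = classes[0][1] - classes[0][0]
    mids = [(low + high) / 2 for low, high in classes]
    points = list(zip(mids, frequencies))
    return [(mids[0] - width, 0)] + points + [(mids[-1] + width, 0)]

# Hypothetical grouped data with exclusive classes (illustrative only).
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [3, 8, 6, 2]
points = polygon_points(classes, frequencies)
print(points)
```

Joining these points in order with straight lines gives the polygon.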
115. Cumulative frequency diagram or Ogive
• Here the frequency of data in each category
represents the sum of data from the category
and the preceding categories.
• Cumulative frequencies are plotted against
the group limits of the variable.
• These points are joined by a smooth free-hand
curve to get a cumulative frequency diagram
or Ogive.
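The points to be plotted for an ogive can be sketched as below. The function `ogive_points` and the data are hypothetical, and a "less than" ogive is assumed: each cumulative frequency is paired with the upper limit of its class, starting from zero at the lower limit of the first class.

```python
from itertools import accumulate

def ogive_points(classes, frequencies):
    """(upper class limit, cumulative frequency) pairs, starting from zero."""
    start = (classes[0][0], 0)
    running = accumulate(frequencies)
    return [start] + [(high, cf) for (_, high), cf in zip(classes, running)]

# Hypothetical grouped data with exclusive classes (illustrative only).
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [3, 8, 6, 2]
points = ogive_points(classes, frequencies)
print(points)
```

The last point always sits at the total frequency, so the curve never falls as it moves to the right.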