2. Introduction:
COURSE OBJECTIVE:
• The objective of this paper is to acquaint students with the various statistical tools and techniques used
in business decision making.
Unit-I
Construction of frequency distributions and their analysis in the form of measures of central tendency
and variation; types of measures, their relative merits, limitations and characteristics; skewness:
meaning and coefficient of skewness.
Unit-II
Correlation analysis - meaning and types of correlation, Karl Pearson’s coefficient of correlation and
Spearman’s rank correlation; regression analysis - meaning and two lines of regression; relationship
between correlation and regression coefficients. Time series analysis - measurement of trend and
seasonal variations; time series and forecasting.
3. Introduction:
Unit-III
Probability: basic concepts and approaches, addition, multiplication and Bayes’ theorem. Probability
distributions - meaning, types and applications, Binomial, Poisson and Normal distributions.
Unit-IV
Tests of significance; Hypothesis testing; Large samples, Small samples: Chi-square
test, Analysis of variance.
4. STATISTICS
THE SCIENCE OF COLLECTING, ORGANIZING,
PRESENTING, ANALYZING AND
INTERPRETING DATA TO ASSIST IN
MAKING MORE EFFECTIVE DECISIONS
6. Statistical Concepts
• Population: The entire set of individuals or
objects of interest
• Sample: A portion or part of the population of
interest
• Why do we sample?
7. Purpose of Sampling
• To contact the whole population would be
time consuming
• Cost would be prohibitive
• Physical impossibility of checking all items in a
population: for example, we cannot test all the water in
the Ganga for pollution
• Destructive nature of some tests, such as a stress test
• Sample results are adequate
8. Characteristics of a Good Sample
• Representativeness
• Adequate size
• Replication
• Precision of the research study matched to
sample precision
9. Different Types of Sampling
• Random or Statistical Sampling
• Convenience Sampling
• Purposive Sampling
• Snowball Sampling
• Multistage Sampling
10. Different Types of Random Sampling
• Simple Random Sample: A sample selected so
that each item or person in the population has
the same chance of being included
11. Different Types of Random Sampling
• Systematic Random Sample: A random
starting point is selected and then every kth
member of the population is selected
12. Different Types of Random Sampling
• Stratified Random Sample: A population is
divided into subgroups, called strata and a
sample is randomly selected from each
stratum
13. Different Types of Random Sampling
• Cluster Sample: A population is divided into
clusters using naturally occurring geographic
or other boundaries. The clusters are
randomly selected and a sample is collected
by randomly selecting from each cluster.
For example, suppose we divide Delhi into 6 regions (E, W, N, S, SE and others), randomly select 3 of them (N, E and SE) and
then take a random sample of residents in each selected region.
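The four random sampling designs above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 60 residents (10 per Delhi region), the sample sizes and the seed are all hypothetical and are not taken from the slides.

```python
import random

# Hypothetical population: 10 residents in each of 6 Delhi regions.
regions = ["E", "W", "N", "S", "SE", "Other"]
population = [(region, i) for region in regions for i in range(10)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simple random sample: every resident has the same chance of selection.
simple = rng.sample(population, 12)

# Systematic random sample: random starting point, then every k-th member.
k = 5
start = rng.randrange(k)
systematic = population[start::k]

# Stratified random sample: draw 2 residents from every region (stratum).
stratified = []
for region in regions:
    stratum = [p for p in population if p[0] == region]
    stratified.extend(rng.sample(stratum, 2))

# Cluster sample: choose 3 regions at random, then sample within each one.
chosen_regions = rng.sample(regions, 3)
cluster = []
for region in chosen_regions:
    members = [p for p in population if p[0] == region]
    cluster.extend(rng.sample(members, 4))
```

Note the difference between the last two designs: the stratified sample contains residents from every region, while the cluster sample contains residents only from the regions that were drawn.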
14. Types of Variables
Two Basic Types of Variable
• Qualitative: non-numeric (gender, religion,
state of birth, color of skin)
• Quantitative: numeric
15. Types of Variables
Quantitative
• Discrete: can assume only certain values and
there are gaps between the values (number
of rooms in a house, number of children)
• Continuous: Can assume any value within a
specified range (CGPA, rate of interest, weight
of an individual)
16. Measurement
Measurement means assigning numbers or other symbols to
characteristics of objects according to certain pre-specified
rules. We do not measure objects, but some of their
characteristics. In marketing research, we do not measure consumers, but
their perceptions, attitudes and preferences.
Numbers permit statistical analysis and facilitate
communication of measurement rules and results.
17. Levels of Measurement
Data can be classified according to the level of measurements.
Level of measurement dictates the type of calculation that
can be done to summarise and present the data.
There are 4 levels of measurement:
• Nominal
• Ordinal
• Interval
• Ratio
18. Levels of Measurement
Nominal:
Observations of a qualitative variable can only be classified and
counted. There is no particular logical order to the labels.
Like: Colour of a chocolate bar
Gender of students
19. Levels of Measurement
Ordinal:
1. Data classifications are represented by sets of labels or names
(high, medium, low) that have relative values.
2. Because of the relative values, the data classified can be
ordered or ranked.
3. But we cannot state the magnitude of the difference between the
labels.
Like: Rating
20. Levels of Measurement
Interval:
1. Next higher level. It possesses all the qualities of the ordinal level; in
addition, the difference between values is a constant size.
Like: Temperature, dress size
21. Levels of Measurement
Ratio:
1. Practically all quantitative data is recorded at the ratio level. It is
the highest level. In addition to the qualities of the interval level, the 0 point is meaningful and the
ratio between two numbers is meaningful.
Like: Money, weight (the 0 point on the scale is meaningful and the ratio
of two numbers is meaningful. If A is earning $20,000 and B is
earning $40,000, then B is earning twice as much as A)
22. Hypothesis
• A statement about a population parameter subject to
verification
• Data are used to check the reasonableness of the hypothesis
• In statistical analysis, we make a claim (hypothesis), collect
the data and use the data to test the hypothesis
23. Hypothesis
• Hypotheses are derived from the research problem and research
questions. Hypotheses should pass the following tests (RSTC):
• Relevant
is pertinent to the issue; provides new insights; if true,
helps explain what’s going on
• Specific
is detailed enough to provide value and direction, but
not so general as to be “universal true-isms”
• Testable
can be fully investigated within the time and resources available; is
not stated in the future tense, since it is not possible to test the future
• Coverage
together a set of hypotheses is “necessary & sufficient”
to completely answer the issue question
24. Forming Proper Hypothesis
Apply the RSTC test to the following issue and hypotheses
• Issue: What criteria are most important to business travelers selecting a hotel?
• Hypotheses:
– Spacious rooms with upgraded features, broadband access and premiere
loyalty programs are features most desired by business travelers.
– Business travelers will demand better hotel service in the future.
– Vacation travelers prefer all-inclusive resorts by oceans or mountains by a 2-to-1 margin.
– To grow revenue and profit, Canyon Sky Hotels must get itself included on
corporate and travel agent preferred hotel lists.
26. Types of Hypothesis
Null Hypothesis: A statement about the value of a population parameter developed for
the purpose of testing numerical evidence.
Alternate Hypothesis: A statement that is accepted if sample data provide sufficient
evidence that the null hypothesis is false.
Example:
The mean age of Indian commercial aircraft is 20 years.
Ho: Mean = 20
H1: Mean not equal to 20
The = sign always appears in the null hypothesis but never in the alternate hypothesis.
27. Hypothesis testing
A procedure based on sample evidence and probability theory to determine whether
the hypothesis is a reasonable statement.
Five-step procedure for testing a hypothesis:
• State the null and alternate hypotheses
• Select a level of significance
• Identify the test statistic
• Formulate a decision rule
• Take a sample and arrive at a decision: do not reject Ho, or reject Ho and accept H1
29. Hypothesis testing
Level of Significance:
The probability of rejecting the null hypothesis, when it is true.
It is also called the level of risk.
The researcher needs to select a level of significance for the test. Generally it is 0.05 for consumer
research, 0.01 for quality assurance and 0.10 for polling.
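The five steps can be traced with the aircraft example (Ho: Mean = 20). The sketch below assumes a large-sample two-tailed z test with a known population standard deviation. The sample size, sample mean and standard deviation are hypothetical values chosen for illustration; they do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: state the hypotheses. Ho: mean = 20, H1: mean != 20 (two-tailed).
mu_0 = 20.0

# Step 2: select the level of significance.
alpha = 0.05

# Step 3: identify the test statistic. Here z, assuming a large sample
# and a known population standard deviation (hypothetical values).
n, sample_mean, sigma = 40, 21.5, 5.0

# Step 4: decision rule. Reject Ho if |z| exceeds the critical value.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# Step 5: compute the statistic from the sample and arrive at a decision.
z = (sample_mean - mu_0) / (sigma / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) > z_critical

print(f"z = {z:.3f}, critical value = {z_critical:.3f}, p = {p_value:.4f}")
print("Reject Ho" if reject_h0 else "Do not reject Ho")
```

With these assumed numbers z is about 1.90, which is below the critical value of about 1.96, so Ho is not rejected at the 0.05 level. At the 0.10 level the same sample would lead to rejection, which is why the level of significance has to be chosen before looking at the data.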
33. Statistics
Descriptive Statistics:
A method of organising, summarizing and presenting data in an
informative way
Inferential Statistics: the methods used to estimate a property
of a population on the basis of a sample
34. Describing Data -Qualitative
Frequency Table: A grouping of qualitative data into
mutually exclusive classes showing the number of
observations in each class.
The number of observations in each class is called the class
frequency.
35. Describing Data- Qualitative
Bar Chart: A graph in which the classes are reported on
the horizontal axis and the class frequencies on the
vertical axis. The heights of the bars are proportional
to the class frequencies.
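A frequency table and a simple text bar chart can be built as follows. The survey responses are hypothetical, and the chart is drawn sideways with `#` characters purely to keep the sketch free of plotting libraries.

```python
from collections import Counter

# Hypothetical qualitative data: preferred drink of 10 respondents.
responses = ["Tea", "Coffee", "Tea", "Juice", "Tea",
             "Coffee", "Tea", "Juice", "Coffee", "Tea"]

# Frequency table: each observation falls in exactly one class.
frequency_table = Counter(responses)

# Text bar chart: the length of each bar equals the class frequency.
for label, count in frequency_table.most_common():
    print(f"{label:<8}{'#' * count} ({count})")
```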
37. Describing Data- Quantitative
Frequency distribution: A grouping of data into
mutually exclusive classes showing the number of
observations in each class.
Class interval: the difference between the lower limit of a
class and the lower limit of the next class.
Class midpoint: halfway between the lower limits of
two consecutive classes.
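As a rough sketch of these definitions, the code below groups hypothetical exam marks into classes of width 10 and records the midpoint and frequency of each class. The marks and the class limits are assumptions made for the example.

```python
# Hypothetical quantitative data: marks of 12 students.
marks = [12, 25, 37, 41, 18, 29, 33, 45, 22, 38, 27, 31]

# Class interval: lower limit of the next class minus lower limit of this class.
class_interval = 10
lower_limits = [10, 20, 30, 40]

distribution = {}
for lower in lower_limits:
    upper = lower + class_interval      # lower limit of the next class
    midpoint = (lower + upper) / 2      # halfway between the two lower limits
    # Classes are mutually exclusive: a mark equal to `upper` goes to the next class.
    count = sum(1 for m in marks if lower <= m < upper)
    distribution[(lower, upper)] = (midpoint, count)

for (lower, upper), (midpoint, count) in distribution.items():
    print(f"{lower} up to {upper}: midpoint {midpoint}, frequency {count}")
```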
38. Descriptive Statistics
Measurements of Central Tendency
Mean:
Weighted Mean:
Weakness of mean- it gets impacted by one or two very
large or small values – in that case, mean might not
represent the appropriate average data.
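The standard definitions are: mean = (sum of values) / (number of values), and weighted mean = sum(w × x) / sum(w). The short sketch below applies both and shows how a single extreme value distorts the mean. All figures are hypothetical.

```python
from statistics import mean

# Hypothetical monthly salaries (in thousands).
salaries = [30, 32, 35, 31, 33]
plain_mean = mean(salaries)            # 161 / 5 = 32.2

# One extreme value pulls the mean far away from the typical salary.
with_outlier = salaries + [500]
distorted_mean = mean(with_outlier)    # 661 / 6, roughly 110.2

# Weighted mean: each price is weighted by the quantity sold at that price.
prices = [10, 12, 15]
quantities = [50, 30, 20]
weighted_mean = sum(w * x for w, x in zip(quantities, prices)) / sum(quantities)

print(plain_mean, round(distorted_mean, 1), weighted_mean)
```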
39. Descriptive Statistics
Measurements of Central Tendency
Median: Midpoint of the values after they have been
ordered from the smallest to the largest, or the
largest to the smallest.
Advantages:
1. It is not impacted by extremely large or small values.
2. It can be computed for ordinal level data or higher.
40. Descriptive Statistics
Mode: the value of the observation that appears most
frequently.
Advantages:
1. It is not impacted by extremely large or small values.
2. It can be computed even for nominal-level data.
Disadvantage:
For some data, there is no mode.
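The median and mode can be checked with Python's `statistics` module. The data below are hypothetical; the last example illustrates the disadvantage noted above, since every value occurs once and no single mode stands out.

```python
from statistics import median, mode, multimode

# Hypothetical salaries with one extreme value: the median is barely affected.
salaries = [30, 31, 32, 33, 35, 500]
middle = median(salaries)              # average of 32 and 33 = 32.5

# The mode works even for nominal data such as colours.
colours = ["red", "blue", "red", "green", "red", "blue"]
most_common = mode(colours)            # "red" appears three times

# When every value appears exactly once, there is no useful mode:
# multimode simply returns all the values.
no_clear_mode = multimode([1, 2, 3, 4])

print(middle, most_common, no_clear_mode)
```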
42. Descriptive Statistics
Measurements of Dispersion
Why do we need to measure dispersion?
The mean, median and mode only describe the centre of the
data; they do not tell us about the spread of the data.
43. Dispersion
Suppose you are crossing a river and the mean depth is
3 meters.
There are two scenarios:
A. Depth of the river ranges from 2.75 to 3.25 meters
B. Depth of the river ranges from 1 to 6 meters
44. Dispersion
A small value for a measure of dispersion indicates that
the data are clustered closely and the mean is
considered representative of data.
A large measure of dispersion indicates that the mean is
not reliable
45. 2nd use of Dispersion
We can compare the spread in two data sets.
Example: Factory output
46. Measure of Dispersion
Range:
Range = Largest value - Smallest value
Coefficient of Range: (H-L)/(H+L), where H is the largest value and L is the smallest
The problem is that the range is based on only two values,
the largest and the smallest, and does not consider the other
values.
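To make the two formulas concrete, the sketch below applies them to depth readings that match the two river scenarios (mean depth of 3 meters in both). The individual readings are hypothetical.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Coefficient of range = (H - L) / (H + L)."""
    high, low = max(values), min(values)
    return (high - low) / (high + low)


# Hypothetical depth readings in meters; both rivers have a mean depth of 3.
river_a = [2.75, 3.0, 3.25]
river_b = [1.0, 2.0, 6.0]

for name, depths in (("A", river_a), ("B", river_b)):
    print(name, value_range(depths), round(coefficient_of_range(depths), 3))
```

Both rivers share the same mean, yet river B has ten times the range of river A, which is exactly the information the mean alone hides.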
47. Measure of Dispersion
Mean Deviation:
The arithmetic mean of the absolute values of the
deviation from the mean.
Example:
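As a hypothetical illustration of the definition, the sketch below computes the mean deviation of five made-up values.

```python
from statistics import mean


def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    centre = mean(values)
    return sum(abs(x - centre) for x in values) / len(values)


# Hypothetical data: the mean is 6, the absolute deviations are 4, 2, 0, 2, 4.
data = [2, 4, 6, 8, 10]
md = mean_deviation(data)   # 12 / 5 = 2.4
print(md)
```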
48. Measure of Dispersion
Standard Deviation and variance:
Variance is the most popular measure of dispersion.
Variance: The arithmetic mean of the squared
deviations from the mean.
Standard deviation is the square root of the variance.
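The definition above divides by the number of observations, which corresponds to the population variance (`pvariance` in Python's `statistics` module). When the data are a sample, the usual convention divides by n - 1 instead (`variance`). The sketch below shows both on hypothetical data.

```python
from statistics import pstdev, pvariance, variance

# Hypothetical data: mean 6, squared deviations 16, 4, 0, 4, 16 (sum 40).
data = [2, 4, 6, 8, 10]

population_variance = pvariance(data)   # 40 / 5 = 8
population_sd = pstdev(data)            # square root of 8, about 2.83
sample_variance = variance(data)        # 40 / 4 = 10 (divides by n - 1)

print(population_variance, round(population_sd, 2), sample_variance)
```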
49. Other measures of Dispersion
Here, we try to determine the location of values that divide a set
of observations into equal parts. They are called quartiles,
deciles and percentiles.
Quartiles divide a set of observations into 4 equal parts. First
quartile is called Q1
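Quartiles can be located with `statistics.quantiles`. Several conventions exist for interpolating between observations, so other software may give slightly different cut points; the sketch below uses the function's default method on hypothetical data.

```python
from statistics import quantiles

# Hypothetical data: 11 observations.
data = [12, 15, 18, 20, 22, 25, 27, 30, 33, 36, 40]

# n=4 asks for the three cut points that split the data into 4 equal parts.
q1, q2, q3 = quantiles(data, n=4)

print(q1, q2, q3)   # Q2 is the median
```

Passing `n=10` or `n=100` to the same function returns the deciles or percentiles.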
50. Skewness
Skewness describes the shape of the data.
There are four shapes commonly observed: symmetric, positively
skewed, negatively skewed and bimodal.
A bimodal distribution has two peaks.
Pearson’s coefficient of skewness: sk = 3(Mean - Median) / Standard deviation
It is 0 when mean = median, and it can vary from -3 to +3.
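A minimal sketch of Pearson's coefficient of skewness follows. It assumes the sample standard deviation (dividing by n - 1) is the one intended, and the data are hypothetical.

```python
from statistics import mean, median, stdev


def pearson_skewness(values):
    """sk = 3 * (mean - median) / standard deviation (sample s assumed)."""
    return 3 * (mean(values) - median(values)) / stdev(values)


# Hypothetical data with one large value: the mean (7) exceeds the median (4),
# so the distribution is positively skewed.
skewed = [2, 3, 4, 5, 21]
symmetric = [1, 2, 3, 4, 5]

sk_skewed = pearson_skewness(skewed)
sk_symmetric = pearson_skewness(symmetric)
print(round(sk_skewed, 3), sk_symmetric)
```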
51. Statistical Inference
Study of two variables (bivariate analysis, the simplest case of multivariate data
analysis)
Relationship between two variables
– Is the relationship strong or weak?
– Is it direct or inverse?
– Can we develop an equation to express the relationship
between two variables?
52. Typical Examples
• Is there a relationship between the amount HUL
spends per month on advertising and its sales in
that month?
• Is there a relationship between the number of
hours students studied for an exam and the score
earned?
• The two most widely used analyses are correlation and
regression
53. Correlation
• A group of techniques to measure the association
between two variables
• Plotting the data in a scatter diagram
• Example: sales calls and sales made
54. Dependent and Independent Variable
Dependent Variable: The variable that is being
predicted or estimated. It is plotted on the Y-axis.
Independent Variable: The variable that provides the
basis for estimation. It is the predictor variable. It is
plotted on the X-axis.
55. The coefficient of Correlation
Originated by Karl Pearson in 1900, the coefficient of
correlation describes the strength of the
relationship between two sets of interval or ratio
scaled variables.
Designated r, it is called Pearson’s r and can assume any
value from -1 to +1
56. The coefficient of Correlation
• What does a correlation coefficient of +1 mean?
• What does a correlation coefficient of -1 mean?
• What does a correlation coefficient of 0 mean?
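One way to explore these three questions is to compute r for small data sets that produce each value. The function below is a plain implementation of the usual formula for Pearson's r, and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y scaled by their own variation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]

r_positive = pearson_r(x, [2, 4, 6, 8, 10])   # y rises in a straight line with x
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # y falls in a straight line as x rises
r_zero = pearson_r(x, [4, 1, 0, 1, 4])        # y = (x - 3)^2, a U-shaped curve

print(r_positive, r_negative, r_zero)
```

The last case is worth noting: y depends exactly on x, yet r is 0, because r measures only the linear part of a relationship.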
58. How to Calculate Coefficient of Determination
The proportion of the total variation in the dependent
variable Y that is explained or accounted for, by the
variation in the independent variable X.
It is computed by squaring the coefficient of correlation.
This is a more precise measure than describing a correlation as strong or weak.
For example, 57.6% of the variation in the number of copiers sold
is explained, or accounted for, by the variation in the
number of sales calls.
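Because the coefficient of determination is simply r squared, the 57.6% figure can be turned back into the size of the correlation. The sketch below does that arithmetic; the underlying copier data are not reproduced here, so only the magnitude of r is recovered, not its sign.

```python
from math import sqrt

# Coefficient of determination quoted in the example above.
r_squared = 0.576

# Magnitude of the correlation coefficient implied by r squared.
r_magnitude = sqrt(r_squared)

# Share of the variation in copiers sold NOT explained by sales calls.
unexplained = 1 - r_squared

print(round(r_magnitude, 3), round(unexplained, 3))
```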
59. Correlation and cause
• Spurious correlation
• When we find that two variables have a strong
correlation, all we can conclude is that there is a relationship or
association between the two variables, not that a
change in one causes a change in the other.
60. Probability
Special Rules of Addition:
P(A or B) = P(A) + P(B)
P(A or B or C)= P(A) + P(B) + P(C)
To apply this rule, the events must be mutually
exclusive
61. Probability
General Rules of Addition:
When events are not mutually exclusive
P(A or B) = P(A) + P(B)- P(A and B)
P(A and B) here is called the joint probability: a probability
that measures the likelihood that two or more events
will happen concurrently.
The concept of the Venn diagram
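Both addition rules can be verified by counting outcomes in a standard 52-card deck, which is an example chosen here for illustration and not one from the slides. Sets play the role of the regions in a Venn diagram.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))


def prob(event):
    """Probability of an event = favourable cards / total cards."""
    return Fraction(len(event), len(deck))


kings = {card for card in deck if card[0] == "K"}
queens = {card for card in deck if card[0] == "Q"}
hearts = {card for card in deck if card[1] == "hearts"}

# Special rule: kings and queens are mutually exclusive (no overlap).
p_king_or_queen = prob(kings) + prob(queens)

# General rule: kings and hearts overlap in the king of hearts,
# so the joint probability is subtracted to avoid counting it twice.
p_king_or_heart = prob(kings) + prob(hearts) - prob(kings & hearts)

print(p_king_or_queen, p_king_or_heart)
```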
62. Probability
Special Rules of Multiplication:
This is for combining two events: the likelihood that two
events both happen (example: a person is 21 years
old and buys Pepsi)
When two or more events are independent:
P(A and B) = P(A) P(B)
P(A and B and C) = P(A) P(B) P(C)
64. Probability
General Rule of Multiplication:
P(A and B) = P(A) P(B|A)
For two events, A and B, the joint probability that both
events will happen is found by multiplying the
probability that event A will happen by the
conditional probability of event B occurring given
that A has occurred, written P(B|A).
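The two multiplication rules differ only in whether the second probability depends on the first event. A small sketch, using coin tosses and card draws as assumed examples:

```python
from fractions import Fraction

# Special rule (independent events): two tosses of a fair coin.
p_head = Fraction(1, 2)
p_two_heads = p_head * p_head

# General rule (dependent events): drawing two kings without replacement.
p_first_king = Fraction(4, 52)
# Conditional probability: 3 kings remain among the 51 cards left.
p_second_king_given_first = Fraction(3, 51)
p_two_kings = p_first_king * p_second_king_given_first

print(p_two_heads, p_two_kings)
```

If the first card were put back before the second draw, the events would be independent and the special rule would give (4/52) × (4/52) instead.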
66. Bayes’ Theorem
P(A1|B) = P(A1) P(B|A1) / [P(A1) P(B|A1) + P(A2) P(B|A2)]
Prior Probability- The initial probability based on the
present level of information
Posterior Probability- A revised probability based on
additional information
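The theorem for two mutually exclusive and exhaustive events A1 and A2 can be sketched as a short function. The probabilities used below are hypothetical: A1 is "has the disease" with prior 0.05, and B is "test is positive", with P(B given A1) = 0.90 and P(B given A2) = 0.15.

```python
def posterior(prior_a1, p_b_given_a1, p_b_given_a2):
    """Posterior P(A1 given B) when A1 and A2 are the only two possibilities."""
    prior_a2 = 1 - prior_a1
    numerator = prior_a1 * p_b_given_a1
    denominator = numerator + prior_a2 * p_b_given_a2
    return numerator / denominator


# Hypothetical figures: prior 0.05, P(B given A1) = 0.90, P(B given A2) = 0.15.
revised = posterior(0.05, 0.90, 0.15)
print(round(revised, 2))
```

With these assumed figures the prior probability of 0.05 is revised to a posterior probability of 0.24 once the positive test result is known.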