2. WHAT IS STATISTICS?
Statistics is concerned with developing and studying methods
for collecting, analyzing and presenting empirical data.
The field of statistics is composed of two broad categories:
descriptive statistics and inferential statistics.
3. Population
A population is the entire group that we target for data
collection. Our data is the information collected from
the population. The population is always defined first,
before starting the data collection process for any
statistical study. A population need not consist of people;
it could be a batch of batteries, measurements of
rainfall in an area, or a group of people.
Parameter: A numerical value summarizing all the data
of an entire population.
4. Sample
• A sample is the part of the population that is selected
randomly for the study. The sample should be
selected so that it represents the characteristics
of the population. The process of selecting the
subset from the population is called sampling, and the
subset selected is called the sample.
• Statistic: A numerical value summarizing the sample
data.
5. Example: A college dean is interested in learning about the average age
of faculty. Identify the basic terms in this situation.
The population - all faculty members at the college.
A sample - might consist of 10% of the faculty members.
The parameter of interest is the “average” age of all faculty members
at the college.
The statistic is the “average” age for the sample.
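To make the parameter/statistic distinction concrete, here is a minimal Python sketch. It assumes a hypothetical college of 200 faculty members with made-up ages; the parameter is computed from every age, the statistic only from a random 10% sample.

```python
import random
import statistics

# Hypothetical ages for a college with 200 faculty members (made-up data).
random.seed(1)
population_ages = [random.randint(28, 68) for _ in range(200)]

# Parameter: the average age of ALL faculty members (the population).
parameter = statistics.mean(population_ages)

# Sample: a random 10% of the faculty (20 of the 200 members).
sample_size = len(population_ages) // 10
sample_ages = random.sample(population_ages, sample_size)

# Statistic: the average age computed from the sample only.
statistic = statistics.mean(sample_ages)

print(f"parameter (population mean age): {parameter:.2f}")
print(f"statistic (sample mean age):     {statistic:.2f}")
```

The two numbers will usually be close but not identical; the statistic changes from sample to sample, while the parameter is fixed for the population.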
6. Descriptive Statistics
Descriptive statistics includes the statistical procedures that we use
to describe the group we are studying. The data could be
collected from either a population or a sample, and the results
help us organize and describe the data numerically or graphically.
Descriptive statistics can only be used to describe the group that
is being studied. That is, the results cannot be generalized to
any larger group.
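A small Python sketch of numerical descriptive statistics, using only the standard `statistics` and `collections` modules. The quiz scores and majors below are hypothetical data for one class; the summaries describe that class only and say nothing about any larger group.

```python
import statistics
from collections import Counter

# Hypothetical data for one small class (made up for illustration).
quiz_scores = [72, 85, 90, 66, 85, 78, 94, 85, 70, 88]   # quantitative
majors = ["Math", "Biology", "Math", "Physics", "Biology",
          "Math", "Chemistry", "Math", "Biology", "Physics"]  # categorical

# Numerical summaries of the quantitative variable.
summary = {
    "n": len(quiz_scores),
    "mean": statistics.mean(quiz_scores),
    "median": statistics.median(quiz_scores),
    "mode": statistics.mode(quiz_scores),
    "stdev": statistics.stdev(quiz_scores),  # sample standard deviation
    "range": max(quiz_scores) - min(quiz_scores),
}

# A frequency table summarizes the categorical variable.
major_counts = Counter(majors)

print(summary)
print(major_counts.most_common())
```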
7. Inferential Statistics
• Inferential statistics is concerned with making
predictions or inferences about a population from
observations and analyses of a sample. That is, we
can take the results of an analysis of a sample and
generalize them to the larger population that the
sample represents. In order to do this, however, it is
imperative that the sample is representative of the
group to which it is being generalized.
9. Types of variables
• Quantitative variables are measured and expressed
numerically, have numeric meaning, and can be used
in calculations. (That’s why another name for them is
numerical variables.)
• Although zip codes are written in numbers, the
numbers are simply convenient labels and don’t have
numeric meaning (for example, you wouldn’t add
together two zip codes).
10. Types of variables
• A qualitative or categorical variable doesn’t have
numerical or quantitative meaning but simply
describes a quality or characteristic of something.
• The numbers used in qualitative or categorical data
designate a quality rather than a measurement or
quantity. For example, you can assign the number 1
to a person who’s married and the number 2 to a
person who isn’t married. The numbers themselves
don’t have meaning — that is, you wouldn’t add the
numbers together.
11. Example
1. The amount of gasoline pumped by the next 10 customers at
the SHELL pump station. (quantitative)
2. The color of the baseball cap worn by each of 20 students.
(qualitative)
3. The length of time to complete a mathematics homework
assignment. (quantitative)
4. The state in which each truck is registered when stopped and
inspected at a weigh station. (qualitative)
12. Two Types of Quantitative Variables
1. Discrete - Discrete variables are numeric variables
that have a countable number of values between any
two values. A discrete variable is always numeric. For
example, the number of customer complaints or the
number of flaws or defects.
2. Continuous - Continuous variables are numeric
variables that have an infinite number of values
between any two values. A continuous variable can be
numeric or date/time. For example, the length of a
part or the date and time a payment is received.
13. Let's Try
Identify each of the following as qualitative/categorical or
quantitative. If quantitative, is it continuous or discrete?
1) Length of a pen?
2) Type of pen?
3) Number of pens in a box?
4) Flow of ink in ml/sec?
5) Color of pen’s ink?
Answers:
1) Quantitative, continuous
2) Qualitative
3) Quantitative, discrete
4) Quantitative, continuous
5) Qualitative
15. The Levels of Measurement
• Nominal : A qualitative variable that categorizes (or
describes, or names) an element of a population.
Unordered categorical variables. These can be either binary
(only two categories) or multinomial (more than two
categories). The key thing here is that there is no logical
order to the categories.
• Example
• gender: male or female,
• marital status: married, divorced, never married, widowed,
separated.
16. The Levels of Measurement
• Ordinal : A qualitative variable that incorporates an
ordered position, or ranking. Still categorical, but in an
order. Distances between categories do not have any
meaning.
• Example:
• High school class ranking: 1st, 9th, 87th…
• Socioeconomic status: poor, middle class, rich.
17. The Levels of Measurement
• Interval : Numerical values without a true zero point.
Examples: IQ and temperature in °F or °C.
• Intervals of equal length signify equal differences in
the characteristic. The difference between 90° and 100°
Fahrenheit is the same as the difference between 80°
and 90° Fahrenheit.
• Occurs when a numerical scale does not have a ‘true
zero’ starting point: zero does not signify an absence of
the characteristic. Does 0° Fahrenheit represent an
absence of heat? No.
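One consequence of having no true zero is that differences are meaningful on an interval scale but ratios are not. The Python sketch below assumes the standard conversion C = (F - 32) x 5/9 and adds kelvin (a scale with a true zero) for comparison: equal Fahrenheit differences stay equal in Celsius, but "80°F is twice 40°F" does not survive a change of units.

```python
def f_to_c(f):
    """Convert a Fahrenheit reading to Celsius (both are interval scales)."""
    return (f - 32) * 5 / 9

def c_to_k(c):
    """Convert Celsius to kelvin, a scale whose zero means no thermal energy."""
    return c + 273.15

# Equal differences remain equal after converting units.
diff_high = f_to_c(100) - f_to_c(90)
diff_low = f_to_c(90) - f_to_c(80)
print(diff_high, diff_low)  # both are about 5.56 Celsius degrees

# Ratios are NOT preserved: 80°F is not "twice as hot" as 40°F.
ratio_f = 80 / 40
ratio_c = f_to_c(80) / f_to_c(40)
ratio_k = c_to_k(f_to_c(80)) / c_to_k(f_to_c(40))
print(ratio_f, ratio_c, ratio_k)  # 2.0 in °F, about 6 in °C, about 1.08 in kelvin
```

Only the kelvin ratio describes an actual ratio of heat content, which is why a true zero is what separates the ratio level from the interval level.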
18. The Levels of Measurement
• Ratio : Same as the interval scale, except that zero
on the scale means the quantity does not exist. Zero represents the
total absence of the variable being measured.
• Examples: age, weight, height, sales figures, income
earned in a week, number of children.
19. Let's try
A study was conducted to assess student eating patterns in higher education
institutions in Malaysia. The study analyzed the impact of vending
machines and institution policies on student food consumption. A total
of 1088 students in 20 institutions were surveyed. Determine the level
of measurement of the following variables considered in the study.
a. Number of snack and soft drink vending machines.
b. Whether or not the institution has a closed campus policy during
lunch.
c. Class rank (Freshman, Sophomore, Junior, Senior).
d. Number of days per week a student eats lunch.
20. Data
• Data refers to a set of values, which are usually
organized by variables (what is being measured) and
observational units (members of the
sample/population).
21. Data Collection
• Data collection is a process of collecting information
from all the relevant sources to find answers to the
research problem, test the hypothesis and evaluate
the outcomes. Data collection methods can be
divided into two categories: secondary methods of
data collection and primary methods of data
collection.
22. Secondary Data
• Secondary data is data that has already been
published in books, newspapers, magazines, journals,
online portals etc. There is an abundance of data
available in these sources about almost any research
area in business studies. Therefore, applying an
appropriate set of criteria to select the secondary data
used in a study plays an important role in
increasing research validity and
reliability.
23. Primary Data
• Primary data collection methods can be divided into two
groups: quantitative and qualitative.
• Quantitative data collection methods include questionnaires
with closed-ended questions and experiments.
• Qualitative data collection methods include interviews,
questionnaires with open-ended questions, focus groups,
observation, game or role-playing, case studies etc.
24. Sampling Frame
• Prior to selecting a sample you need to define a
sampling frame, which is a list of all the units of the
population of interest. You can only apply your
research findings to the population defined by the
sampling frame.
• Units (also referred to as cases) are the things that you are
interested in studying. Units could be people, schools,
organizations, or existing documents.
25. Sampling Techniques
• Sample Design: The process of selecting sample
elements from the sampling frame.
• There are many sampling techniques, which are
grouped into two categories:
• Probability Sampling and Non-Probability Sampling
26. Probability Sampling
• This sampling technique uses randomization so that every
element of the population has a known, nonzero chance of being
part of the selected sample (in simple random sampling, this
chance is the same for every element).
27. Simple Random Sampling
• Every element has an equal chance of being selected to be
part of the sample. It is used when we don’t have any kind of prior
information about the target population.
• Example: Random selection of 20 students from a class of 50
students. Each student has an equal chance of being selected.
Here each student's probability of being included in the sample is
20/50 = 0.4 (1/50 is only the chance of being picked on the first draw).
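A minimal Python sketch of the example, using the standard library's `random.Random.sample` (which draws without replacement). The student IDs 1 to 50 are hypothetical. The second part repeats the draw many times to check empirically that any one student ends up in the sample about 20/50 = 0.4 of the time.

```python
import random

# Hypothetical class roster: students numbered 1 to 50.
students = list(range(1, 51))

# Seeded generator so the sketch is reproducible.
rng = random.Random(42)

# One simple random sample of 20 students, drawn without replacement.
sample = rng.sample(students, 20)

# Estimate how often a particular student (here, student 7) ends up in the
# sample by repeating the draw many times; theory says 20/50 = 0.4.
trials = 20_000
hits = sum(1 for _ in range(trials) if 7 in rng.sample(students, 20))
estimated_p = hits / trials

print(sorted(sample))
print(f"estimated P(student 7 is selected) = {estimated_p:.3f}")
```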
28. Stratified Sampling
• This technique divides the elements of the population into small
subgroups (strata) based on similarity, so that the elements within
each stratum are homogeneous while the strata are heterogeneous
relative to one another. The elements are then randomly selected from
each of these strata. We need prior information about the
population to create the subgroups.
• When each stratum is sampled in proportion to its size, stratified random sampling is also called proportional random sampling.
• Some of the most common strata used in stratified random sampling
include age, gender, religion, race, educational attainment, socioeconomic
status, and nationality.
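The sketch below illustrates proportional stratified sampling in Python: group the population by stratum, then draw the same fraction at random from each stratum. The 100 students and their "year of study" strata are hypothetical, and the helper `stratified_sample` is an illustrative name, not a library function. Rounding per stratum means totals may be off by one for awkward sizes.

```python
import random
from collections import Counter

# Hypothetical population of 100 students, stratified by year of study.
population = (
    [("Year 1", f"S{i}") for i in range(1, 41)]      # 40 students
    + [("Year 2", f"S{i}") for i in range(41, 71)]   # 30 students
    + [("Year 3", f"S{i}") for i in range(71, 101)]  # 30 students
)

def stratified_sample(units, stratum_of, fraction, rng):
    """Proportional allocation: draw the same fraction from every stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)       # stratum's share of the sample
        sample.extend(rng.sample(members, k))    # random selection within stratum
    return sample

rng = random.Random(0)
sample = stratified_sample(population, lambda unit: unit[0], 0.10, rng)
stratum_counts = Counter(stratum for stratum, _ in sample)
print(stratum_counts)  # 4 from Year 1, 3 from Year 2, 3 from Year 3
```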
29. Cluster Sampling
• The entire population is divided into clusters or sections, and
then clusters are randomly selected. All the elements of
the selected clusters are used for sampling. Clusters are identified using
details such as age, gender, location etc.
• Cluster sampling can be done in the following ways:
• Single Stage Cluster Sampling
-Entire clusters are selected randomly, and every element in them is sampled.
30. Cluster Sampling
• Two Stage Cluster Sampling
-Here first we randomly select clusters and then from
those selected clusters we randomly select elements for
sampling.
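Both variants can be sketched in a few lines of Python. The population here is hypothetical: 8 classrooms (the clusters) of 25 students each. Single-stage sampling keeps every student in the chosen rooms; two-stage sampling draws a random subset of students within each chosen room. The numbers of clusters and students per cluster are arbitrary choices for illustration.

```python
import random

# Hypothetical population: 8 classrooms (clusters) of 25 students each.
clusters = {
    f"Room {r}": [f"R{r}-S{s}" for s in range(1, 26)] for r in range(1, 9)
}

rng = random.Random(3)

# Single-stage: randomly pick whole clusters and keep every element in them.
chosen_rooms = rng.sample(sorted(clusters), 2)
single_stage = [s for room in chosen_rooms for s in clusters[room]]

# Two-stage: randomly pick clusters, then randomly pick 5 elements in each.
chosen_rooms_2 = rng.sample(sorted(clusters), 3)
two_stage = [s for room in chosen_rooms_2 for s in rng.sample(clusters[room], 5)]

print(chosen_rooms, len(single_stage))    # 2 rooms, 50 students
print(chosen_rooms_2, len(two_stage))     # 3 rooms, 15 students
```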
31. Systematic Sampling
• Here the selection of elements is systematic rather than random, except for the first element. Elements
of the sample are chosen at regular intervals from the population. All the elements are first arranged in a
sequence; because the start is random, each element still has the same chance (1/k) of being selected.
• For a sample of size n, we divide our population of size N into n subgroups of k elements each.
• We select our first element randomly from the first subgroup of k elements.
• To select the other elements of the sample, proceed as follows:
• The number of elements in each subgroup is k = N/n
• If our first element is n1, then
• Second element: n2 = n1 + k
• Third element: n3 = n2 + k, and so on.
• Example
• N = 20, n = 5, so the number of elements in each subgroup is k = N/n = 20/5 = 4. Now, randomly
select the first element from the first subgroup.
• If we select n1 = 3
• n2 = n1+k = 3+4 = 7
• n3 = n2+k = 7+4 = 11
• n4 = n3+k = 11+4 = 15
• n5 = n4+k = 15+4 = 19
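The procedure above translates directly into a short Python function. This is a sketch that assumes N is a multiple of n (so k = N/n is a whole number) and that elements are numbered 1 to N; the function name `systematic_sample` is illustrative. Passing `start=3` reproduces the worked example.

```python
import random

def systematic_sample(N, n, start=None, rng=random):
    """Positions (1..N) chosen by systematic sampling.

    Assumes N is a multiple of n so that k = N / n is a whole number.
    If start is None, the first element is drawn at random from 1..k.
    """
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                       # sampling interval
    if start is None:
        start = rng.randint(1, k)    # the only random step
    if not 1 <= start <= k:
        raise ValueError("start must lie in the first subgroup (1..k)")
    return [start + i * k for i in range(n)]   # n1, n1+k, n1+2k, ...

# Worked example from the text: N = 20, n = 5, k = 4, n1 = 3.
print(systematic_sample(20, 5, start=3))   # [3, 7, 11, 15, 19]
print(systematic_sample(20, 5))            # random start in 1..4
```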
32. Multi-Stage Sampling
• It is a combination of two or more of the methods described above.
• The population is divided into multiple clusters, and then these
clusters are further divided and grouped into various subgroups
(strata) based on similarity. One or more clusters can be
randomly selected from each stratum. This process continues
until the clusters can’t be divided any further. For example, a country
can be divided into states, cities, and urban and rural areas, and all the
areas with similar characteristics can be merged together to
form a stratum.
33. Non-Probability Sampling
• It does not rely on randomization. Instead, it relies on the
researcher’s judgment to select elements for a sample.
The outcome of sampling might be biased, because not all
elements of the population have an equal chance to be part
of the sample. This type of sampling is also
known as non-random sampling.
34. Quota Sampling
• This type of sampling depends on some pre-set standard.
It aims to select a representative sample from the population:
the proportion of each characteristic in the sample should be the same
as in the population. Elements are selected until the exact proportions
of certain types of data are obtained or sufficient data in
different categories is collected.
• For example: If our population has 45% females and 55%
males then our sample should reflect the same percentage
of males and females.
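The quota arithmetic and the non-random filling step can be sketched in Python. Two assumptions belong to this sketch, not the text: quotas are rounded with the largest-remainder method so they add up exactly to the sample size, and the "arrivals" list of people is hypothetical. Note that people are accepted in the order they happen to arrive, with no randomization.

```python
def quota_targets(sample_size, proportions):
    """Per-category quotas that mirror the population proportions.
    Largest-remainder rounding keeps the total equal to sample_size."""
    raw = {cat: sample_size * p for cat, p in proportions.items()}
    quotas = {cat: int(x) for cat, x in raw.items()}
    shortfall = sample_size - sum(quotas.values())
    # Give leftover slots to the categories with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda c: raw[c] - quotas[c], reverse=True)
    for cat in by_remainder[:shortfall]:
        quotas[cat] += 1
    return quotas

def fill_quotas(arrivals, quotas):
    """Accept people in arrival order until every quota is met."""
    remaining = dict(quotas)
    accepted = []
    for person, category in arrivals:
        if remaining.get(category, 0) > 0:
            accepted.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break
    return accepted

proportions = {"female": 0.45, "male": 0.55}
print(quota_targets(200, proportions))   # {'female': 90, 'male': 110}

# Hypothetical stream of people available to the researcher.
arrivals = [("P1", "male"), ("P2", "male"), ("P3", "male"), ("P4", "male"),
            ("P5", "female"), ("P6", "female"), ("P7", "female")]
small_quotas = quota_targets(5, proportions)     # 2 female, 3 male
print(fill_quotas(arrivals, small_quotas))       # ['P1', 'P2', 'P3', 'P5', 'P6']
```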
35. Referral /Snowball Sampling
• This technique is used in situations where the population is
rare and its members are difficult to identify.
• Therefore we ask the first element selected for the sample
to recommend other people who fit the description of the
sample needed.
• This referral process continues, increasing the size of
the sample like a snowball.
• For example: It’s used for highly sensitive topics like
HIV/AIDS, where people will not openly discuss the topic or
participate in surveys to share information about HIV/AIDS.
• Not all affected people will respond to the questions asked, so
researchers can contact people they know or volunteers to get in
touch with those affected and collect information.
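The referral chain can be modeled, under simplifying assumptions, as a breadth-first walk over a referral network. In this Python sketch the network (participants A to F and whom each would refer) is entirely hypothetical, and recruiting "wave by wave" in referral order is an assumption; in a real study the order depends on who responds.

```python
from collections import deque

# Hypothetical referral network: whom each participant would refer.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": ["A"],
}

def snowball_sample(seeds, referrals, target_size):
    """Recruit the seeds first, then follow referrals wave by wave until the
    sample reaches target_size or nobody new is referred."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:          # skip people already referred
                seen.add(referred)
                queue.append(referred)
    return sample

print(snowball_sample(["A"], referrals, target_size=5))   # ['A', 'B', 'C', 'D', 'E']
```

Because every new member comes through someone already in the sample, the result depends heavily on the starting person, which is one reason snowball samples can be biased.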
37. A manager associated each employee's name with a number on one ball in a
container, then drew balls without looking to select a sample of 55 employees.
What type of sample is this?
A Simple random sample
B Stratified random sample
C Cluster random sample
D None of the above
38. A school chooses 33 randomly selected athletes from each
of its sports teams to participate in a survey about athletics
at the school. What type of sample is this?
A Simple random sample
B Stratified random sample
C Cluster random sample
D None of the above
39. An airline company wants to survey its customers one
day, so they randomly select 5 flights that day and
survey every passenger on those flights.
What type of sample is this?
A Simple random sample
B Stratified random sample
C Cluster random sample
D None of the above