2. Statistics Defined
Statistics is a branch of science that
deals with collecting, sorting, editing,
analyzing, interpreting, and storing
data, information, knowledge, etc.
3. Two Branches of Statistics
Descriptive statistics
◦ Comprises the methods for collecting, presenting, and
characterizing a set of data in order to describe its
various features
◦ Organize, summarize, and communicate numerical information
Inferential statistics
◦ Comprises the methods that make it possible to estimate a
characteristic of a population, or to make a decision about a
population, based only on sample results
◦ Use representative sample data to draw conclusions about a
population
◦ The fundamental concepts of statistical inference consist of
two major areas known as parameter estimation and
hypothesis testing.
4. Branches of Statistics
Descriptive: M = 80.2, SD = 4.5
◦ Describes the average score (M) on the first test and its spread (SD)
Inferential: t(45) = 4.50, p = .02, d = .52
◦ Infers that this average is higher than a typical statistics class average
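The two kinds of numbers above can be computed from raw scores. The sketch below is illustrative only: the scores and the reference mean of 75 are hypothetical, and it assumes a one-sample t-test, which the slide does not state. It stops at the t statistic and Cohen's d; the p-value would additionally require the t distribution.

```python
import math
import statistics

def describe_and_test(scores, reference_mean):
    """Return descriptive statistics plus a one-sample t statistic and Cohen's d."""
    n = len(scores)
    m = statistics.mean(scores)             # descriptive: M
    sd = statistics.stdev(scores)           # descriptive: SD (n - 1 denominator)
    t = (m - reference_mean) / (sd / math.sqrt(n))   # inferential: t with n - 1 df
    d = (m - reference_mean) / sd           # effect size: Cohen's d
    return {"M": m, "SD": sd, "df": n - 1, "t": t, "d": d}

# Hypothetical test scores and a hypothetical "normal" average of 75.
scores = [78, 82, 85, 80, 76, 84, 79, 81]
result = describe_and_test(scores, reference_mean=75)
print(result)
```

For a one-sample test, t equals d multiplied by the square root of the sample size, which is a quick consistency check on reported results.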
5. Samples and Populations
A population is the whole set of measurements or counts
about which we want to draw conclusions.
◦ Could be any size
A sample is a set of observations drawn from the population
of interest: a subset of the measurements or counts that
comprise the population
◦ A portion of the population
Sample results are used to estimate characteristics of the population
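A minimal sketch of a sample estimating a population value. The population here is hypothetical (the integers 1 to 10,000), and the sample size of 500 and the fixed seed are arbitrary choices made for repeatability.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 measurements.
population = list(range(1, 10001))

# Draw a simple random sample of 500 without replacement.
sample = random.sample(population, 500)

population_mean = statistics.mean(population)   # usually unknown in practice
sample_mean = statistics.mean(sample)           # the estimate we can afford to compute

print(f"population mean = {population_mean}, sample mean = {sample_mean}")
```

The sample mean will generally differ from the population mean, but for a random sample of this size it should land reasonably close.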
6. Samples and Populations
So, why would we use samples rather than test everyone?
◦ What would be more accurate?
◦ What would be more efficient?
7. Accuracy Vs Precision
◦ Accuracy and precision are used synonymously in
everyday speech, but in statistics they are defined more
rigorously.
◦ Precision is the closeness of repeated
measurements to one another,
whereas
◦ Accuracy is the closeness of a measured or
computed value to its true value
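One way to make the distinction concrete, as a sketch: treat the spread of repeated measurements (their standard deviation) as a gauge of precision, and the gap between their mean and the true value (the bias) as a gauge of accuracy. The measurements and the true value of 10.0 below are hypothetical.

```python
import statistics

def bias_and_spread(measurements, true_value):
    """Return (bias, spread): bias reflects accuracy, spread reflects precision."""
    bias = statistics.mean(measurements) - true_value   # closer to 0 = more accurate
    spread = statistics.stdev(measurements)             # smaller = more precise
    return bias, spread

true_length = 10.0                             # hypothetical true value
tight_but_off = [10.9, 11.0, 11.1, 11.0]       # precise, but not accurate
scattered_on_target = [8.5, 11.5, 9.0, 11.0]   # accurate on average, but not precise

bias_1, spread_1 = bias_and_spread(tight_but_off, true_length)
bias_2, spread_2 = bias_and_spread(scattered_on_target, true_length)
print(bias_1, spread_1)
print(bias_2, spread_2)
```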
8. Statistics = Numbers
Mostly, statistics is all about numbers.
So … how can we make these observations into numbers?
◦ Think about all the different types of things you can measure…
9. Hypothesis
A statistical hypothesis is an assertion or conjecture
concerning one or more populations.
The truth or falsity of a statistical
hypothesis is never known with absolute
certainty unless the entire population is
examined.
10. Hypothesis
◦ Hypothesis testing is structured around the
null hypothesis.
◦ This refers to any hypothesis to be tested and is
denoted by H0.
H0: μ1 = μ2 = μ3
◦ The rejection of H0 leads to the acceptance
of an alternative hypothesis, denoted by H1 or
HA.
H1: Not all means are equal
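A null hypothesis that all group means are equal is commonly tested with a one-way ANOVA. The sketch below computes only the F statistic and its degrees of freedom; the three groups are hypothetical, and turning F into a p-value would require the F distribution, which is not included here.

```python
import statistics

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = statistics.mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups.
f_stat, df_b, df_w = one_way_anova_f([4, 5, 6], [6, 7, 8], [8, 9, 10])
print(f"F({df_b}, {df_w}) = {f_stat}")
```

A large F means the group means differ by more than the within-group scatter would suggest, which counts as evidence against H0.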
11. Variables
Variables
◦ Observations that can take on a range of values
◦ An example: Reaction time in the Stroop Task
◦ The time to say the colors compared to the time to say the
word
12. Sources of Data
Primary Vs. Secondary Data Sources
◦ There are many methods by which researchers can obtain the
data set they need.
◦ First, they may seek data already published by governmental
organizations (ministries, departments, agencies, etc.) or by
non-governmental organizations (international research and
development organizations, regional networks, private
companies, etc.).
◦ Such sources of data are categorized as secondary data
sources.
◦ A second method of obtaining data is through designed
experiments, which are referred to as primary data sources.
13. Types of Variables
Qualitative Variables
◦ Variables used when the characteristic under study is a
trait that can only be classified into categories and
not measured numerically.
◦ The resulting data are called categorical data.
◦ Color, employment status and blood type are a few examples.
14. Types of Variables
Quantitative Variables
◦ If a characteristic is measured on a numerical scale, the resulting
data consist of a set of numbers and are called measurement
data.
◦ The term ‘quantitative variable’ is used to refer to a
characteristic that is measured on a numerical scale.
◦ A few examples of numerically valued variables are height, weight
and yield.
◦ Variables that can take only integer values are called discrete
variables.
◦ The name discrete is drawn from the fact that the scale is made
up of distinct numbers with gaps.
◦ On the other hand, variables that can take any value in an
interval are called continuous variables.
15. Types of Variables
Discrete
◦ Variables that can only take on specific values
◦ Number of students
◦ Tricky part … we can assign discrete values to things we’d
normally consider words.
◦ Political party
17. More Classification of Variables
Discrete quantitative data are numerical responses, which arise
from a counting process, while continuous quantitative data are
numerical responses, which arise from a measuring process.
Discrete Variables
◦ Nominal: the simplest and most elementary type of
measurement, where numbers are assigned for the sole purpose of
differentiating one object from another. Numbers on a nominal
scale cannot be added together or averaged, because the scale
does not have the necessary properties to do so.
◦ Ordinal: a measurement that has the property of order.
Here one object can be differentiated from another and the
direction of the difference can also be specified. Statements like
‘more than’ or ‘less than’ can be used because the measuring
system has the property of order.
◦ ranking of data
18. More Classification of Variables
Continuous Variables
◦ Interval: used with numbers that are equally spaced
◦ The interval scale is characterized by equality of
units: there are equal distances between observation points
on the scale. This scale specifies not only the direction of the
difference, as in the ordinal scale, but also the
amount of the difference.
◦ Ratio: has all the characteristics of interval scale plus an
absolute zero. With an absolute zero point, statements can
be made on ratios of two observations, such as ‘twice as long’
or ‘half as fast’. Most physical scales such as time, length and
weight are ratio scales.
19. Examples of Variables
Nominal: name of cookies
Ordinal: ranking of favorite cookies
Interval: temperature of cookies
Ratio: How many cookies are left?
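The cookie examples can be paired with the kind of summary each scale conventionally supports. This is a sketch under the usual textbook convention (mode for nominal, median for ordinal, mean for interval, ratios for ratio data); all values are hypothetical.

```python
import statistics

# Hypothetical data for each scale of measurement.
cookie_names = ["oatmeal", "chocolate chip", "chocolate chip", "sugar"]  # nominal
favorite_ranks = [1, 3, 2, 2, 1]                                         # ordinal
temperatures_c = [21.5, 22.0, 23.5]                                      # interval
cookies_left = [12, 6]                                                   # ratio (two plates)

most_common = statistics.mode(cookie_names)        # nominal: only counting categories is meaningful
middle_rank = statistics.median(favorite_ranks)    # ordinal: order is meaningful, distances are not
mean_temp = statistics.mean(temperatures_c)        # interval: equal units make averages meaningful
times_as_many = cookies_left[0] / cookies_left[1]  # ratio: a true zero makes 'twice as many' meaningful

print(most_common, middle_rank, mean_temp, times_as_many)
```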
20. A distinction
The previous information talks about the type of number
you have with your variable.
◦ This type leads to the type of statistical test you should use
21. Variables
Independent Variables (IVs)
◦ Variable you manipulate or categorize
◦ For a true experiment: must be manipulated – meaning you
changed it
◦ Generally dichotomous variables (nominal) like experimental
group versus control group
◦ For a quasi-experiment: uses naturally occurring groups, like
gender
◦ Still dichotomous, but you didn’t assign the group
22. Variables
Independent Variables
◦ Special case: when IVs are categorical, the groups are called
levels
◦ If political party is an IV, levels could be Democrat or
Republican
23. Variables
Dependent Variables (DVs)
◦ The outcome information, what you measured in the study to
find differences/changes based on the IV
◦ Generally, these are interval/ratio variables (t-tests, ANOVA,
regression), but you can use nominal ones too (chi-square)
24. Variables
Confounding Variables
◦ Variables that systematically vary with the IV so that we cannot
logically determine which variable is at work
◦ Try to control or randomize them away
◦ Confounds your other measures!
25. Reliability and Validity
A reliable measure is consistent
◦ Measure your height today and then again tomorrow
Standardized tests are supposed to be reliable
26. Reliability and Validity
A valid measure is one that measures what it was intended
to measure
◦ A measuring tape should accurately measure height
A good variable is both reliable and valid
◦ How do we measure this?
27. Hypothesis Testing
Process of drawing conclusions about whether a
relationship between variables is supported or not
supported by the evidence
28. Types of Research Designs
Experiments: studies in which participants are randomly
assigned to a condition or level of one or more independent
variables
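Random assignment can be sketched as shuffling the participant list and dealing participants out to conditions in turn. The participant IDs, condition names, and seed below are hypothetical; the helper `randomly_assign` is illustrative, not a standard function.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them round-robin into the conditions."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# Hypothetical participant IDs P01..P20 and two levels of one independent variable.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, ["experimental", "control"], seed=42)
print(groups)
```

Dealing round-robin after the shuffle keeps group sizes equal (or within one of each other), while chance alone decides who lands in which condition.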
30. One Goal, Two Strategies
Between-groups designs
◦ Different people complete the tasks, and comparisons are
made between groups
Within-groups designs
◦ The same participants do things more than once, and
comparisons are made over time
31. Other Research Designs
Not all research can be done through experimentation
◦ Unethical or impractical to randomly assign participants to
conditions
Correlational studies do not manipulate either variable
◦ Variables are assessed as they exist
◦ Cannot determine causality
32. Correlation Analysis
Correlation analysis attempts to measure the strength of the
relationship between two variables by means of a single number
called a correlation coefficient.
It is important to understand the physical interpretation of this
correlation coefficient and the distinction between correlation
and regression.
Correlation coefficients close to +1 or –1 indicate a close fit to a
straight line (strong correlation) and values closer to zero indicate
a very poor fit to a straight line or no correlation.
There is no universal convention as to what values of correlation
should be described as strong or weak. Negative correlation values
indicate that the values of one variable tend to get larger as the
values of the other variable get smaller, and vice versa.
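The coefficient described here is usually Pearson's r. A minimal hand-written sketch follows; the function name `pearson_r` and the data are illustrative.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mean_x) ** 2 for a in x)                       # variation in x
    syy = sum((b - mean_y) ** 2 for b in y)                       # variation in y
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfectly increasing and a perfectly decreasing relationship.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

Points lying exactly on a rising line give r = +1, and points on a falling line give r = -1; real data fall somewhere in between.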
33. Regression Analysis
Regression is similar to correlation in that both test for a linear
relationship between two types of measurements made on the same
individuals.
However, regression goes further in that we can also produce an
equation describing the line of best fit through the points on the graph.
Regression analysis concerns the study of the relationships between
variables with the objective of identifying, estimating and validating the
relationship.
When using regression analysis, unlike in correlation, the two variables
have different roles. Regression is used when the value of one of the
variables is considered to be dependent on the other, or at least reliably
predicted from the other.
In correlation, we take measurements on individuals at random for both
variables, but in regression we usually choose a set of fixed values for
the independent variable (the one controlling the other).
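The line of best fit is conventionally found by ordinary least squares. The sketch below assumes simple linear regression with one independent variable; the fixed x values and the responses are hypothetical.

```python
import statistics

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical experiment: x values are fixed by the researcher, y is measured.
dose = [0, 10, 20, 30, 40]
response = [2.0, 2.9, 4.1, 5.0, 6.0]

slope, intercept = fit_line(dose, response)
predicted_at_25 = intercept + slope * 25   # predict y from a new x
print(slope, intercept, predicted_at_25)
```

Unlike the correlation coefficient, the fitted equation treats the two variables differently: swapping x and y generally gives a different line.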
35. Outlier Analysis
Outlier: an extreme score - very high or very low compared
to the rest of the scores
Outlier analysis: study of observations that do not fit the
overall pattern of the data, in order to understand the factors
that influence the dependent variable
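The slide does not say how extreme a score must be to count as an outlier. One common convention, assumed here, is the 1.5 × IQR rule: flag scores more than 1.5 interquartile ranges below the first quartile or above the third. The scores are hypothetical.

```python
import statistics

def find_outliers(scores, k=1.5):
    """Flag scores outside [Q1 - k*IQR, Q3 + k*IQR] (the 1.5 x IQR rule by default)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)   # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in scores if s < low or s > high]

# Hypothetical scores with one extreme value.
scores = [72, 75, 78, 80, 81, 83, 85, 150]
print(find_outliers(scores))
```

Flagging a score is only the first step; the analysis itself lies in asking why that observation differs from the rest.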