An Introduction to Statistics
Statistics is a branch of science that
deals with collecting, sorting, editing,
analyzing, interpreting, and storing
data, information, knowledge, etc.
Two Branches of Statistics
Descriptive statistics
◦ Comprises the methods for collecting, presenting, and
characterizing a set of data in order to properly
describe the various features of that set of data
◦ Organize, summarize, and communicate numerical information
Inferential statistics
◦ Comprises the methods that make it possible to estimate
a characteristic of a population, or to make a decision
concerning a population, based only on sample results
◦ Use representative sample data to draw conclusions about a
population
◦ The fundamental concepts of statistical inference consist of
two major areas known as parameter estimation and
hypothesis testing
Branches of Statistics
Descriptive: M = 80.2, SD = 4.5
◦ Describes the average score on the first test
Inferential: t(45) = 4.50, p = .02, d = .52
◦ Infers that this score is higher than a normal statistics average
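The contrast between the two reported results can be sketched in a few lines of Python. This is only an illustration: the scores and the benchmark of 75 are made-up values, not the data behind the numbers on the slide, and the one-sample t statistic is one assumed way of comparing a sample mean with a reference value.

```python
import math
import statistics

# Hypothetical test scores (illustrative data, not from the slides).
scores = [78, 82, 85, 74, 80, 88, 79, 83]
benchmark = 75  # assumed "normal" average to compare against

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample SD, n - 1 in the denominator

# Inferential statistics: one-sample t statistic against the benchmark.
n = len(scores)
t_stat = (mean - benchmark) / (sd / math.sqrt(n))
df = n - 1

print(f"Descriptive: M = {mean:.2f}, SD = {sd:.2f}")
print(f"Inferential: t({df}) = {t_stat:.2f}")
```

The first line of output only describes these eight scores; the second is the quantity that would be compared against a t distribution to draw a conclusion about the wider population.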
Samples and Populations
A population is the whole set of measurements or counts
about which we want to draw conclusions.
◦ Could be any size
A sample is a set of observations drawn from the population
of interest; that is, a subset of the population, made up of
some of the measurements or counts that comprise it
◦ A portion of the population
Sample results are used to estimate the population
Samples and Populations
So, why would we use samples rather than test everyone?
◦ What would be more accurate?
◦ What would be more efficient?
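One way to explore these questions is a small simulation. The sketch below assumes a hypothetical population of 10,000 simulated scores (mean 80, SD 5) and a simple random sample of 100; all of these numbers are illustrative choices, not values from the slides.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated test scores.
population = [random.gauss(80, 5) for _ in range(10_000)]

# Draw a simple random sample of 100 values without replacement.
sample = random.sample(population, 100)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean (all 10,000 values): {population_mean:.2f}")
print(f"Sample mean (100 values):            {sample_mean:.2f}")
```

Measuring everyone gives the exact population mean, while the sample gives an estimate from one hundredth of the measurements, which is usually close but not identical.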
Accuracy Vs Precision
◦ Accuracy and precision are used synonymously in
everyday speech, but in statistics they are defined more
precisely
◦ Precision is the closeness of repeated
measurements to one another
◦ Accuracy is the closeness of a measured or
computed value to its true value
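The two definitions can be made concrete with a short Python sketch. The readings, the true value of 100, and the helper names `accuracy_error` and `precision_spread` are all hypothetical, chosen only to show one instrument that is precise but inaccurate and another that is accurate but imprecise.

```python
import statistics

true_value = 100.0  # assumed known true value (hypothetical)

# Two hypothetical instruments measuring the same quantity five times.
instrument_a = [104.9, 105.0, 105.1, 105.0, 104.9]  # tight but off target
instrument_b = [97.0, 103.5, 99.0, 101.5, 99.5]     # scattered but centered

def accuracy_error(readings, truth):
    """Distance of the average reading from the true value (smaller = more accurate)."""
    return abs(statistics.mean(readings) - truth)

def precision_spread(readings):
    """Sample standard deviation of repeated readings (smaller = more precise)."""
    return statistics.stdev(readings)

for name, readings in [("A", instrument_a), ("B", instrument_b)]:
    print(
        f"Instrument {name}: "
        f"error = {accuracy_error(readings, true_value):.2f}, "
        f"spread = {precision_spread(readings):.2f}"
    )
```

Instrument A agrees closely with itself but sits about five units from the true value; instrument B averages close to the true value but its individual readings vary much more.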
Statistics = Numbers
Mostly, statistics is all about numbers.
So … how can we make these observations into numbers?
◦ Think about all the different types of things you can measure…
A hypothesis is an assertion or conjecture
concerning one or more populations.
The truth or falsity of a statistical
hypothesis is never known with absolute
certainty unless the entire population is
examined.
◦ The structure of hypothesis testing is
formulated with the use of the term null
hypothesis.
◦ This refers to any hypothesis to be tested and is
denoted by H0.
H0: μ1 = μ2 = μ3
◦ The rejection of H0 leads to the acceptance
of an alternate hypothesis, denoted by H1 or
H1: Not all means are equal
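A null hypothesis that several group means are equal is commonly tested with a one-way ANOVA F statistic. The sketch below computes F by hand for three hypothetical groups; the data are invented, and ANOVA is named here as an assumed choice of test, since the slide does not say which test it has in mind.

```python
import statistics

# Hypothetical scores for three groups (illustrative only).
groups = [
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]

all_values = [x for g in groups for x in g]
grand_mean = statistics.mean(all_values)
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# Variation of the group means around the grand mean.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean.
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

df_between = k - 1
df_within = n_total - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"F({df_between}, {df_within}) = {f_stat:.2f}")  # prints "F(2, 6) = 27.00"
```

A large F means the group means differ by much more than the scatter inside each group would suggest, which is evidence against H0; turning F into a p-value requires the F distribution, which is left out of this sketch.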
◦ Observations that can take on a range of values
◦ An example: Reaction time in the Stroop Task
◦ The time to say the colors compared to the time to say the
words
Sources of Data
Primary Vs. Secondary Data Sources
◦ There are many methods by which researchers can get the
required data set.
◦ Firstly, they may seek data already published by governmental
organizations (ministries, departments, agencies, etc.) or by
non-governmental organizations (international research and
development organizations, regional networks, private
◦ Such sources of data are categorized as secondary data
sources.
◦ A second method of obtaining data is through designed
experiments, which are referred to as primary data sources.
Types of Variables
Qualitative variables
◦ Used when the characteristic under study is a trait that
can only be classified into categories and
not measured numerically.
◦ The resulting data are called categorical data.
◦ Color, employment status, and blood type are a few examples.
Types of Variables
◦ If a characteristic is measured on a numerical scale, the resulting
data consist of a set of numbers and are called measurement
data.
◦ The term ‘quantitative variable’ is used to refer to a
characteristic that is measured on a numerical scale.
◦ A few examples of numerically valued variables are height and weight.
◦ Variables that can only take integer values are called discrete
variables.
◦ The name discrete is drawn from the fact that the scale is made
up of distinct numbers with gaps.
◦ On the other hand, variables that can take any value in an
interval are called continuous variables.
Types of Variables
Discrete
◦ Variables that can only take on specific values
◦ Number of students
◦ Tricky part … we can assign discrete values to things we’d
normally consider words.
◦ Political party
Types of Variables
Continuous
◦ Can take on a full range of values (usually decimals)
◦ How tall are you?
More Classification of Variables
Discrete quantitative data are numerical responses, which arise
from a counting process, while continuous quantitative data are
numerical responses, which arise from a measuring process.
◦ Nominal: the simplest and most elementary type of
measurement, where numbers are assigned for the sole purpose of
differentiating one object from another. Numbers used in
a nominal scale cannot be added together or averaged,
because the scale does not have
the necessary properties to do so.
◦ Ordinal: a measurement that has the property of order.
Here one object can be differentiated from another and the
direction of the difference can also be specified. Statements like
‘more than’ or ‘less than’ can be used because the measuring
system has the property of order.
◦ ranking of data
More Classification of Variables
◦ Interval: used with numbers that are equally spaced
◦ The interval scale is characterized by equality of
units: there are equal distances between observation points
on the scale. This scale specifies not only the direction of the
difference, as in the ordinal scale, but also the
amount of the difference.
◦ Ratio: has all the characteristics of interval scale plus an
absolute zero. With an absolute zero point, statements can
be made on ratios of two observations, such as ‘twice as long’
or ‘half as fast’. Most physical scales such as time, length and
weight are ratio scales.
Examples of Variables
Nominal: name of cookies
Ordinal: ranking of favorite cookies
Interval: temperature of cookies
Ratio: How many cookies are left?
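The practical consequence of the four scales is that each one supports a different set of summaries. The following Python sketch uses made-up cookie data (every value here is hypothetical) to show one summary that is meaningful at each level.

```python
import statistics

# Hypothetical cookie data, one example per scale (all values are made up).
names = ["oatmeal", "chocolate chip", "chocolate chip", "ginger"]  # nominal
ranks = [1, 3, 2, 5, 4]          # ordinal: preference rankings
temps_c = [20.0, 22.5, 25.0]     # interval: temperature in degrees Celsius
cookies_left = [12, 6, 3]        # ratio: counts with a true zero

most_common = statistics.mode(names)        # nominal: only counting categories
middle_rank = statistics.median(ranks)      # ordinal: order is meaningful
mean_temp = statistics.mean(temps_c)        # interval: differences are meaningful
ratio = cookies_left[0] / cookies_left[1]   # ratio: "twice as many" is meaningful

print(most_common, middle_rank, mean_temp, ratio)
```

Averaging the cookie names would be meaningless, and saying one temperature in Celsius is "twice" another would be misleading, which is why the scale of a variable guides the choice of statistic.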
The previous information talks about the type of number
you have with your variable.
◦ This type leads to the type of statistical test you should use
Independent Variables (IVs)
◦ Variable you manipulate or categorize
◦ For a true experiment: must be manipulated, meaning you
assign participants to the groups
◦ Generally dichotomous variables (nominal) like experimental
group versus control group
◦ For quasi experiment: used naturally occurring groups, like
◦ Still dichotomous, but you didn’t assign the group
◦ Special case: when IVs are categorical, the groups are called
levels
◦ If political party is an IV, levels could be Democrat or
Republican
Dependent Variables (DVs)
◦ The outcome information, what you measured in the study to
find differences/changes based on the IV
◦ Generally, these are interval/ratio variables (t-tests, ANOVA,
regression), but you can use nominal ones too (chi-square)
Confounding Variables
◦ Variables that systematically vary with the IV so that we cannot
logically determine which variable is at work
◦ Try to control or randomize them away
◦ Confounds your other measures!
Reliability and Validity
A reliable measure is consistent
◦ Measure your height today and then again tomorrow
Standardized tests are supposed to be reliable
Reliability and Validity
A valid measure is one that measures what it was intended
to measure
◦ A measuring tape should accurately measure height
A good variable is both reliable and valid
◦ How do we measure this?
Hypothesis testing: the process of drawing conclusions about
whether a relationship between variables is supported or not
supported by the evidence
Types of Research Designs
Experiments: studies in which participants are randomly
assigned to a condition or level of one or more independent
variables
Experiments and Causality
Experiments: able to make causal statements
◦ Control the confounding variables
Importance of randomization
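Random assignment can be sketched as shuffling the participant list and splitting it in half. The participant IDs, group sizes, and seed below are hypothetical; the point is only that chance, not the researcher or the participant, decides who ends up in which condition.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical participant IDs P01 to P20.
participants = [f"P{i:02d}" for i in range(1, 21)]

# Shuffle a copy, then split it into two equal groups.
shuffled = participants[:]
random.shuffle(shuffled)
half = len(shuffled) // 2
experimental_group = shuffled[:half]
control_group = shuffled[half:]

print("Experimental:", experimental_group)
print("Control:     ", control_group)
```

Because every participant has the same chance of landing in either group, confounding variables tend to be spread evenly across the groups rather than lining up with the independent variable.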
One Goal, Two Strategies
Between-groups design
◦ Different people complete the tasks, and comparisons are
made between groups
Within-groups design
◦ The same participants do things more than once, and
comparisons are made over time
Other Research Designs
Not all research can be done through experimentation
◦ Unethical or impractical to randomly assign participants to
conditions
Correlational studies do not manipulate either variable
◦ Variables are assessed as they exist
◦ Cannot determine causality
Correlation analysis attempts to measure the strength of
relationships between two variables by means of a single number
called a correlation coefficient.
It is important to understand the physical interpretation of this
correlation coefficient and the distinction between correlation
Correlation coefficients close to +1 or –1 indicate a close fit to a
straight line (strong correlation) and values closer to zero indicate
a very poor fit to a straight line or no correlation.
There is no convention as to what values of correlation should be
described as strong or weak. Negative correlation values indicate
that the values of one variable tend to get larger as the values of
the other variable get smaller, and vice versa.
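The correlation coefficient described here is usually the Pearson coefficient, which can be computed directly from its definition. The sketch below assumes that is the coefficient intended; the function name `pearson_r` and the data are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one perfectly rising and one perfectly falling relationship.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

print(f"rising:  r = {pearson_r(hours, rising):.2f}")
print(f"falling: r = {pearson_r(hours, falling):.2f}")
```

The rising data lie exactly on an upward straight line and the falling data exactly on a downward one, so the two coefficients sit at the two extremes of the scale; real data fall somewhere in between.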
Regression is similar to correlation in that testing for a linear
relationship between two types of measurements is made on the same
However, regression goes further in that we can also produce an
equation describing the line of best fit through the points on the graph.
Regression analysis concerns the study of the relationships between
variables with the objective of identifying, estimating, and validating the
relationship.
When using regression analysis, unlike in correlation, the two variables
have different roles. Regression is used when the value of one of the
variables is considered to be dependent on the other, or at least reliably
predicted from the other.
In correlation, we take measurements on individuals at random for both
variables, but in regression we usually choose a set of fixed values for
the independent variable (the one controlling the other).
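The line of best fit mentioned above is typically found by ordinary least squares. The sketch below assumes that method; the function name `fit_line` and the dose/response data are hypothetical, with the independent variable set at fixed values as the text describes.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: fixed values of the independent variable,
# measured values of the dependent variable.
dose = [0, 1, 2, 3, 4]
response = [1.0, 3.1, 4.9, 7.2, 8.8]

slope, intercept = fit_line(dose, response)
print(f"response = {intercept:.2f} + {slope:.2f} * dose")

# Use the fitted equation to predict the response at a new dose.
predicted = intercept + slope * 2.5
print(f"predicted response at dose 2.5: {predicted:.2f}")
```

Unlike a correlation coefficient, the fitted equation treats the two variables differently: swapping `dose` and `response` would give a different line.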
Video game playing and aggression are related
No evidence that playing video games causes aggression
Outlier: an extreme score - very high or very low compared
to the rest of the scores
Outlier analysis: study of the factors that influence the
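A common rule of thumb for flagging extreme scores is the 1.5 × IQR rule. The slides do not name a specific rule, so this choice, the function name `iqr_outliers`, and the data are all assumptions made for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical scores with one very high value.
scores = [12, 14, 15, 15, 16, 17, 18, 45]

outliers = iqr_outliers(scores)
print("Outliers:", outliers)
```

Flagging a score this way is only a starting point; whether it reflects an error or a genuinely unusual case still has to be judged from the context of the study.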