Understanding Data Analysis

INTRODUCTION.
• Analysis and interpretation of data is the most important phase of
the research process.
• Data collection is followed by the analysis and interpretation of data,
where collected data are analyzed and interpreted in accordance with
study objectives.
• Analysis and interpretation of data includes compilation, editing,
coding, classification and presentation of data.
• The collected data are known as raw data. Raw data are meaningless
unless certain statistical treatments are applied to them.

DEFINITIONS.
• Analysis is the process of organizing and synthesizing the data so as to
answer research questions and test hypotheses.
• Analysis is also described as a method of organizing data in such a way
that research questions can be answered and hypotheses can be tested.
• Analysis is the process of breaking a complex topic into smaller parts
to gain a better understanding of it.

HYPOTHESIS
• A hypothesis is a tentative prediction or explanation of the
relationship between two or more variables.

• Quantitative data: Quantitative research involves the analysis of
numerical data. Data are collected and analyzed using descriptive or
inferential statistics.
• Qualitative data: Data are collected in descriptive form rather than
numerical form and analyzed by descriptive coding, indexing and
narration.

ANALYSIS OF QUANTITATIVE DATA.
• Analysis of quantitative data deals with information collected during
a research study that can be quantified and on which statistical
calculations can be performed.

STEPS OF QUANTITATIVE DATA ANALYSIS.
• The data analysis process includes the following four steps:
1. Data preparation (cleaning and organizing data for analysis)
2. Describing the data (descriptive or summary statistics)
3. Drawing the inferences of data (inferential statistics)
4. Interpretation of data

1. Data preparation (cleaning and organizing
data for analysis)
• It involves logging or checking the data in, checking the data for
correctness, entering the data into the computer, transforming the
data, and documenting as well as developing a database structure to
integrate different measures.

Data preparation involves the following steps:
A. Compilation.
B. Editing.
C. Coding.
D. Classification.
E. Tabulation

Contd.
1. Compilation: It includes gathering together all the collected data in
a manner that allows the process of analysis to be initiated.
2. Editing: It implies checking the gathered data for accuracy,
utility and completeness.
3. Coding: Coding is important for analysis, as numerous replies can be
reduced to a small number of classes through coding.
4. Classification: The classification of data is necessary because many
studies produce large volumes of raw data, which must be
reduced to homogeneous groups.
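
Editing and coding can be illustrated with a minimal sketch. The replies, the codebook and the missing-value code below are hypothetical and chosen only for illustration; a real study would define its own codebook.

```python
# Hypothetical raw survey replies; None marks a missing answer.
raw_replies = ["Yes", "yes ", "NO", None, "Not sure", "no", "YES"]

# Hypothetical codebook mapping cleaned replies to numeric codes.
CODEBOOK = {"yes": 1, "no": 2, "not sure": 3}
MISSING_CODE = 9  # assumed code for missing or unrecognised replies


def code_reply(reply):
    """Edit (clean) a single reply, then return its numeric code."""
    if reply is None:
        return MISSING_CODE
    cleaned = reply.strip().lower()  # editing: remove stray spaces and case
    return CODEBOOK.get(cleaned, MISSING_CODE)  # coding: reply -> class


coded = [code_reply(r) for r in raw_replies]
print(coded)
```

Seven differently typed replies are reduced to a small number of coded classes, which is the purpose of coding described above.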

Contd…
• The classification of data could be:
Geographic classification: Based on area of residence, such as urban,
semi-urban, rural, etc.
Chronological classification: Based on time period, such as
days, months, years, etc.
Qualitative classification: Based on certain attributes, such as
gender, religion, type of disease, etc.
Quantitative classification: Variables such as age, height, weight,
income and Hb level are classified into quantitative classes, for
example monthly income in rupees: up to 5000, 5001-10000.
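
A quantitative classification can be sketched as a small function that assigns each value to a class. The class boundaries and the "above 10000" class are assumptions made for this example, as are the income values.

```python
def classify_income(income):
    """Assign a monthly income (in rupees) to an assumed income class."""
    if income <= 5000:
        return "up to 5000"
    if income <= 10000:
        return "5001-10000"
    return "above 10000"  # assumed open-ended top class


# Hypothetical monthly incomes in rupees.
incomes = [4200, 5000, 7600, 12000, 9800]
classes = [classify_income(value) for value in incomes]
print(classes)
```

The classes are mutually exclusive and cover every possible value, so each observation falls into exactly one group.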

Contd.
5. Tabulation: It is the recording of the classified data in accurate
mathematical terms. A table is a systematic arrangement of statistical
data in rows and columns. Basically, tables are of four types:
1. Frequency distribution
2. Contingency tables
3. Multiple response tables
4. Miscellaneous tables
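
The first two table types can be sketched with Python's standard library. The residence and gender values below are hypothetical sample data, not results from any study.

```python
from collections import Counter

# Hypothetical classified data for six respondents.
residence = ["urban", "rural", "urban", "semi-urban", "rural", "urban"]
gender = ["F", "M", "F", "F", "M", "M"]

# Frequency distribution: count and percentage for each category.
frequency = Counter(residence)
total = len(residence)
for category, count in frequency.most_common():
    print(f"{category:<12}{count:>3}{100 * count / total:>8.1f}%")

# Contingency table: joint counts of two variables (residence x gender).
contingency = Counter(zip(residence, gender))
for (place, sex), count in sorted(contingency.items()):
    print(place, sex, count)
```

A frequency distribution summarizes one variable, while a contingency table cross-classifies two variables at once.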

2. Describing the data (Descriptive or summary
statistics)
• Descriptive statistics are used to describe the basic features of data
and to provide simple summaries about the sample and the measures used
in a study.
• Descriptive statistics are classified as follows:
1. Measures to condense data (frequency and percentage distribution
through tabulation and graphic presentations)
2. Measures of central tendency
3. Measures of dispersion
4. Measures of relationship (correlation coefficient)
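
Measures of dispersion and of relationship can be sketched as follows. The two small data series are hypothetical, and the Pearson coefficient is computed directly from its definition rather than through a statistics package.

```python
import math
import statistics

# Hypothetical paired observations on two quantitative variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Measures of dispersion for x.
value_range = max(x) - min(x)
variance = statistics.variance(x)  # sample variance (n - 1 in the denominator)
std_dev = statistics.stdev(x)      # sample standard deviation


def pearson_r(a, b):
    """Pearson correlation coefficient computed from its definition."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cross = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    ss_a = sum((ai - mean_a) ** 2 for ai in a)
    ss_b = sum((bi - mean_b) ** 2 for bi in b)
    return cross / math.sqrt(ss_a * ss_b)


r = pearson_r(x, y)
print(value_range, variance, round(std_dev, 3), round(r, 3))
```

A coefficient close to +1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.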

3. Drawing the Inferences of Data (Inferential Statistics)
• Inferential statistics help in drawing inferences from the data, e.g.,
finding the differences, relationships and associations between two or
more variables with the help of parametric and nonparametric
statistical tests.
• The most commonly used inferential statistical tests are the Z-test,
t-test, ANOVA, chi-square test, etc.
• An inference is a conclusion or judgment based on evidence.

Contd.
• Choice of inferential statistical tests:
1. Type I and Type II errors
A Type I error occurs when the null hypothesis is rejected although it
is true. It is also called alpha error. A Type II error (also called
beta error) occurs when the null hypothesis is not rejected although it
is false.

• In statistics, a Type I error is a false positive conclusion, while
a Type II error is a false negative conclusion.
Example:
You decide to get tested for COVID-19 based on mild symptoms.
There are two errors that could potentially occur:
• Type I error (false positive): the test result says you have
coronavirus, but you actually don’t.
• Type II error (false negative): the test result says you don’t
have coronavirus, but you actually do.

Contd
2. Level of significance:
The probability of making a Type I error is called the level of
significance. It is represented by α; the p-value calculated from the
data is compared against it. The level of significance is the
probability of rejecting the null hypothesis when it is true. In health
sciences, we generally set the level of significance at either 1% (.01)
or 5% (.05). A significance level of .05 means that the researcher is
willing to take the risk of being wrong 5% of the time, or 5 times out
of 100, when rejecting a null hypothesis that is in fact true.
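
How a chosen level of significance is used in a decision can be sketched with a one-sample Z-test. All numbers below are hypothetical, and the population standard deviation is assumed to be known, which is what makes a Z-test applicable.

```python
import math
from statistics import NormalDist

# Hypothetical values: the null hypothesis states the population mean is 120.
population_mean = 120.0
population_sd = 15.0   # assumed known
sample_mean = 126.0
n = 36
alpha = 0.05           # chosen level of significance

standard_error = population_sd / math.sqrt(n)
z = (sample_mean - population_mean) / standard_error

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value < alpha
print(round(z, 2), round(p_value, 4), reject_null)
```

With these numbers the p-value lies between .01 and .05, so the null hypothesis would be rejected at the 5% level but not at the 1% level.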

Contd
3. Confidence interval (CI):
It is a range of values that, with a specified degree of probability, is
thought to contain the population value. A CI has a lower and an
upper limit.
4. Degrees of freedom:
The interpretation of a statistical test depends on the degrees of
freedom. It is denoted by the abbreviation df and a number (e.g.
df = 3). Although degrees of freedom indicate the number of values that
can vary, the focus is actually on the number of values that are not
free to vary.
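
A confidence interval for a mean can be sketched as follows. The sample figures are hypothetical and the population standard deviation is assumed to be known, so the normal distribution supplies the critical value; with an unknown standard deviation and a small sample, a t critical value with n - 1 degrees of freedom would be used instead.

```python
import math
from statistics import NormalDist

# Hypothetical sample summary.
sample_mean = 126.0
population_sd = 15.0   # assumed known
n = 36
confidence = 0.95

# Critical value of the standard normal distribution (about 1.96 for 95%).
z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_critical * population_sd / math.sqrt(n)

lower, upper = sample_mean - margin, sample_mean + margin

# Degrees of freedom for a one-sample mean: one value is not free to vary
# once the mean is fixed.
df = n - 1
print(round(lower, 1), round(upper, 1), df)
```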

Contd.
5. Test of significance: There are several parametric (t-test, Z-test,
ANOVA) and nonparametric tests (chi-square test, median test,
McNemar’s test, Mann-Whitney test, Wilcoxon test, Fisher's exact
test) available to establish statistical significance.

4. Interpretation of Data
• It refers to the critical examination of the analysed study results to
draw inferences and conclusions. Interpretation of the research
findings of a study involves a search for their meaning in relation to
the research problem, objectives, conceptual framework, and
hypotheses.

Strategies for effective interpretations:
• Interpretation must be made in light of the research problem,
objectives, conceptual framework, hypotheses and assumptions.
• Each element of the study results must be critically examined before
framing the interpretations.
• The limitations of the research study must be carefully considered and
recognized, so that inappropriate interpretation can be avoided.
• Interpretations must be based on the study results only, so that
misinterpretation or overinterpretation of unstudied facts can be
avoided.
• Each part, aspect and segment of the analysed result must receive close
attention, so that misinterpretation can be avoided.

PARAMETRIC TESTS
• These tests are also known as normal distribution statistical tests.
• These statistical methods of inference make certain assumptions about
the populations from which the samples are drawn.
• Parametric tests are a type of inferential statistical test that
assumes the data have come from a particular type of distribution,
usually the normal distribution, and makes inferences about the
parameters of that distribution.

Commonly Used Parametric Tests.
• Paired t-test: It is used to compare two quantitative measurements
taken from the same group of individuals.
• Unpaired t-test: It is used to compare means between two
distinct/independent groups.
• Z-test: It is used to compare the difference between a population mean
and a sample mean, or the difference between two independent sample
means.

• One-way ANOVA: It is used to compare means between three or
more distinct/independent groups; a repeated-measures form may be used
for more than two repeated measurements of the same group.
• Pearson coefficient of correlation: It is used to estimate the degree of
relationship/association between two quantitative variables.
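
As an illustration of one of these tests, the paired t-test statistic can be computed by hand. The before/after readings are hypothetical, and the critical value is the two-sided 5% value for 4 degrees of freedom taken from a standard t table.

```python
import math
import statistics

# Hypothetical measurements on the same five individuals.
before = [140, 138, 150, 148, 135]
after = [132, 135, 146, 140, 133]

# The paired t-test works on the within-person differences.
differences = [b - a for b, a in zip(before, after)]
n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample standard deviation

t_statistic = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

# Two-sided critical value at alpha = .05 for df = 4, from a t table.
T_CRITICAL_DF4 = 2.776
significant = abs(t_statistic) > T_CRITICAL_DF4
print(round(t_statistic, 3), df, significant)
```

Because the same individuals are measured twice, the test analyses one set of differences rather than two independent groups.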

NONPARAMETRIC TESTS
• Often, observations are presented as numerical figures even though the
scale of measurement is not truly numerical, such as the grading of
bedsores or ranks given to an analgesic drug’s effectiveness in cancer
pain management.
• In these situations, parametric tests may not be suitable, and a
researcher may need different types of tests to draw inferences;
those tests are known as nonparametric tests.

Commonly Used Nonparametric Tests.
• Chi-square test: It is used to find out the association between two
nominal or ordinal sets of data/variables.
• The sign test: It is used as an alternative to the t-test where the
median is compared rather than the mean.
• Median test: It is used to test the null hypothesis that two
independent samples have been drawn from populations with equal
medians.
• Mann-Whitney test: The median test does not make full use of all the
information measured on an ordinal scale; therefore, the Mann-Whitney
test is used to make better use of the data.
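
The chi-square statistic for a contingency table can be sketched as follows. The 2 x 2 counts are hypothetical, no continuity correction is applied, and the critical value is the 5% value for 1 degree of freedom taken from a standard chi-square table.

```python
# Hypothetical 2 x 2 table of observed counts
# (rows: two groups, columns: outcome present / absent).
observed = [[20, 10],
            [10, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all cells.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Critical value at alpha = .05 for df = 1, from a chi-square table.
CHI2_CRITICAL_DF1 = 3.841
associated = chi_square > CHI2_CRITICAL_DF1
print(round(chi_square, 3), df, associated)
```

The expected count in each cell is what would be observed if the two variables were unrelated; large departures from it produce a large statistic.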

• Wilcoxon signed rank test: If a small sample (n < 30) is drawn from
a grossly non-normally distributed population and the t-test and Z-test
cannot be applied, then the best alternative nonparametric test is the
Wilcoxon signed rank test. Like the sign test, it may be used when the
data consist of a single sample or paired data.
• Spearman’s rank correlation: A nonparametric test used to estimate
the degree of correlation between two variables measured on an ordinal
scale.
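
Spearman's rank correlation can be sketched with the classic rank-difference formula. The scores are hypothetical, and the helper assumes there are no tied values; tied data would need average ranks and a different formula.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(a, b):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties."""
    n = len(a)
    d_squared = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores given to five patients by two raters.
rater_a = [10, 20, 30, 40, 50]
rater_b = [25, 15, 45, 35, 55]

rho = spearman_rho(rater_a, rater_b)
print(rho)
```

Only the order of the values matters here, which is why the method suits ordinal data.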

• The key difference between parametric and nonparametric
tests is that parametric tests assume that the data follow a
specific statistical distribution, whereas nonparametric tests
do not depend on any particular distribution. Nonparametric
tests make fewer assumptions and usually measure central
tendency with the median.

• Mean: The sum of all measurements divided by the number of
observations in the data set.
• Median: The middle value that separates the higher half from the lower
half of the data set. The median and the mode are the only measures of
central tendency that can be used for ordinal data, in which values are
ranked relative to each other but are not measured absolutely.
• Mode: The most frequent value in the data set. This is the only central
tendency measure that can be used with nominal data, which have purely
qualitative category assignments.
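
The three measures of central tendency can be computed with Python's standard library. The scores and blood groups below are hypothetical values used only to show the calls.

```python
import statistics

# Hypothetical quantitative scores.
scores = [2, 3, 3, 5, 7]

mean_score = statistics.mean(scores)      # sum of values / number of values
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value

# Hypothetical nominal data: only the mode is meaningful here.
blood_groups = ["A", "B", "O", "O", "AB"]
mode_group = statistics.mode(blood_groups)

print(mean_score, median_score, mode_score, mode_group)
```

A mean or median of the blood groups cannot be computed, because nominal categories have no numeric value or order.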

Presentation of Data:
• The final steps of the research process are very important. The
presentation can be both in narrative form and in tables.
• Narrative presentation: The presentation should be clear and concise.
As much attention is paid to data that fail to support a particular
study hypothesis as is given to data that support it.
Certain information should always be included in the text when
discussing the study hypothesis: the statistical test that was used, the
test result, degrees of freedom and the probability value should be
listed.

• Tables: They are a means of organizing data so they may be more
easily understood and interpreted. The discussion of the table should
be as clear as possible in the text. If a table is being used to present
the results of hypothesis testing, the results should be placed in the
table, or a footnote added that provides the test results, degrees of
freedom and the probability level.

Interpretation of Data:
• It is the task of drawing conclusions or inferences and of explaining
their significance, after careful analysis of the collected data.
• The process of interpretation is essentially one of stating what
the findings show.
• The findings of the study are the results, conclusions, interpretations,
recommendations, generalisations and implications.
• Interpretation is by no means a mechanical process.
• It calls for critical examination of the results of one’s analysis in the
light of all limitations of data gathering.
