Data analysis and Presentation

Research Fundamentals and Terminology
1-1
Excel Books
Unit 4: Data Analysis and Presentation

1-2
Data Preparation and Data Presentation

1-3
1. Introduction to Data Preparation Process
The data preparation process consists of the steps listed below.
Each element of the process is discussed in the subsequent sections.

1-4
1. Prepare preliminary plan of data analysis
2. Check questionnaire
3. Edit
4. Code
5. Transcribe
6. Clean data
7. Statistically adjust the data
8. Select data analysis strategy

1-5
2. Check questionnaire
A questionnaire returned from the field may be unacceptable for
several reasons.
• Parts of the questionnaire may be incomplete.
• The pattern of responses may indicate that the respondent did
not understand or follow the instructions.
• The responses show little variance.
• One or more pages are missing.
• The questionnaire is received after the pre-established cutoff
date.
• The questionnaire is answered by someone who does not qualify
for participation.
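
Several of the checks above can be automated once the returned questionnaires are keyed in. The following is only a minimal sketch: the record layout (`answers`, `received`, `qualified`), the cutoff date and the variance threshold are all hypothetical and would have to be adapted to the actual study.

```python
from datetime import date
from statistics import pvariance

def check_questionnaire(q, n_items, cutoff, min_variance=0.1):
    """Return the reasons (if any) why a returned questionnaire is unacceptable."""
    problems = []
    answers = q["answers"]                      # None marks an unanswered item
    answered = [a for a in answers if a is not None]
    if len(answers) < n_items or len(answered) < len(answers):
        problems.append("incomplete")
    # Nearly identical ratings on every item suggest careless answering.
    if len(answered) >= 2 and pvariance(answered) < min_variance:
        problems.append("little variance")
    if q["received"] > cutoff:
        problems.append("received after cutoff")
    if not q["qualified"]:
        problems.append("respondent not qualified")
    return problems

cutoff = date(2024, 3, 31)                      # hypothetical cutoff date
good = {"answers": [1, 4, 2, 5, 3], "received": date(2024, 3, 10), "qualified": True}
bad = {"answers": [3, 3, None, 3, 3], "received": date(2024, 4, 2), "qualified": True}

print(check_questionnaire(good, 5, cutoff))
print(check_questionnaire(bad, 5, cutoff))
```

Checks such as missing pages or misunderstood instructions still need a human editor; the sketch covers only the conditions that can be read off the keyed data.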

1-6
3. Editing
Information gathered during data collection may lack uniformity. Example: data
collected through questionnaires and schedules may have answers that are not
ticked in the proper places, or some questions may be left unanswered.
Sometimes information is given in a form that needs to be reconstructed into a
category designed for analysis, e.g., converting daily or monthly income into
annual income. The researcher has to decide how to edit it.
Editing also requires that the data are relevant and appropriate and that errors
are corrected. Occasionally, the investigator makes a mistake and records an
impossible answer. To the question “How much red chilli do you use in a
month?” the answer is written as “4 kilos”. Can a family of three members use
four kilos of chillies in a month? The correct answer could be “0.4 kilo”.

1-7
Care should be taken in editing (re-arranging) answers to open-ended
questions. Example: sometimes a “don’t know” answer is edited as
“no response”. This is wrong. “Don’t know” means that the
respondent is not sure and is in two minds about his reaction, or
considers the question personal and does not want to answer it.
“No response” means that the respondent is not familiar with the
situation/object/event/individual about which he is asked.

1-8
Editing is the process of checking and adjusting the data for
omissions, legibility and consistency, and readying them for coding
and storage.
The reasons for editing are to ensure completeness and consistency
and to detect questions answered out of order.
Editing involves one of the following treatments of unsatisfactory
responses.
Returning to the field: to ensure consistency in responses, the
questionnaires with unsatisfactory responses may be returned to the
field, where the interviewers re-contact the respondents.

1-9
Assigning missing values: to ensure completeness, if returning the
questionnaires to the field is not feasible, the editor may assign
missing values to unsatisfactory responses.
Discarding unsatisfactory respondents: if the questions are answered
out of order, the respondents with unsatisfactory responses are
simply discarded.

1-10
4. Coding
Coding is translating answers into numerical values or assigning
numbers to the various categories of a variable to be used in data
analysis. Coding is done by using a code book, a code sheet, and a
computer card, on the basis of the instructions given in the code
book. The code book gives a numerical code for each variable.
Nowadays, codes are assigned before going to the field, while
constructing the questionnaire/schedule. After data collection, the
pre-coded items are fed to the computer for processing and
analysis. For open-ended questions, however, post-coding is
necessary. In such cases, all answers to open-ended questions
are placed in categories and each category is assigned a code.
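
The difference between pre-coding and post-coding can be illustrated with a small sketch. The code book entries, the missing-data code 9 and the function names are hypothetical. In practice the categories for open-ended answers are decided by the researcher; here answers are grouped simply by identical wording, which is an assumption made to keep the example short.

```python
# Hypothetical pre-coded code book: one numerical code per response category.
CODEBOOK = {
    "gender": {"male": 1, "female": 2},
    "owns_car": {"yes": 1, "no": 0},
}
MISSING = 9  # assumed standard code for missing or unusable answers

def code_response(variable, answer):
    """Translate a pre-coded answer into its numerical code."""
    if answer is None:
        return MISSING
    return CODEBOOK[variable].get(answer.strip().lower(), MISSING)

def post_code(answers):
    """Post-code open-ended answers: each new category gets the next code."""
    categories = {}
    codes = []
    for answer in answers:
        key = answer.strip().lower()
        if key not in categories:
            categories[key] = len(categories) + 1
        codes.append(categories[key])
    return categories, codes

print(code_response("gender", "Female"))
print(post_code(["Price", "Quality", "price"]))
```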

1-11
Manual processing is employed when qualitative methods are used,
when a quantitative study uses a small sample, when the
questionnaire/schedule has a large number of open-ended
questions, or when access to computers is difficult or
inappropriate. However, coding is done in manual processing
also.

1-12
• Coding is the process of identifying and assigning a numerical
score or other character symbol to previously edited data.
• Coding means assigning a code, usually a number, to each
possible response to each question asked.
• The code includes an indication of the column position (field) and
data record it will occupy.

1-13
Coding Questions
• Fixed field codes, which mean that the number of records for
each respondent is the same and the same data appear in the
same column(s) for all respondents, are highly desirable.
• If possible, standard codes should be used for missing data.
• Coding of structured questions is relatively simple, since the
response options are predetermined.
• In questions that permit a large number of responses, each
possible response option should be assigned a separate column.
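
A fixed-field record following these guidelines might be built as in the sketch below. The column layout, the brand list and the missing-data code 9 are hypothetical choices made only for illustration.

```python
BRANDS = ["A", "B", "C"]  # hypothetical options of a multiple-response question

def encode_record(respondent_id, age_group, brands_used):
    """Build a fixed-field record.

    Columns 1-3: respondent code
    Column  4:   age group (9 = standard code for missing data)
    Columns 5-7: one column per brand (1 = mentioned, 0 = not mentioned)
    """
    age = "9" if age_group is None else str(age_group)
    flags = "".join("1" if brand in brands_used else "0" for brand in BRANDS)
    return f"{respondent_id:03d}{age}{flags}"

print(encode_record(7, 2, {"A", "C"}))   # 0072101
print(encode_record(12, None, set()))    # 0129000
```

Because every record has the same length and the same variable always sits in the same column, any respondent's answer can be read back by column position alone.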

1-14
Guidelines for coding unstructured questions:
• Category codes should be mutually exclusive and collectively
exhaustive.
• Only a few of the responses should fall into the “other” category.
• Category codes should be assigned for critical issues even if no
one has mentioned them.
• Data should be coded to retain as much detail as possible.

1-15
A codebook contains coding instructions and the necessary
information about variables in the data set. A codebook generally
contains the following information:
• Column number
• Variable number
• Question number
• Record number
• Variable name
• Instructions for coding

1-16
Coding Questionnaire:
The respondent code and the record number appear on
each record in the data.
The first record contains the additional codes: project code,
interview code, date and time codes, and validation code.
It is a good practice to insert blanks between parts.

1-17
Converting raw data into transcribed data
The raw data can be converted into transcribed data by using
various devices.
First, the raw data are input using methods such as optical
scanning, mark sense forms, computerized sensory analysis, and
key punching followed by verification to correct errors.
The input data are stored in computer memory, on hard disks or on
magnetic tapes, from which they can be retrieved as transcribed
data as and when required.

1-18
Data Cleaning
Consistency checks identify data that are out of range, logically
inconsistent, or have extreme values.
Computer packages like MINITAB, SPSS, SAS, and EXCEL can
be programmed to identify out-of-range values for each variable
and print out respondent code, variable code, variable name,
record number, column number and out-of-range value.
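
An out-of-range check of this kind can be sketched in a few lines. The variable names and valid ranges below are hypothetical; a real check would take them from the code book.

```python
# Hypothetical valid ranges per variable, as they might appear in a code book.
VALID_RANGES = {"satisfaction": (1, 5), "age": (18, 99)}

def out_of_range(records):
    """List (respondent code, variable name, value) for every out-of-range value."""
    report = []
    for rec in records:
        for var, (low, high) in VALID_RANGES.items():
            value = rec.get(var)
            if value is not None and not low <= value <= high:
                report.append((rec["respondent"], var, value))
    return report

records = [
    {"respondent": 1, "satisfaction": 4, "age": 35},
    {"respondent": 2, "satisfaction": 7, "age": 41},
    {"respondent": 3, "satisfaction": 2, "age": 140},
]
for row in out_of_range(records):
    print(row)
```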

1-19
Data Cleaning: Treatment of Missing Responses
• Substitute a Neutral Value: A neutral value, typically the mean
response to the variable, is substituted for the missing responses.
• Substitute an Imputed Response: The respondent’s pattern of
responses to other questions is used to impute or calculate a
suitable response to the missing questions.
• In casewise deletion, cases or respondents with any missing
responses are discarded from the analysis.
• In pairwise deletion, instead of discarding all cases with any
missing values, the researcher uses only the cases or
respondents with complete responses for each calculation.
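
Three of these treatments can be compared on a toy data set, as in the sketch below. The data and function names are made up for illustration, and `None` stands for a missing response. Imputation is left out because it depends on a model of the respondent's other answers.

```python
from statistics import mean

# Hypothetical responses to two questions; None marks a missing response.
data = [
    {"q1": 4, "q2": 5},
    {"q1": None, "q2": 3},
    {"q1": 2, "q2": None},
    {"q1": 3, "q2": 4},
]

def substitute_mean(rows, var):
    """Neutral value: replace missing responses with the variable's mean."""
    m = mean(r[var] for r in rows if r[var] is not None)
    return [m if r[var] is None else r[var] for r in rows]

def casewise_deletion(rows):
    """Keep only respondents with no missing response at all."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise_mean(rows, var):
    """Pairwise deletion: each calculation uses every case complete for it."""
    return mean(r[var] for r in rows if r[var] is not None)

print(substitute_mean(data, "q1"))
print(len(casewise_deletion(data)))
print(pairwise_mean(data, "q2"))
```

Casewise deletion keeps only two of the four respondents here, while the pairwise mean of each question still uses three, which shows how much more data pairwise deletion retains.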

1-20
Statistically adjusting the data
In weighting, each case or respondent in the database is assigned
a weight to reflect its importance relative to other cases or
respondents.
Weighting is most widely used to make the sample data more
representative of a target population on specific characteristics.
Yet another use of weighting is to adjust the sample so greater
importance is attached to respondents with certain characteristics.
The data can also be statistically adjusted by variable
re-specification, which involves the transformation of data to
create new variables or modify existing variables.
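
Both adjustments can be sketched briefly. The sample, the assumed 50/50 population split by gender and the income thresholds are all hypothetical. The weight used is the population share of a group divided by its sample share, which is one common way of making a sample more representative.

```python
from collections import Counter

# Hypothetical sample in which women are under-represented.
sample = [
    {"gender": "F", "income": 30000, "spend": 10},
    {"gender": "M", "income": 50000, "spend": 30},
    {"gender": "M", "income": 80000, "spend": 30},
    {"gender": "M", "income": 60000, "spend": 30},
]
population_share = {"F": 0.5, "M": 0.5}  # assumed target population

def weights_for(rows, key, shares):
    """Weight = population share of the group / sample share of the group."""
    counts = Counter(r[key] for r in rows)
    n = len(rows)
    return [shares[r[key]] / (counts[r[key]] / n) for r in rows]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def income_group(income):
    """Variable re-specification: collapse income into 3 categories."""
    if income < 40000:
        return 1
    if income < 70000:
        return 2
    return 3

w = weights_for(sample, "gender", population_share)
spend = [r["spend"] for r in sample]
print(sum(spend) / len(spend))          # unweighted mean
print(weighted_mean(spend, w))          # weighted mean
print([income_group(r["income"]) for r in sample])
```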

1-21
Selecting data analysis strategy
The first data analysis strategy is the transformation of raw data
into a form that will make them easy to understand and
interpret; rearranging, ordering, and manipulating data to
generate descriptive information is the starting point of any data
analysis strategy.
The second data analysis strategy is to perform basic data analysis
using descriptive statistics.
The third data analysis strategy is to perform basic analysis using
univariate statistics.
The fourth data analysis strategy is to perform data analysis using
bivariate analysis such as testing for differences.

1-22
The fifth data analysis strategy is to perform data analysis using
measures of association such as regression and correlation, and
nonparametric statistics.
The sixth data analysis strategy is to perform data analysis using
multivariate statistics.

1-23
Data Presentation Tools
This is the first category of Research Methodology tools and techniques, which
are used to present data.
The list of data presentation tools is as below.
1. Bar chart
2. Pie chart
3. Frequency Distribution
4. Histogram
5. Frequency Polygon
6. Ogive
7. Stem and Leaf Plot
8. Cross Tabulation table
Various charts and graphs are developed to summarize qualitative
and quantitative data.

1-24
Summarizing Qualitative Data
Here we are using various charts and graphs.
1. Frequency distribution
2. Relative frequency distribution
3. Percent frequency distribution
4. Bar graphs
5. Pie charts

1-25
Summarizing Quantitative Data
1. Frequency distribution
2. Relative frequency distribution
3. Percent frequency distribution
4. Cumulative frequency distribution
5. Cumulative relative frequency
6. Cumulative percent frequency distribution
7. Histogram
8. Frequency Polygon
9. Ogive
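
The frequency-based summaries in this list are all derived from the same counts. The sketch below builds them for a small set of made-up ratings; the field names are illustrative only.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values):
    """Frequency, relative, percent and cumulative distributions of values."""
    counts = Counter(values)
    n = len(values)
    classes = sorted(counts)
    freqs = [counts[c] for c in classes]
    cumulative = list(accumulate(freqs))
    return [
        {
            "class": c,
            "freq": f,
            "rel": f / n,
            "pct": 100 * f / n,
            "cum_freq": cf,
            "cum_pct": 100 * cf / n,
        }
        for c, f, cf in zip(classes, freqs, cumulative)
    ]

ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # hypothetical survey ratings
table = frequency_table(ratings)
for row in table:
    print(row)
```

Plotting `freq` against `class` gives the histogram or frequency polygon, and plotting `cum_freq` gives the ogive.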

1-26
Tabulation
• Tabulation is the final stage in the collection and compilation of
data, and it is the stepping-stone to the analysis and
interpretation of data. In deciding about the type of tabulation,
one has to keep in mind the nature, scope and object of the
research problem. Tabulation of data should be done in such a
form that it suits the nature and object of the research
investigation.
• Tabulation simply means presenting data through tables. To
be more precise, “Tabulation is an orderly arrangement of data
in columns and rows”.

1-27
OBJECTIVES, IMPORTANCE AND
ADVANTAGES OF TABULATION
• Proper tabulation of data is of great importance because if the
tabulation of data is not satisfactory its analysis is bound to be
defective.
• Tabulation of data is done in order to achieve simplicity and
convenience in processing and interpretation of data collected
for research problem.

1-28
Advantages Of Tabulation
1. Ease in Understanding: It is easier to
understand tabulated data than
unorganized data.
2. Time Savings: Tabulation of Data leads
to immense saving of time during analysis.
3. Ease in Drawing Diagrams:
Diagrammatic representation of data is more
convenient if it is done on the basis of a
tabulated data.

1-29
4. Ease in Comparison: Through tabulated data,
it becomes easy to undertake comparative study
because it is systematically displayed.
5. Detection of Errors: Errors and omissions in data are
easily detected in tabulated data.
6. Space-Saving: In tabulation, the data is displayed in
column and rows and so it uses less space.
7. No Chances of Repetition: Repetition of data may
occur if it is displayed in an unorganized fashion.
Tabulation saves the data from being repeated.

1-30
Cross-Tabulation
• A cross-tabulation is the merging of the frequency
distributions of two or more variables in a single table. It
helps to understand how one variable relates to another
variable.
• Example: how brand loyalty relates to gender.

1-31
Example: Cross-tabulation
Sample   Gender   Handedness
1        Female   Right-handed
2        Male     Left-handed
4        Male     Right-handed
5        Male     Left-handed
6        Male     Right-handed
8        Female   Left-handed
9        Male     Right-handed

1-32
          Left-handed   Right-handed   Total
Males     2             3              5
Females   1             4              5
Total     3             7              10
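
The table above can be reproduced by counting pairs of values. In the sketch below, samples 3, 7 and 10 are not in the listing above; they are assumed to be female and right-handed, which is the only assignment consistent with the totals in the table. The function name `crosstab` is illustrative.

```python
from collections import Counter

# (gender, handedness) for samples 1..10; 3, 7 and 10 are assumed values.
sample = [
    ("Female", "Right-handed"),  # 1
    ("Male", "Left-handed"),     # 2
    ("Female", "Right-handed"),  # 3 (assumed)
    ("Male", "Right-handed"),    # 4
    ("Male", "Left-handed"),     # 5
    ("Male", "Right-handed"),    # 6
    ("Female", "Right-handed"),  # 7 (assumed)
    ("Female", "Left-handed"),   # 8
    ("Male", "Right-handed"),    # 9
    ("Female", "Right-handed"),  # 10 (assumed)
]

def crosstab(pairs):
    """Merge the frequency distributions of two variables into one table."""
    cells = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: cells[(r, c)] for c in cols} for r in rows}
    for r in rows:
        table[r]["Total"] = sum(table[r][c] for c in cols)
    table["Total"] = {c: sum(table[r][c] for r in rows) for c in cols + ["Total"]}
    return table

result = crosstab(sample)
for label, row in result.items():
    print(label, row)
```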

1-33
Advantages
1. Cross-tabulation analysis and results can be easily
interpreted.
2. The clarity of interpretation provides a stronger link
between research results and managerial action.
3. It is simple.
4. It is easy for managers to understand because it is not
statistically oriented.

1-34
SIGNIFICANCE OF REPORT WRITING
Research report is considered a major component of the research
study for the research task remains incomplete till the report has
been presented and/or written.
As a matter of fact, even the most brilliant hypothesis, the most
carefully designed and conducted research study, and the most
striking generalizations and findings are of little value unless they
are effectively communicated to others.
The purpose of research is not well served unless the findings are
made known to others. Research results must invariably enter the
general store of knowledge.

1-35
DIFFERENT STEPS IN WRITING REPORT
Research reports are the product of slow, painstaking, accurate
inductive work. The usual steps involved in writing report are:
(a) logical analysis of the subject-matter;
(b) preparation of the final outline;
(c) preparation of the rough draft;
(d) rewriting and polishing;
(e) preparation of the final bibliography; and
(f) writing the final draft.

1-36
Logical analysis of the subject matter:
It is the first step which is primarily concerned with the
development of a subject. There are two ways in which to develop
a subject:
(a) logically and (b) chronologically.
The logical development is made on the basis of mental
connections and associations between one thing and
another by means of analysis.
Logical treatment often consists in developing the material from
the simplest possible to the most complex structures.
Chronological development is based on a connection or
sequence in time or occurrence.
The directions for doing or making something usually follow the
chronological order.

1-37
Preparation of the final outline:
It is the next step in writing the research report. “Outlines are the
framework upon which long written works are constructed.
They are an aid to the logical organisation of the material and a
reminder of the points to be stressed in the report.”

1-38
Preparation of the rough draft
This follows the logical analysis of the subject and the preparation
of the final outline. Such a step is of utmost importance for the
researcher now sits to write down what he has done in the context
of his research study.
He will write down the procedure adopted by him in collecting the
material for his study along with various limitations faced by him,
the technique of analysis adopted by him, the broad findings and
generalizations and the various suggestions he wants to offer
regarding the problem concerned.

1-39
Rewriting and polishing of the rough draft:
This step happens to be the most difficult part of all formal writing. Usually this
step requires more time than the writing of the rough draft.
The careful revision makes the difference between a mediocre and a good
piece of writing. While rewriting and polishing, one should check the report for
weaknesses in logical development or presentation.
The researcher should also “see whether or not the material, as it is presented,
has unity and cohesion; does the report stand upright and firm and exhibit a
definite pattern, like a marble arch? Or does it resemble an old wall of
moldering cement and loose brick.”
In addition, the researcher should check whether he has been consistent
throughout his rough draft. He should also check the mechanics of
writing: grammar, spelling and usage.

1-40
Preparation of the final bibliography:
Next in order comes the task of the preparation of the final bibliography. The
bibliography, which is generally appended to the research report, is a list of
books in some way pertinent to the research which has been done. It should
contain all those works which the researcher has consulted. The bibliography
should be arranged alphabetically and may be divided into two parts; the first
part may contain the names of books and pamphlets, and the second part may
contain the names of magazine and newspaper articles.
Generally, this pattern of bibliography is considered convenient and satisfactory
from the point of view of reader, though it is not the only way of presenting
bibliography.
The entries in bibliography should be made adopting the following order:

1-41
For books and pamphlets the order may be as under:
1. Name of author, last name first.
2. Title, underlined to indicate italics.
3. Place, publisher, and date of publication.
4. Number of volumes.

1-42
Writing the final draft: This constitutes the last step. The final draft should
be written in a concise and objective style and in simple language, avoiding
vague expressions such as “it seems”, “there may be”, and the like.
While writing the final draft, the researcher must avoid abstract terminology
and technical jargon. Illustrations and examples based on common
experiences must be incorporated in the final draft as they happen to be most
effective in communicating the research findings to others.
A research report should not be dull, but must enthuse people and maintain
interest and must show originality. It must be remembered that every report
should be an attempt to solve some intellectual problem and must contribute
to the solution of a problem and must add to the knowledge of both the
researcher and the reader.

1-43
LAYOUT OF THE RESEARCH REPORT
Anybody who reads the research report must be told enough about
the study so that he can place it in its general scientific context,
judge the adequacy of its methods and thus form an opinion of how
seriously the findings are to be taken.
For this purpose there is a need for a proper layout of the report.
The layout of the report refers to what the research report
should contain. A comprehensive layout of the research report
should comprise
(A) preliminary pages;
(B) the main text; and
(C) the end matter. Let us deal with them separately.

1-44
(A) Preliminary Pages
In its preliminary pages the report should carry a title and date,
followed by acknowledgements in the form of ‘Preface’ or
‘Foreword’. Then there should be a table of contents followed by
list of tables and illustrations so that the decision-maker or
anybody interested in reading the report can easily locate the
required information in the report.

1-45
(B) Main Text
The main text provides the complete outline of the research report along with all
details. The title of the research study is repeated at the top of the first page of
the main text, and the other details then follow on pages numbered
consecutively, beginning with the second page.
Each main section of the report should begin on a new page. The main text of
the report should have the following sections:
(i) Introduction;
(ii) Statement of findings and recommendations;
(iii) The results;
(iv) The implications drawn from the results; and
(v) The summary.

1-46
(i) Introduction: The purpose of introduction is to introduce the
research project to the readers. It should contain a clear
statement of the objectives of research i.e., enough
background should be given to make clear to the reader why
the problem was considered worth investigating.
A brief summary of other relevant research may also be
stated so that the present study can be seen in that context.
The hypotheses of study, if any, and the definitions of the
major concepts employed in the study should be explicitly
stated in the introduction of the report.

1-47
(ii) Statement of findings and recommendations:
After introduction, the research report must contain a statement of findings and
recommendations in non-technical language so that it can be easily understood
by all concerned. If the findings happen to be extensive, at this point they
should be put in the summarised form.

1-48
(iii) Results:
A detailed presentation of the findings of the study, with supporting data
in the form of tables and charts together with a validation of results, is
the next step in writing the main text of the report.
This generally comprises the main body of the report, extending over
several chapters. The results section of the report should contain
statistical summaries and reductions of the data rather than the raw
data. All the results should be presented in logical sequence and split
into readily identifiable sections. All relevant results must find a place in
the report. But how one is to decide about what is relevant is the basic
question.

1-49
(iv) Implications of the results: Toward the end of the main text, the
researcher should again put down the results of his research clearly and
precisely. He should state the implications that flow from the results of
the study, for the general reader is interested in the implications for
understanding human behaviour. Such implications may have three
aspects as stated below:
(a) A statement of the inferences drawn from the present study which
may be expected to apply in similar circumstances.
(b) The conditions of the present study which may limit the extent of
legitimate generalizations of the inferences drawn from the study.
(c) The relevant questions that still remain unanswered or new
questions raised by the study along with suggestions for the kind of
research that would provide answers for them.

1-50
It is considered a good practice to finish the report with a short
conclusion which summarises and recapitulates the main points of the
study. The conclusion drawn from the study should be clearly related
to the hypotheses that were stated in the introductory section.
At the same time, a forecast of the probable future of the subject and
an indication of the kind of research which needs to be done in that
particular field is useful and desirable.
(v) Summary: It has become customary to conclude the research
report with a very brief summary, restating in brief the research problem,
the methodology, the major findings and the major conclusions drawn
from the research results.

1-51
(C) End Matter
At the end of the report, appendices should be listed in respect
of all technical data such as questionnaires, sample information,
mathematical derivations and the like. A bibliography of
sources consulted should also be given. Index (an alphabetical
listing of names, places and topics along with the numbers of the
pages in a book or report on which they are mentioned or
discussed) should invariably be given at the end of the report.
The value of index lies in the fact that it works as a guide to the
reader for the contents in the report.

1-52
Types of Reports
Research reports vary greatly in length and type. In each
individual case, both the length and the form are largely dictated
by the problems at hand. For instance, business firms prefer
reports in the letter form, just one or two pages in length. Banks,
insurance organisations and financial institutions are generally
fond of the short balance-sheet type of tabulation for their annual
reports to their customers and shareholders.
Mathematicians prefer to write the results of their investigations in
the form of algebraic notations. Chemists report their results in
symbols and formulae.

1-53
A technical report is used whenever a full written report of the
study is required whether for recordkeeping or for public
dissemination.
A popular report is used if the research results have policy
implications. A few details about these two types of reports are
given below.

1-54
1. Technical Reports
In the technical report the main emphasis is on
(i) the methods employed,
(ii) assumptions made in the course of the study,
(iii) the detailed presentation of the findings including their
limitations and supporting data.
A general outline of a technical report can be as follows:
1. Summary of results: A brief review of the main findings in just
two or three pages.
2. Nature of the study: Description of the general objectives of the
study, formulation of the problem in operational terms, the working
hypothesis, the type of analysis and data required, etc.
3. Methods employed: Specific methods used in the study and their
limitations. For instance, in sampling studies we should give details of
the sample design, viz., sample size, sample selection, etc.

1-55
4. Data: Discussion of the data collected, their sources,
characteristics and limitations. If secondary data are used, their
suitability to the problem at hand should be fully assessed. In the
case of a survey, the manner in which data were collected should be
fully described.
5. Analysis of data and presentation of findings: The analysis
of data and presentation of the findings of the study, with
supporting data in the form of tables and charts, should be fully
narrated. This, in fact, happens to be the main body of the report,
usually extending over several chapters.
6. Conclusions: A detailed summary of the findings and the
policy implications drawn from the results should be explained.

1-56
7. Bibliography: A bibliography of the various sources consulted
should be prepared and attached.
8. Technical appendices: Appendices should be given for all technical
matters relating to the questionnaire, mathematical derivations,
elaboration on a particular technique of analysis and the like.
9. Index: An index must be prepared and always given at the end
of the report.

1-57
2. Popular Report
The popular report is one which gives emphasis to simplicity
and attractiveness.
The simplification should be sought through clear writing,
minimization of technical, particularly mathematical, details and
liberal use of charts and diagrams. An attractive layout along with
large print, many subheadings and even an occasional cartoon
is another characteristic feature of the popular report.
Besides, in such a report emphasis is given to practical aspects
and policy implications.
We give below a general outline of a popular report.

1-58
1. The findings and their implications: Emphasis in the report is
given on the findings of most practical interest and on the
implications of these findings.
2. Recommendations for action: Recommendations for action
on the basis of the findings of the study are made in this section of
the report.
3. Objective of the study: A general review of how the problem
arose is presented along with the specific objectives of the project
under study.
4. Methods employed: A brief and non-technical description of
the methods and techniques used, including a short review of the
data on which the study is based, is given in this part of the report.

1-59
5. Results: This section constitutes the main body of the report,
wherein the results of the study are presented in clear and
non-technical terms with liberal use of all sorts of illustrations such
as charts, diagrams and the like.
6. Technical appendices: More detailed information on methods
used, forms, etc. is presented in the form of appendices. But the
appendices are often not detailed if the report is meant entirely for
the general public.

1-60
Multivariate Analysis Techniques
All statistical techniques which simultaneously analyse more than
two variables on a sample of observations can be categorized as
multivariate techniques.
We may as well use the term ‘multivariate analysis’, which is a
collection of methods for analyzing data in which a number of
observations are available for each object.

1-61
VARIABLES IN MULTIVARIATE ANALYSIS
(i) Explanatory variable and criterion variable: If X may be considered to be
the cause of Y, then X is described as explanatory variable (also termed as
causal or independent variable) and Y is described as criterion variable
(also termed as resultant or dependent variable).
In some cases both explanatory variable and criterion variable may consist
of a set of many variables in which case set (X1, X2, X3, …., Xp) may be
called a set of explanatory variables and the set (Y1, Y2, Y3, …., Yq) may
be called a set of criterion variables if the variation of the former may be
supposed to cause the variation of the latter as a whole. In economics, the
explanatory variables are called external or exogenous variables and the
criterion variables are called endogenous variables. Some people use the
term external criterion for explanatory variable and the term internal criterion
for criterion variable.

1-62
(ii) Observable variables and latent variables: Explanatory variables described above
are supposed to be observable directly in some situations, and if this is so, they are
termed observable variables. However, there are some unobservable variables which
may influence the criterion variables.
We call such unobservable variables latent variables.
(iii) Discrete variable and continuous variable: A discrete variable is one which,
when measured, may take only integer values, whereas a continuous variable is one
which, when measured, can assume any real value (even in decimal points).
(iv) Dummy variable (or pseudo variable): This term is used in a technical
sense and is useful in algebraic manipulations in the context of multivariate analysis.
We call Xi (i = 1, …, m) a dummy variable if only one of the Xi is 1 and the others are
all zero.
1-63
IMPORTANT MULTIVARIATE TECHNIQUES
(i) Multiple regression
(ii) Multiple discriminant analysis
(iii) Multivariate analysis of variance
(iv) Canonical correlation analysis
(v) Factor Analysis

1-64
(i) Multiple regression: In multiple regression we form a linear
composite of explanatory variables in such a way that it has maximum
correlation with a criterion variable. This technique is appropriate when the
researcher has a single, metric criterion variable which is supposed to be
a function of other explanatory variables.
The main objective in using this technique is to predict the variability of the
dependent variable based on its covariance with all the independent
variables. One can predict the level of the dependent phenomenon through
a multiple regression analysis model, given the levels of the independent
variables. Given a dependent variable, the linear multiple regression
problem is to estimate constants B1, B2, ..., Bk and A such that the
expression Y = B1X1 + B2X2 + ... + BkXk + A provides a good
estimate of an individual’s Y score based on his X scores.

1-65
In practice, Y and the several X variables are converted to standard scores zy,
z1, z2, ..., zk; each z has a mean of 0 and a standard deviation of 1. Then the
problem is to estimate constants bi such that z'y = b1z1 + b2z2 + ... + bkzk,
where z'y stands for the predicted value of the standardized Y score, zy. The
expression on the right side of the above equation is the linear combination of
explanatory variables.
The constant A is eliminated in the process of converting X’s to z’s.
The least-squares method is used to estimate the beta weights in such a way
that the sum of the squared prediction errors is kept as small as possible.
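
The standardized formulation above can be sketched with the standard library alone. The data and the helper names (`standardize`, `solve`, `beta_weights`) are made up for illustration. The beta weights are obtained by solving the least-squares normal equations, which is one standard way to minimise the sum of squared prediction errors. A statistical package would normally be used in practice.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert raw scores to z scores (mean 0, standard deviation 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def beta_weights(xs, y):
    """Least-squares beta weights b1..bk for z'y = b1 z1 + ... + bk zk."""
    zs = [standardize(x) for x in xs]
    zy = standardize(y)
    n = len(y)
    # Normal equations: correlations among predictors times b equals
    # correlations of each predictor with the criterion.
    r_xx = [[sum(p * q for p, q in zip(zi, zj)) / n for zj in zs] for zi in zs]
    r_xy = [sum(p * q for p, q in zip(zi, zy)) / n for zi in zs]
    return solve(r_xx, r_xy)

# Hypothetical data generated exactly as Y = 2*X1 + 3*X2 + 1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 * a + 3 * b + 1 for a, b in zip(x1, x2)]

betas = beta_weights([x1, x2], y)
print(betas)
```

Because the data are exactly linear, each beta weight equals the raw coefficient rescaled by the ratio of standard deviations, Bi × sd(Xi) / sd(Y), and the constant A = 1 drops out, as the text states.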

1-66
Sometimes the researcher may use step-wise regression
techniques to have a better idea of the independent contribution of
each explanatory variable. Under these techniques, the
investigator adds the independent contribution of each
explanatory variable into the prediction equation one by one,
computing betas and R2 at each step.
Formal computerized techniques are available for the purpose
and the same can be used in the context of a particular problem
being studied by the researcher.

1-67
(ii) Multiple discriminant analysis:
Through the discriminant analysis technique, the researcher may classify
individuals or objects into one of two or more mutually exclusive
and exhaustive groups on the basis of a set of independent
variables. Discriminant analysis requires interval independent
variables and a nominal dependent variable.
For example, suppose that brand preference (say brand x or y) is
the dependent variable of interest and its relationship to an
individual’s income, age, education, etc. is being investigated;
then we should use the technique of discriminant analysis.

1-68
Regression analysis in such a situation is not suitable because the dependent
variable is not intervally scaled.
Thus discriminant analysis is considered an appropriate technique when the
single dependent variable happens to be non-metric and is to be classified into
two or more groups, depending upon its relationship with several independent
variables which all happen to be metric.
The objective in discriminant analysis happens to be to predict an object’s
likelihood of belonging to a particular group based on several independent
variables. In case we classify the dependent variable in more than two groups,
then we use the name multiple discriminant analysis; but in case only two
groups are to be formed, we simply use the term discriminant analysis.
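A minimal sketch of the two-group case may help. It uses Fisher's linear discriminant with two metric predictors (income and age) and a nominal dependent variable (brand x or brand y), assuming equal prior probabilities and a common within-group covariance matrix. The data, the function names and the choice of Fisher's method are assumptions made for illustration; the slides do not prescribe a particular computation.

```python
def mean_vector(rows):
    """Mean of each of the two predictors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, m):
    """2x2 within-group sums of squares and cross-products."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_discriminant(group_x, group_y):
    """Return discriminant weights w and the cutoff score."""
    mx, my = mean_vector(group_x), mean_vector(group_y)
    sx, sy = scatter(group_x, mx), scatter(group_y, my)
    dof = len(group_x) + len(group_y) - 2
    # Pooled within-group covariance matrix [[a, b], [b, d]].
    a = (sx[0][0] + sy[0][0]) / dof
    b = (sx[0][1] + sy[0][1]) / dof
    d = (sx[1][1] + sy[1][1]) / dof
    det = a * d - b * b
    diff = [mx[0] - my[0], mx[1] - my[1]]
    # w = S^-1 (mx - my), using the closed-form 2x2 inverse.
    w = [(d * diff[0] - b * diff[1]) / det,
         (-b * diff[0] + a * diff[1]) / det]
    # Cutoff is the midpoint between the two projected group means.
    cutoff = 0.5 * (w[0] * (mx[0] + my[0]) + w[1] * (mx[1] + my[1]))
    return w, cutoff

def classify(person, w, cutoff):
    """Assign a person to brand 'x' or 'y' from the discriminant score."""
    score = w[0] * person[0] + w[1] * person[1]
    return "x" if score > cutoff else "y"

# Hypothetical (income in thousands, age) for buyers of each brand.
brand_x = [(60, 45), (72, 50), (65, 48), (80, 55)]
brand_y = [(30, 25), (42, 30), (35, 33), (28, 22)]

w, cutoff = fisher_discriminant(brand_x, brand_y)
print(classify((70, 50), w, cutoff))
print(classify((30, 25), w, cutoff))
```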

1-69
(iii) Multivariate analysis of variance:
Multivariate analysis of variance is an extension of bivariate analysis of
variance in which the ratio of among-groups variance to within-groups variance
is calculated on a set of variables instead of a single variable.
This technique is considered appropriate when several metric dependent
variables are involved in a research study along with many non-metric
explanatory variables.

1-70
In other words, multivariate analysis of variance is especially appropriate
whenever the researcher wants to test hypotheses concerning
multivariate differences in group responses to experimental
manipulations.
For instance, a market researcher may be interested in using one test
market and one control market to examine the effect of an advertising
campaign on sales as well as on awareness, knowledge and attitudes. In
that case the technique of multivariate analysis of variance should be
used to meet the objective.

1-71
(iv) Canonical correlation analysis:
This technique, first developed by Hotelling, attempts to
simultaneously predict a set of criterion variables from their
joint covariance with a set of explanatory variables. Both metric and
non-metric data can be used in the context of this multivariate
technique.
The procedure followed is to obtain a set of weights for the dependent
and independent variables in such a way that the linear composite of the
criterion variables has a maximum correlation with the linear composite
of the explanatory variables.

1-72
For example, if we want to relate grade school adjustment to health and
physical maturity of the child, we can then use canonical correlation
analysis, provided we have for each child a number of adjustment
scores (such as tests, teacher’s ratings, parent’s ratings and so on) and
also we have for each child a number of health and physical maturity
scores (such as heart rate, height, weight, index of intensity of illness
and so on).
The main objective of canonical correlation analysis is to discover
factors separately in the two sets of variables such that the multiple
correlation between the sets of factors is the maximum possible.

1-73
(v) Factor analysis:
Factor analysis is by far the most often used multivariate technique in
research studies, especially those pertaining to the social and behavioural
sciences.
It is a technique applicable when there is a systematic
interdependence among a set of observed or manifest variables and
the researcher is interested in finding out something more fundamental
or latent which creates this commonality.
For instance, we might have data, say, about an individual’s income,
education, occupation and dwelling area and want to infer from these
some factor (such as social class) which summarises the commonality
of all four variables.

1-74
The technique used for such a purpose is generally described as factor analysis.
Factor analysis, thus, seeks to resolve a large set of measured variables in
terms of relatively few categories, known as factors.
This technique allows the researcher to group variables into factors (based on
the correlation between variables), and the factors so derived may be treated as
new variables (often termed latent variables), with their values derived by
summing the values of the original variables which have been grouped into the
factor.
The meaning and name of such a new variable are subjectively determined by the
researcher. Since the factors are linear combinations of the data, the
coordinates of each observation or variable are measured to obtain what are
called factor loadings.
Such factor loadings represent the correlation between the particular variable
and the factor, and are usually placed in a matrix of correlations between the
variables and the factors.

1-75
A Classification of Multivariate Techniques

1-76
Multivariate Techniques
  Dependence Technique
    One Dependent Variable
      * Cross-Tabulation
      * Analysis of Variance and Covariance
      * Multiple Regression
      * Conjoint Analysis
    More Than One Dependent Variable
      * Multivariate Analysis of Variance and Covariance
      * Canonical Correlation
      * Multiple Discriminant Analysis
  Interdependence Technique
    Variable Interdependence
      * Factor Analysis
    Interobject Similarity
      * Cluster Analysis
      * Multidimensional Scaling

1-77
Univariate Analysis
• When a single variable is analyzed alone, the analysis is called
univariate analysis.
• Univariate data is used for the simplest form of analysis. It is
the type of data in which the analysis is based on only one
variable.
E.g.: a sample statistic such as the “mean”, which might refer to the
average consumption of a particular kind of food or the age of a
certain group of students or people.

1-78
A Classification of Univariate Techniques

1-79
Univariate Techniques
  Metric Data
    One Sample
      * t test
      * Z test
    Two or More Samples
      Independent
        * Two-Group t test
        * Z test
        * One-Way ANOVA
      Related
        * Paired t test
  Non-numeric Data
    One Sample
      * Frequency
      * Chi-Square
      * K-S
      * Runs
      * Binomial
    Two or More Samples
      Independent
        * Chi-Square
        * Mann-Whitney
        * Median
        * K-S
      Related
        * Sign
        * Wilcoxon
        * McNemar
        * Chi-Square

1-80
Bivariate Analysis
• Bivariate data is used for slightly more complex analysis than
univariate data.
• Bivariate data is data in which the analysis is based on two
variables per observation simultaneously.
E.g.: cross-classification of age group and consumption of two
products.
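The cross-classification mentioned above can be sketched as a simple count of observations for each pair of values. The age groups, product labels and the helper name `cross_tabulate` are hypothetical and serve only to illustrate the idea of analysing two variables per observation.

```python
from collections import Counter

def cross_tabulate(observations):
    """Count how often each (row value, column value) pair occurs."""
    counts = Counter(observations)
    rows = sorted({r for r, _ in observations})
    cols = sorted({c for _, c in observations})
    # Counter returns 0 for pairs that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Hypothetical observations: (age group, product consumed).
observations = [
    ("18-30", "Product A"), ("18-30", "Product A"), ("18-30", "Product B"),
    ("31-50", "Product B"), ("31-50", "Product B"), ("31-50", "Product A"),
    ("51+", "Product B"),
]

table = cross_tabulate(observations)
for age_group, row in table.items():
    print(age_group, row)
```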

1-81
HYPOTHESIS TESTING

1-82
Steps involved in
hypothesis testing

1-83
1. Formulate H0 and H1.
2. Select an appropriate test.
3. Choose the level of significance, α.
4. Collect data and calculate the test statistic.
5. Either (a) determine the probability associated with the test
statistic, or (b) determine the critical value of the test statistic, TSCR.
6. Either (a) compare the probability with the level of significance, α,
or (b) determine whether the calculated test statistic falls into the
rejection or non-rejection region.
7. Reject or do not reject H0.
8. Draw the marketing research conclusion.
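The steps in this flowchart can be sketched with a two-tailed one-sample Z test, which shows that the probability route and the critical-value route lead to the same decision. The choice of a Z test with a known population standard deviation, the function name `z_test` and all the numbers are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample Z test of H0: mu = mu0 (sigma assumed known)."""
    std_normal = NormalDist()
    # Calculate the test statistic from the sample.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Route (a): probability associated with the test statistic.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Route (b): critical value of the test statistic.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return {
        "z": z,
        "p_value": p_value,
        "z_crit": z_crit,
        "reject_by_probability": p_value < alpha,
        "reject_by_critical_value": abs(z) > z_crit,
    }

# Hypothetical study: H0 says the mean is 50; a sample of 64 gives 52.
result = z_test(sample_mean=52, mu0=50, sigma=8, n=64, alpha=0.05)
print(result)
```

Both routes compare the same evidence against the same α, so the two decision flags always agree.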

1-84
A Broad Classification of Hypothesis Tests

1-85
Hypothesis Tests
  Tests of Association
  Tests of Differences
    * Distributions
    * Means
    * Proportions
    * Median/Rankings

1-86
A Classification of Hypothesis Testing Procedures
for Examining Differences

1-87
Hypothesis Testing Related to Differences
• Parametric tests assume that the variables of interest are measured
on at least an interval scale.
• Nonparametric tests assume that the variables are measured on a
nominal or ordinal scale.
• These tests can be further classified based on whether one or two or
more samples are involved.
• The samples are independent if they are drawn randomly from
different populations. For the purpose of analysis, data pertaining to
different groups of respondents, e.g., males and females, are
generally treated as independent samples.
• The samples are paired when the data for the two samples relate to
the same group of respondents.
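The distinction between independent and paired samples matters because the test statistic is computed differently in each case. The sketch below contrasts the pooled-variance two-group t statistic with the paired t statistic on the same hypothetical scores; the function names and the data are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev, variance

def independent_t(group_a, group_b):
    """Two-group t statistic with pooled variance (independent samples)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    se = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_b) - mean(group_a)) / se

def paired_t(before, after):
    """Paired t statistic based on within-respondent differences."""
    diffs = [b - a for a, b in zip(before, after)]
    se = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / se

# Hypothetical ratings from the same five respondents, before and after.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 15]

t_paired = paired_t(before, after)
t_independent = independent_t(before, after)
print("paired t:", round(t_paired, 3))
print("independent t:", round(t_independent, 3))
```

With these data the paired statistic is much larger, because pairing removes the variation between respondents from the error term.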

1-88
Hypothesis Tests
  Parametric Tests (Metric Tests)
    One Sample
      * t test
      * Z test
    Two or More Samples
      Independent Samples
        * Two-Group t test
        * Z test
      Paired Samples
        * Paired t test
  Non-parametric Tests (Nonmetric Tests)
    One Sample
      * Chi-Square
      * K-S
      * Runs
      * Binomial
    Two or More Samples
      Independent Samples
        * Chi-Square
        * Mann-Whitney
        * Median
        * K-S
      Paired Samples
        * Sign
        * Wilcoxon
        * McNemar
        * Chi-Square

1-89
Type I and Type II errors
Because the hypothesis testing process uses sample statistics
calculated from random data to reach conclusions about population
parameters, it is possible to make an incorrect decision about the null
hypothesis. In particular, two types of errors can be made in testing
hypotheses:
•Type I and
•Type II

1-90
Type I
A Type I error is committed by rejecting a true null hypothesis. With a Type I
error, the null hypothesis is true, but the business researcher decides that it is not.
For example, suppose the flour packaging process actually is “in control” and is
averaging 40 ounces of flour per package.
Suppose also that a business researcher randomly selects 100 packages,
weighs the contents of each, and computes a sample mean.
It is possible, by chance, to randomly select 100 of the more extreme packages
(mostly heavy or mostly light), resulting in a mean that falls in
the rejection area.
The decision is to reject the null hypothesis even though the population mean is
actually 40 ounces. In this case, the business researcher has committed a Type
I error.

1-91
If a manager fires an employee because some evidence indicates that
she is stealing from the company and if she really is not stealing from
the company, then the manager has committed a Type I error.
[Figure: distribution centred at 40 ounces, with labels for the non-rejection area, the critical value and the rejection area]

1-92
As the preceding figure shows, the rejection regions represent the
possibility of committing a Type I error.
Means that fall beyond the critical values will be considered so extreme
that the business researcher chooses to reject the null hypothesis.
However, if the null hypothesis is true, any mean that falls in a rejection
region will result in a decision that produces a Type I error.
An analogous courtroom example of a Type I error is when an innocent
person is sent to jail.

1-93
The probability of committing a Type I error is called alpha (α) or the
level of significance.
Alpha equals the area under the curve that is in the rejection region beyond
the critical value(s).
The value of alpha is always set before the experiment or study is
undertaken; common values of alpha are 0.05, 0.01, 0.10, and
0.001.
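One way to see that alpha is the probability of a Type I error is to simulate the flour example many times with the null hypothesis actually true and count how often it is rejected. The population standard deviation of 5 ounces, the number of trials, the random seed and the function name are assumptions; the slides give only the mean of 40 ounces and the sample size of 100.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(mu0=40.0, sigma=5.0, n=100, alpha=0.05,
                      trials=2000, seed=1):
    """Share of samples that reject H0 when H0 (mean = mu0) is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw n packages from a process that really averages mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / se
        if abs(z) > z_crit:
            rejections += 1  # a true null hypothesis was rejected
    return rejections / trials

rate = type_i_error_rate()
print("simulated Type I error rate:", rate)
```

The simulated rate should come out close to the chosen alpha of 0.05, although it will vary somewhat with the seed and the number of trials.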

1-94
Type II
A Type II error is committed when a business researcher fails to reject a
false null hypothesis. In this case, the null hypothesis is false, but a
decision is made not to reject it.
Suppose in the case of the flour problem that the packaging process is
actually producing a population mean of 41 ounces even though the null
hypothesis is 40 ounces.
A sample of 100 packages yields a sample mean of 40.2 ounces, which
falls in the non-rejection region.
The business decision maker decides not to reject the null hypothesis. A
Type II error has been committed.
The packaging procedure is out of control and the hypothesis testing
process does not identify it.

1-95
Suppose in the business world an employee is stealing from the
company. A manager sees some evidence that the stealing is occurring
but lacks enough evidence to conclude that the employee is stealing
from the company.
The manager decides not to fire the employee based on theft.
The manager has committed a Type II error.
The probability of committing a Type II error is beta (β). Unlike alpha,
beta is not usually stated at the beginning of the hypothesis testing
procedure. Because beta occurs only when the null hypothesis is not
true, the computation of beta varies with the many possible alternative
parameters that might occur.
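To show how beta depends on the alternative parameter, the sketch below computes it for the flour example with a null mean of 40 ounces and a true mean of 41 ounces. The population standard deviation of 5 ounces, the two-tailed Z test at alpha = 0.05 and the function name are assumptions, since the slides do not state them.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error_probability(mu0, true_mean, sigma, n, alpha=0.05):
    """Probability that the sample mean lands in the non-rejection region
    of a two-tailed Z test when the population mean is really true_mean."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower = mu0 - z_crit * se  # lower critical value for the sample mean
    upper = mu0 + z_crit * se  # upper critical value for the sample mean
    sampling_dist = NormalDist(true_mean, se)
    return sampling_dist.cdf(upper) - sampling_dist.cdf(lower)

# Flour example: H0 says 40 ounces, the process really averages 41.
beta_41 = type_ii_error_probability(mu0=40, true_mean=41, sigma=5, n=100)
# A larger shift is easier to detect, so beta is smaller.
beta_42 = type_ii_error_probability(mu0=40, true_mean=42, sigma=5, n=100)
print("beta when true mean is 41:", round(beta_41, 3))
print("beta when true mean is 42:", round(beta_42, 3))
```

Under these assumptions the non-rejection region for the sample mean runs from roughly 39.02 to 40.98 ounces, so the sample mean of 40.2 ounces in the example falls inside it.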
