BRM Unit IV - cheat sheet
1. Hallmark Business School www.hbs.ac.in
UNIT IV: Data Preparation and Analysis
Data Preparation: includes editing, coding, and data entry and is the activity that
ensures the accuracy of the data and their conversion from raw form to reduced
and classified forms that are more appropriate for analysis. Preparing a
descriptive statistical summary is another preliminary step leading to an
understanding of the collected data.
Editing, Coding, Data Entry: Editing detects errors and omissions, corrects them
when possible, and certifies that maximum data quality standards are achieved.
Types of editing: field editing and central editing.
Coding involves assigning numbers or other symbols to answers so that the
responses can be grouped into a limited number of categories. In coding,
categories are the partitions of a data set of a given variable (e.g., if the
variable is gender, the partitions are male and female). Categorization is the
process of using rules to partition a body of data. Both closed- and open-response
questions must be coded.
A codebook, or coding scheme, contains each variable in the study and specifies
the application of coding rules to the variable. It is used by the researcher or
research staff to promote more accurate and more efficient data entry or data
analysis. It is also the definitive source for locating the positions of variables
in the data file during analysis.
Coding rules: Four rules guide the precoding, postcoding and categorization of a
data set. The categories within a single variable should be:
• Appropriate to the research problem and purpose.
• Exhaustive.
• Mutually exclusive.
• Derived from one classification dimension.
Content analysis follows a systematic process for coding and drawing inferences
from texts. It starts by determining which units of data will be analyzed.
Content analysis unit types:
1) Syntactical units can be words, phrases, sentences, or paragraphs; words are
the smallest and most reliable data units to analyze;
2) Referential units are described by words, phrases, and sentences; they may be
objects, events, persons, and so forth, to which a verbal or textual expression
refers;
3) Propositional units are assertions about an object, event, person, and so on;
4) Thematic units are topics contained within (and across) texts; they represent
higher-level abstractions inferred from the text and its context.
Missing data are information from a participant or case that is not available for
one or more variables of interest. In survey studies, missing data typically occur
when participants accidentally skip, refuse to answer, or do not know the answer
to an item on the questionnaire.
Data entry converts information gathered by secondary or primary methods to a
medium for viewing and manipulation. Keyboarding remains a mainstay for
researchers who need to create a data file immediately and store it in a minimal
space on a variety of media.
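As a minimal sketch of how a codebook can drive coding and flag missing data, the example below maps raw answers to numeric codes. The variable names, the code values and the -9 missing-data sentinel are all assumptions made for illustration; they do not come from any particular study.

```python
# Hypothetical codebook: each variable maps raw answers to numeric codes.
# Categories are exhaustive and mutually exclusive for the answers we expect.
MISSING = -9  # assumed sentinel for skipped, refused, or "don't know" answers

CODEBOOK = {
    "gender": {"male": 1, "female": 2},
    "owns_car": {"yes": 1, "no": 0},
}

def code_response(variable, answer):
    """Return the numeric code for one answer, or MISSING if it cannot be coded."""
    if answer is None:
        return MISSING
    # Normalise case and whitespace before looking the answer up.
    return CODEBOOK[variable].get(answer.strip().lower(), MISSING)

def code_record(record):
    """Code every codebook variable in one questionnaire record (a dict)."""
    return {var: code_response(var, record.get(var)) for var in CODEBOOK}

# Made-up raw questionnaire records.
raw_records = [
    {"gender": "Female", "owns_car": "yes"},
    {"gender": "male", "owns_car": None},   # item skipped by the participant
    {"gender": " MALE ", "owns_car": "No"},
]

coded = [code_record(r) for r in raw_records]
missing_count = sum(row["owns_car"] == MISSING for row in coded)
print(coded)
print("missing owns_car answers:", missing_count)
```

Keeping the codes in one dictionary mirrors the role of the codebook as the single definitive source for how each variable is coded.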
Validity of data: In general, validity is an indication of how sound your research is.
More specifically, validity applies to both the design and the methods of your
research. Validity in data collection means that your findings truly represent the
phenomenon you are claiming to measure. Valid claims are solid claims.
Qualitative vs. quantitative data analyses: Read Exhibit 7-2.
Bivariate and Multivariate statistical techniques: Bivariate studies differ
from univariate studies because they allow the researcher to analyze the
relationship between two variables (often denoted as X, Y) in order to test
simple hypotheses of association and causality. For example, if you wanted to
know whether there is a relationship between the number of students in an
engineering classroom (independent variable) and their grades in that subject
(dependent variable), you would use bivariate analysis since it measures two
elements based on the observation of data. Four steps to conducting bivariate
analysis: 1) Define the nature of the relationship; 2) Identify the type and
direction of the relationship; 3) Determine if the relationship is statistically
significant; 4) Identify the strength of the relationship.
Multivariate studies are similar to bivariate studies, but they examine more than
two variables at once, often including more than one dependent variable. For
example, if an advertiser wanted to examine the effectiveness of three different
banner ads on a popular website, the advertiser could measure each ad's click
rate for both men and women. Researchers could then use multivariate statistical
analysis to examine the relationships between all of the variables.
Multivariate analytical techniques represent a variety of mathematical models
used to measure and quantify outcomes, taking into account important factors
that can influence the relationship between variables. The most popular is
multiple regression analysis, which helps one understand how the typical value of
the dependent variable changes when any one of the independent variables is
varied while the other independent variables are held fixed. Other techniques
include factor analysis, path analysis and multivariate analysis of variance
(MANOVA).
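The bivariate steps of finding the direction and strength of a relationship can be sketched with a Pearson correlation coefficient. The class sizes and average grades below are made-up numbers chosen only to illustrate the classroom example. The t statistic is the usual one for testing whether a correlation differs from zero; judging significance would still require comparing it with a critical value for n - 2 degrees of freedom.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Made-up data: X = students per classroom, Y = average grade in the subject.
class_size = [20, 25, 30, 35, 40, 45, 50, 55]
avg_grade = [82, 80, 79, 75, 74, 70, 69, 65]

r = pearson_r(class_size, avg_grade)
t = t_statistic(r, len(class_size))
direction = "negative" if r < 0 else "positive"

print(f"direction: {direction}, strength |r| = {abs(r):.3f}, t = {t:.2f}")
```

A correlation on its own describes association only; it does not show that class size causes the change in grades.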
Factor analysis: It is a statistical tool that measures the impact of a few
unobserved variables, called factors, on a large number of observed variables. It
is used as a data reduction method. It may be used to uncover the latent
structure that links variables or to confirm a hypothesis about that structure. It
is often used to examine the linear relationships between variables before
subjecting them to further analysis. Principal Factor Analysis is also called
Common Factor Analysis, and it aims to identify the minimum number of factors
that can account for the correlations among a given set of variables. Other types
of Factor Analysis include Image factoring, Alpha factoring, Principal Component
Analysis and so on.
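To illustrate the data reduction idea, here is a rough sketch that extracts the first principal component of a correlation matrix using power iteration. This is principal component analysis, not common factor analysis, and power iteration is simply a convenient stdlib-only stand-in for the eigenvalue routines a statistics package would use. The three survey items and their scores are invented for the example.

```python
import math

def correlation_matrix(rows):
    """Correlation matrix of the columns of `rows` (a list of equal-length lists)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]

def first_component(matrix, iterations=200):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    p = len(matrix)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v[i] * mv[i] for i in range(p))
    return eigenvalue, v

# Made-up ratings (1-5) from six respondents on three related survey items.
ratings = [
    [5, 4, 5],
    [4, 4, 4],
    [2, 1, 2],
    [1, 2, 1],
    [3, 3, 3],
    [5, 5, 4],
]

corr = correlation_matrix(ratings)
eigenvalue, loadings = first_component(corr)
explained = eigenvalue / len(corr)  # share of total variance on one component

print("loadings:", [round(x, 3) for x in loadings])
print(f"variance explained by first component: {explained:.1%}")
```

When a single component carries most of the variance, the three observed items can reasonably be summarised by one underlying score.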
Discriminant analysis: It is a statistical tool with an objective to assess the
adequacy of a classification, given the group memberships, or to assign objects to
one group among a number of groups. For any kind of Discriminant Analysis,
some group assignments should be known beforehand. Discriminant Analysis is
quite close to being a graphical version of MANOVA and is often used to
complement the findings of Cluster Analysis and Principal Components
Analysis. When Discriminant Analysis is used to separate two groups, it is called
Discriminant Function Analysis (DFA); when there are more than two
groups, the Canonical Variates Analysis (CVA) method is used.
Discriminant Analysis has various benefits as a statistical tool and is quite
similar to regression analysis. It can be used to determine which predictor
variables are related to the dependent variable and to predict the value of the
dependent variable given certain values of the predictor variables. Discriminant
Analysis is also widely used by marketers to create perceptual maps, and it has
some benefits over other methods that use perceived distances, such as the option
of using tests of significance to check for dissimilarities among products and the
fact that the distance between two products is not affected by other products
included in the study.
Discriminant Analysis is often used in combination with cluster analysis.
Say the loans department of a bank wants to find out the creditworthiness of
applicants before disbursing loans. It may use Discriminant Analysis to find out
whether an applicant is a good credit risk or not.
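The bank example can be sketched as a two-group linear discriminant function. This is a minimal illustration that assumes two predictors, equal group priors and a pooled within-group scatter matrix. The applicant figures (income in thousands, debt ratio in percent) and the function names are hypothetical.

```python
def mean_vector(rows):
    """Mean of each of the two predictors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, mean):
    """2x2 within-group scatter matrix (sums of squares and cross-products)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_discriminant(group_a, group_b):
    """Return discriminant weights and the cutoff score between the two groups."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Weights are proportional to inverse(pooled scatter) * (mean_a - mean_b).
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    midpoint = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    cutoff = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, cutoff

def classify(applicant, w, cutoff):
    """Assign an applicant to the 'good' or 'bad' credit-risk group."""
    score = w[0] * applicant[0] + w[1] * applicant[1]
    return "good" if score > cutoff else "bad"

# Made-up past applicants with known outcomes: (income, debt ratio).
good_risks = [(60, 20), (75, 25), (80, 15), (65, 30)]
bad_risks = [(30, 55), (40, 60), (35, 45), (45, 50)]

weights, cutoff = fit_discriminant(good_risks, bad_risks)
print(classify((70, 20), weights, cutoff))
print(classify((35, 55), weights, cutoff))
```

The known outcomes of past applicants play the role of the group assignments that must be available beforehand.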
Cluster analysis: It is a statistical tool used to classify objects into groups, such
that the objects belonging to one group are much more similar to each other than
to objects belonging to other groups. It is generally used for exploratory data
analysis and serves as a method of discovery by solving classification issues.
1) Hierarchical cluster analysis methods:
Agglomerative methods: all objects start in separate clusters; the most similar
clusters are merged step by step, and the process is repeated until all objects are
in a single cluster. Finally, the optimum number of clusters is chosen from among
all options.
Divisive methods: all objects start in the same cluster and the reverse of the
agglomerative method is used, splitting clusters step by step.
2) Non-hierarchical cluster analysis methods (k-means clustering is the best-known
example): These are generally used when large data sets are involved. Further,
these provide the flexibility of moving a subject from one cluster to another.
The main benefit of Cluster Analysis is that it allows us to group similar data
together. This helps us identify patterns between data elements. It reveals
associations between data objects and helps to outline structure which might not
have been apparent previously but gives sense and meaning to the data when
discovered. Once a clear structure emerges, decision making becomes easier.
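A bare-bones k-means loop shows how a non-hierarchical method moves objects between clusters until the assignment settles. The customer data (visits per month, average spend) and the choice of starting centroids are made up for illustration. Real analyses usually try several random starts and standardise the variables first.

```python
def kmeans(points, centroids, max_iterations=100):
    """Basic k-means: assign points to the nearest centroid, then update centroids."""
    clusters = []
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from the point to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else c  # keep the old centroid if a cluster is empty
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # assignments have stopped changing
            break
        centroids = new_centroids
    return centroids, clusters

# Made-up customers: (store visits per month, average spend per visit).
customers = [(1, 10), (2, 12), (1, 14), (8, 50), (9, 55), (10, 52)]

# Assumed starting centroids: one customer picked from each end of the data.
start = [customers[0], customers[3]]
centroids, clusters = kmeans(customers, start)

for centre, members in zip(centroids, clusters):
    print("centroid:", centre, "members:", members)
```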
Multiple regression and correlation: Multiple regression models a continuous
dependent variable as a function of two or more independent variables; multiple
correlation measures how strongly the dependent variable is related to the set of
predictors taken together. Logistic regression is a related but distinct
technique: it aims to measure the relationship between a categorical dependent
variable and one or more independent variables (usually continuous) by
estimating the probability of each category of the dependent variable. A
categorical variable is a variable that can take values falling in limited
categories instead of being continuous.
Logistic regression uses regression to predict the outcome of a categorical
dependent variable on the basis of predictor variables. The probable outcomes of
a single trial are modeled as a function of the explanatory variable using a
logistic function. Logistic modeling is done on categorical data, which may be of
various types including binary and nominal. For example, a variable might be
binary and have two possible categories of 'yes' and 'no', or it may be nominal,
such as hair color, which may be black, brown, red, gold or grey.
Another objective of logistic regression is to check if the probability of getting
a particular value of the dependent variable is related to the independent
variable. Multiple logistic regression is used when there is more than one
independent variable under study. For example, logistic regression would help
identify factors like product quality, service quality, brand image, reward
programs, etc., that impact customers' loyalty and willingness to recommend a
retail store's products to others. The results would help improve the store's
performance on these parameters and increase customer loyalty.
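The logistic function and a binary outcome can be sketched with a single predictor. The example fits the two coefficients by plain gradient descent, a simple stand-in for the maximum-likelihood routines that statistical packages use. The service-quality ratings and yes/no recommendation answers are invented, and the learning rate and epoch count are assumptions that happen to suit this small data set.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.05, epochs=20000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1

# Made-up survey: service-quality rating (1-10) and whether the customer
# would recommend the store (1 = yes, 0 = no).
rating = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
recommends = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(rating, recommends)
p_low = sigmoid(b0 + b1 * 2)   # predicted probability at a rating of 2
p_high = sigmoid(b0 + b1 * 9)  # predicted probability at a rating of 9

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"P(recommend | rating 2) = {p_low:.2f}")
print(f"P(recommend | rating 9) = {p_high:.2f}")
```

A positive slope b1 indicates that higher ratings go with a higher probability of recommending the store.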
Multidimensional scaling (MDS): It is a means of visualizing the level of
similarity of individual cases of a dataset. It refers to a set of related
ordination techniques used in information visualization, in particular to display
the information contained in a distance matrix. Steps: 1) Formulate the problem;
2) Obtain input data; 3) Run the MDS statistical program; 4) Decide the number
of dimensions; 5) Map the results and define the dimensions; 6) Test the
results for reliability and validity; 7) Report the results comprehensively. For
example, in marketing, MDS is a statistical technique for taking the preferences
and perceptions of respondents and representing them on a visual grid, called
a perceptual map. By mapping multiple attributes and multiple brands at the
same time, a greater understanding of the marketplace and of consumers'
perceptions can be achieved, as compared with a basic two-attribute perceptual
map.
Application of statistical software for data analysis: The following statistical
software packages offer these features for data analysis:
1) SAS/STAT: SAS/STAT software is designed for both specialized and
enterprise-wide analytical needs. It relies more on coding and less on menus.
SAS/STAT software provides a comprehensive set of tools that can meet the data
analysis needs of the entire organization. Features: ANOVA; mixed models (linear
mixed, non-linear mixed and general linear models); regression; categorical data
analysis; Bayesian analysis; multivariate analysis; survival analysis;
psychometric analysis; cluster analysis; nonparametric analysis; survey data
analysis; multiple imputation for missing values.
2) SPSS: It is more menu-driven and needs less coding. Features: analysing
variables separately; comparing multiple variables; association between
variables.
3) R: Analysis is done entirely through code, and the latest data analysis methods
are usually available. Nearly every data analysis method can be done using R.
Features: creating unique and beautiful data visualizations; getting better
results faster; drawing on the talents of statisticians worldwide, who publish
method libraries for free use.