Famous No1 Amil Baba Love marriage Astrologer Specialist Expert In Pakistan a...
Yom_DATA MANAGEMENT.ppt
1. Research Methods for Project
managers
Joint MSc Program
Yom Institute of Economic Development
Mengesha Yayo (Ph.D in Economics )
August, 2019
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga Belete ( BahirDar University ) 1
2. Chapter Five: Data Management
5.1. Coding, editing and cleaning the data
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 2
3. Data Management
Data Preparation and Presentation
Data processing starts with the editing, coding,
classifying and tabulation of the collected data.
Data may be collected from a variety of
sources.
◦ This data must be converted into a machine-
readable, numeric format, such as in a spreadsheet
or a text file, so that they can be analyzed using
software programs like Stata, SPSS, etc.
Data management usually follows the
following steps
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 3
4. Data coding
Coding is the process of converting data into numeric format.
A codebook should be created to guide the coding process.
A codebook is a comprehensive document containing detailed
description of: each variable in a research study, items or
measures for that variable, the format of each item (numeric,
text, etc.),
the response scale for each item, and how to code each value
into a numeric format.
For e.g.: Suppose we have a measurement item on a 7-point
Likert
scale with anchors ranging from “strongly disagree” to “strongly
agree”, we may code that item as 1 for strongly disagree, 4 for
neutral, and 7 for strongly agree, etc.
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 4
5. Data coding
Nominal data such as industry type can be coded in numeric
form
using a coding scheme: such as 1 for manufacturing, 2 for
retailing, 3 for financial, 4 for healthcare, and so forth (of
course, nominal data cannot be analyzed statistically).
Ratio scale data such as age, income, or test scores can be
coded as
entered by the respondent. Sometimes, data may need to be
aggregated into a different form than the format used for data
collection.
◦ For e.g., total consumption expenditure from individual consumption
items.
Note that many other forms of data, such as from interview,
cannot
be converted into a numeric format for statistical analysis.
Coding is especially important for large complex studies
involving
many variables and measurement items.
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 5
6. Data entry
Coded data can be entered into a spreadsheet, database, text file
or
directly into a statistical program.
Most statistical programs like Stata provide a data editor for
entering
data. Each observation can be entered as one row and each
measurement item can be represented as one column.
The entered data should be frequently checked for accuracy, via
occasional spot checks on a set of items or observations, during
and
after entry.
Furthermore, while entering data, the coder should watch out for
obvious evidence of bad data, such as the respondent selecting
the “strongly agree” response to all items irrespective of content,
including reverse-coded items. If so, such data can be entered but
should be excluded from subsequent analysis.
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 6
7. Missing values
Missing data is an inevitable part of any empirical data set.
Respondents may not answer certain questions if they are
ambiguously worded or too sensitive.
Such problems should be detected earlier during pretests and
corrected
before the main data collection process begins.
During data entry, some statistical programs automatically treat
blank
entries as missing values, while others require a specific numeric
value
such as -1 or 999 to be entered to denote a missing value.
During data analysis, the default mode of handling missing values
in
most software programs is to simply drop the entire observation
containing even a single missing value, in a technique called
listwise
deletion.
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 7
8. Missing values
Hence, some software programs allow the option of replacing
missing
values with an estimated value via a process called imputation
(average value of other respondents)
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 8
9. Data transformation
Sometimes, it is necessary to transform data values before they
can
be meaningfully interpreted, including Generation of logarithmic
or first difference variables, etc;
Creating scale measures by adding individual scale items;
Creating a weighted index from a set of observed measures, and
Collapsing multiple values into fewer categories; e.g.: collapsing
incomes into income ranges, years of schooling into educational
groups like elementary, secondary, etc.
Stata has lots of commands that simplify data management after
entry.
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 9
10. Data Management
Editing
◦ Editing of data is the process of examining the
collected raw data to detect errors and
omissions.
◦ In general one edits to assure that the data are:
Accurate
Consistent with other information/facts
gathered
Uniformly entered
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 10
11. Data Management
The editing can be done at two levels
a) Field level Editing
After an interview, field workers should review
their reporting forms, complete what was
abbreviated, translate personal shorthand,
rewrite illegible entries, and make callback if
necessary.
b) Central editing
when all forms have been completed and
returned to the office data editors correct
obvious errors such as entry in wrong place,
recorded in wrong units, etc.
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 11
12. Data Management
Classification and Tabulation
large volume of raw data must be reduced into
homogenous groups if we are to get meaningful
relationships.
Classification is the process of arranging data in
groups or classes on the basis of common
characteristics.
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 12
13. Data Management
Tabulation is the orderly arrangement of data in
columns and rows.
Simple or complex tables.
◦ Simple tabulation gives information about one
variable.
◦ Complex tabulation shows the division of data into
two or more categories.
SPSS, R, excel, STATA, etc.
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 13
14. Data Management
Tabulation provides the following advantages:
It conserves space and reduces explanatory and
descriptive statement to a minimum.
It facilitates the process of comparison
It facilitates the summation of items and the
detection of errors and omissions
It provides a basis for various statistical
computations such as measures of central
tendencies, dispersions, etc.
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 14
15. Thank You
Credited to Yom Institute of Economic Development and Dr. Getachew Yirga ( BahirDar University ) 15