TIPS FOR SETTING-UP AN EXCEL SPREADSHEET FOR RESEARCH PROJECTS
Nancy Buderer, MS
Consulting
Biostatistician, Program Evaluator, Research Consultant
nancy@budererdrug.com
ABSTRACT
Objective
In investigator-initiated research studies, the investigator is often responsible for entering his/her data
into a database or spreadsheet before sending it to a statistician for analysis. This is common in
teaching hospitals, particularly for medical resident and student research. Typically, Microsoft Excel is
chosen because it is readily available and easy to use.
This document assists investigators in designing a spreadsheet for their research data that can readily be
exported from spreadsheet software (e.g., Microsoft Excel) into a statistical analysis software package
(e.g., SAS or SPSS).
Methods
Best practices in spreadsheet design for research are provided along with examples in Microsoft Excel.
Common pitfalls for data entry are described.
Results
Attention to detail is critical not only when collecting research data, but also when entering it into a
spreadsheet. The following are basic concepts for designing a spreadsheet:
1. One row of data per subject
2. One column for each variable
3. First column is a unique identifier
4. Column labels follow SAS or SPSS naming conventions
5. Columns formatted according to their data type (numeric, mm/dd/yyyy, military time)
6. Data entered as numbers, not text
7. Coding system with documentation
When possible, investigators should show their spreadsheet to their statistician before entering all of
their data. This is valuable in many respects: the analyst can spot inconsistencies between the
spreadsheet and the data collection tool, identify troublesome fields, and ensure that the outcomes the
investigator intended to measure are captured on the spreadsheet in such a way as to allow for the
appropriate statistical analysis.
Conclusion
A well-designed spreadsheet for entering research data may improve the accuracy of the data entered
and save time in the data analysis phase.
1 | Nancy Buderer, MS nancy@budererdrug.com
rev. 3-1-2016
TIPS
Tip 1. Each study subject has one ROW of data.
It is simplest to keep everything for each research subject on one row, even if a subject has data at
multiple time points.
Below is an example where subjects have their blood pressure taken at two different times.
Subject #1’s first blood pressure was 120 over 80. His second blood pressure was 115 over 80.
The column labeled SBP_1 is the subject’s systolic blood pressure at time 1 (120); the column labeled
DBP_1 is his diastolic blood pressure at time 1 (80); the column labeled SBP_2 is the subject’s systolic
blood pressure at time 2 (115); and DBP_2 is his diastolic blood pressure at time 2 (80). All of subject #1’s
data are on one row.
ID SBP_1 DBP_1 SBP_2 DBP_2
1 120 80 115 80
2 130 80 120 75
In the spreadsheet below, each subject has a row for time 1 and another row for time 2, but
there is no way to distinguish between time points. This makes analysis much harder.
ID SBP DBP
1 120 80
1 115 80
2 130 80
2 120 75
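If data have already been entered in the two-rows-per-subject layout, they can be moved into the one-row-per-subject layout with a short script. The following is only a minimal sketch in Python. It assumes that each subject's rows appear in time order, which the spreadsheet itself cannot confirm, and the function name long_to_wide is made up for this illustration.

```python
from collections import defaultdict

# Long layout from the example above: one row per subject per visit,
# with no column that says which visit a row belongs to.
long_rows = [
    {"ID": 1, "SBP": 120, "DBP": 80},
    {"ID": 1, "SBP": 115, "DBP": 80},
    {"ID": 2, "SBP": 130, "DBP": 80},
    {"ID": 2, "SBP": 120, "DBP": 75},
]

def long_to_wide(rows, id_col="ID"):
    """Collapse repeated rows into one row per subject.

    ASSUMPTION: rows for each subject appear in time order, so the
    n-th row seen for an ID is treated as time point n.
    """
    seen = defaultdict(int)   # number of rows seen so far for each ID
    wide = {}                 # ID -> the single wide row for that subject
    for row in rows:
        subject = row[id_col]
        seen[subject] += 1
        t = seen[subject]
        out = wide.setdefault(subject, {id_col: subject})
        for name, value in row.items():
            if name != id_col:
                out[f"{name}_{t}"] = value   # e.g. SBP -> SBP_1, SBP_2
    return list(wide.values())

wide_rows = long_to_wide(long_rows)
for r in wide_rows:
    print(r)
```

If the row order is not reliable, the time point has to be recovered from the original data collection forms instead.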
SYMBOLS
Thumbs up – Do it like this
Thumbs down – Don’t do this
Tip 2. Variables that are measured on the subjects are represented in
COLUMNS. Use one column for each variable.
For variables that only have one value (like age, gender, race) this is straightforward.
If a particular variable has a “check all that apply” kind of response, then use one column for each
possible response choice. Enter the number 1 if the subject chose that response and enter a 0 if they
did not check that response.
For example, a survey asks:
“What kinds of exercise have you done in the last week (check all that apply)?”
Run
Walk
Bike
Swim
Subject #1 did all of them. Enter 1 for RUN, 1 for WALK, 1 for BIKE, and 1 for SWIM. Subject #3 didn’t
do any of these exercises, so enter 0 in each of the columns.
ID RUN WALK BIKE SWIM
1 1 1 1 1
2 1 0 1 0
3 0 0 0 0
The example below tries to put all the response choices in one column. To the computer, this
just looks like a string of characters.
ID exercise
1 1,2,3,4
2 1,3
3 none

ID exercise
1 run, walk, bike, swim
2 run, bike
3 none
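To see why the single-column layout is harder to work with, consider what it takes to turn it back into one column per choice. The sketch below (Python, illustrative only) splits each text response and builds the 0/1 columns. The names CHOICES and to_indicators are assumptions made for this example, and the code assumes the responses are spelled exactly like the choices.

```python
# The four response choices from the survey question above.
CHOICES = ["run", "walk", "bike", "swim"]

def to_indicators(response, choices=CHOICES):
    """Turn a response such as 'run, bike' into one 0/1 value per choice."""
    picked = {part.strip().lower() for part in response.split(",")}
    return {choice.upper(): int(choice in picked) for choice in choices}

# The single-column layout from the second "don't" example.
raw = {1: "run, walk, bike, swim", 2: "run, bike", 3: "none"}

table = [{"ID": subject, **to_indicators(text)} for subject, text in raw.items()]
for row in table:
    print(row)
```

Any misspelling or extra word in the text column silently becomes a 0, which is one more reason to enter the 0/1 columns directly.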
Tip 3. The first column should be a unique identifier for the subject.
Name, medical record number, social security number, etc., are convenient and unique identifiers, but
they do not protect the subject’s privacy.
Develop an arbitrary ID system.
If it is necessary to keep a subject’s name or other private identifier, then keep a separate list that links
name to the arbitrary ID number.
Ideally, the ID should be a number, not a series of characters.
In the example below, subjects 1 and 2 were enrolled in 2011 and subjects 3 and 4 were
enrolled in 2012. Here are 3 ways to incorporate year into the ID.
ID Year ID ID
1 2011 1001 1.2011
2 2011 1002 2.2011
3 2012 2001 1.2012
4 2012 2002 2.2012
Do not use private identifiers in spreadsheets that are designed for research purposes.
• name
• medical record number
• date of birth
• social security number
Protect the subject’s privacy throughout the spreadsheet. It is rarely necessary for the statistician to
receive identifying information.
• Enter age in years rather than the actual date of birth.
• If the actual date of an event is not necessary, then consider entering only the minimum necessary
information (e.g., month and year).
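The ideas in this tip (an arbitrary ID, a separate linking list, age instead of date of birth) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed procedure: the records, the field names, and the helper age_in_years are all hypothetical, and age is calculated at the enrollment date.

```python
from datetime import date

# Hypothetical source records that still contain private identifiers.
records = [
    {"name": "Subject A", "dob": date(1950, 6, 15), "enrolled": date(2011, 3, 1)},
    {"name": "Subject B", "dob": date(1985, 12, 31), "enrolled": date(2012, 1, 10)},
]

def age_in_years(dob, on_date):
    """Completed years between dob and on_date."""
    had_birthday = (on_date.month, on_date.day) >= (dob.month, dob.day)
    return on_date.year - dob.year - (0 if had_birthday else 1)

linking_list = []    # stays with the investigator, stored separately
research_rows = []   # what goes into the research spreadsheet

for new_id, rec in enumerate(records, start=1):
    linking_list.append({"ID": new_id, "name": rec["name"]})
    research_rows.append({
        "ID": new_id,                                        # arbitrary ID
        "AGE": age_in_years(rec["dob"], rec["enrolled"]),    # not the birth date
        "ENROLL_YEAR": rec["enrolled"].year,                 # minimum necessary
    })

for row in research_rows:
    print(row)
```

Only research_rows would be sent to the statistician; linking_list is kept in a separate, protected file.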
Tip 4. Use column labels that follow standard SAS or SPSS variable naming
rules.
Typically a statistician will import an Excel spreadsheet into a statistical analysis software program like
SAS or SPSS. By using column labels that already follow standard SAS or SPSS naming conventions, the
researcher can save the statistician time (and money). [The terms “column label” and “variable name”
are used interchangeably.]
Column Label Guidelines
• Unique name for each column; no two columns with the same name
• Up to 32 characters in length (If using older versions of SPSS or SAS, keep this to 8 characters)
• Starts with a letter
• May be a combination of letters and numbers
• No spaces
• Underscore is OK, but other special characters are not
• Upper or lower case; they’re treated the same
Subject_ID Age_T1 med_costs date_first_use
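The guidelines above can be checked automatically before the spreadsheet is sent. The following Python sketch encodes only the rules listed in this tip; it is not the full SAS or SPSS naming specification, and the names NAME_RULE and check_labels are assumptions made for the example.

```python
import re

# Starts with a letter, then letters, digits, or underscores; 32 characters at most.
NAME_RULE = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,31}")

def check_labels(labels):
    """Return (label, problem) pairs for labels that break the guidelines."""
    problems = []
    seen = set()
    for label in labels:
        if not NAME_RULE.fullmatch(label):
            problems.append((label, "invalid characters, start, or length"))
        key = label.lower()          # upper and lower case are treated the same
        if key in seen:
            problems.append((label, "duplicate name"))
        seen.add(key)
    return problems

good = ["Subject_ID", "Age_T1", "med_costs", "date_first_use"]
bad = ["1st_day", "age t1", "AGE_T1", "age_t1"]

print(check_labels(good))
print(check_labels(bad))
```

In the second list, "1st_day" starts with a number, "age t1" contains a space, and "age_t1" repeats "AGE_T1" because case is ignored.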
If you want more description in the column labels so that the data entry person has a more
complete description of each column, add it as the first row and keep the column labels as the second
row. When the statistician does the analysis, this first row can be easily deleted, leaving the clean set of
column labels. (But don’t merge cells in this first upper row. Keep column widths the same throughout
the column.)
     age at time 1   Medication costs   1st day used medication
ID   Age_T1          med_costs          date_first_use
(This first row can be easily deleted before analysis, leaving the simple column labels.)
If the same variable is measured at multiple time points, like age at time 1 and again at time 2, then
keep the first part of the column label the same and change it by adding an underscore and the number
behind it to indicate the time point (e.g., AGE_T1, AGE_T2).
Tip 5. Format columns according to their respective data type.
If a cell in Excel is not specifically formatted, it defaults to “GENERAL”. This is the most dangerous
format for researchers because just about anything can be typed into the cell. When a column mixes
numbers with other characters, SAS or SPSS may read its values as strings of characters, which can lead
to meaningless data.
Take the time to properly format columns according to their respective data type. Use the FORMAT
CELLS menu to specify the data type of each column of data - numbers are numeric, dates are
mm/dd/yyyy, etc. In Excel, do this by right-clicking the column (it is highlighted and a menu pops
up); choose “Format Cells…”; choose the “Number” tab; and select from the list of formats. Another
way to get to the FORMAT menu is from the FORMAT icon on the toolbar.
(screen copied from Microsoft Excel, Microsoft Corporation, Microsoft Office 2007)
Formats preferred for research data are as follows:
• Number
Use this for any type of data that is numeric - continuous, interval, ordered, or categorical if a
number can be assigned to the category (e.g., 1=yes, 0=no). Excel’s default is 2 decimal places,
but it can be changed as needed.
• Date
The simplest date format is mm/dd/yyyy. On the menu the example to choose reads
“*3/14/2001”. When dates are entered into this column, enter using the slash marks as shown
below. Notice that leading zeros on January through September are automatically dropped.
date_first_use
5/1/2012
5/2/2012
10/2/2011
• Text
If a data type has to be letters, then specify it as such and limit the column width to the
maximum number of characters anticipated. Common uses for text fields are fill-in-the-blank
responses in surveys. In the example from tip 2, “Other” might be a choice. A text column
(other_text) allows for a fill-in-the-blank response.
Run
Walk
Bike
Swim
Other _____________________________________
ID run walk bike swim other other_text
1 1 1 1 1 1 yoga
2 1 0 1 0 1 karate
3 0 0 0 0 1 aerobics
• Time
Choose military time - it avoids having to distinguish between AM and PM. The example on the
menu is “13:30”.
GENERAL format is not desirable for spreadsheets that are intended to be imported into a
statistical package.
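Once the data have been exported as text, a quick check can confirm that every cell in a column matches its intended format. The Python sketch below is one possible way to do this, not a feature of Excel, SAS, or SPSS. The names CHECKS and bad_cells are assumptions for the example, and empty cells are skipped rather than flagged.

```python
from datetime import datetime

def is_number(text):
    """True when the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def matches(text, fmt):
    """True when the text parses with the given date or time pattern."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

CHECKS = {
    "number": is_number,
    "date": lambda s: matches(s, "%m/%d/%Y"),   # mm/dd/yyyy
    "time": lambda s: matches(s, "%H:%M"),      # military (24-hour) time
}

def bad_cells(values, kind):
    """Return (row_number, value) for non-empty cells that fail the check."""
    check = CHECKS[kind]
    return [(i, v) for i, v in enumerate(values, start=1) if v != "" and not check(v)]

print(bad_cells(["5/1/2012", "5/2/2012", "10/2/2011"], "date"))
print(bad_cells(["5/1/2012", "May 2012", ""], "date"))
print(bad_cells(["13:30", "1:30 PM"], "time"))
print(bad_cells(["42", "4O"], "number"))
```

The last call catches a letter O typed in place of a zero, the kind of entry that GENERAL format accepts without complaint.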
[Figures: Setting NUMBER formats; Setting DATE format (screens copied from Microsoft Excel, Microsoft Corporation, Microsoft Office 2007)]
Tip 6. Data should be entered as NUMBERS wherever possible, not text, not
special characters.
Obviously, data that are numeric in nature (e.g., age, pain scores) will be entered as numbers. The
problem comes when data are categorical (e.g., yes/no, gender, race). Letters are problematic in
analyzing research data. Letters are case-sensitive (e.g., ‘N’ is different from ‘n’). Differently spelled
words are interpreted as different categories even though they mean the same thing (e.g., ‘Yes’ is
different from ‘YES’ and ‘Y’). Numbers are typically faster to enter and prone to fewer errors. For
categorical data, it is recommended that code numbers be entered rather than words or letters.
In the example below, use 1 for yes and 0 for no; use 1 for male and 2 for female; use 1 for
Black and 2 for Caucasian.
ID HAS_LIVING_WILL GENDER RACE
1 1 1 1
2 0 1 1
3 1 2 2
4 0 2 1
5 0 1 2
(Using 1 for YES (or present) and 0 for NO (absent) is best for some data analyses, such as logistic
regression.)
Thumbs down – Don’t do this:
ID HAS_LIVING_WILL GENDER RACE
1 Yes M Black
2 No M Black
3 yes F C
4 None f B
5 N m Caucasian
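Cleaning up text entries like these after the fact takes extra work and judgment. The Python sketch below shows one way a statistician might recode them. The lookup tables are assumptions made for this illustration (for example, that ‘C’ means Caucasian and ‘B’ means Black), and any entry that is not recognized is returned as None so that a person can review it.

```python
# Hypothetical lookup tables; the abbreviations are guesses that must be confirmed.
YES_NO = {"yes": 1, "y": 1, "no": 0, "n": 0}
GENDER = {"m": 1, "male": 1, "f": 2, "female": 2}
RACE = {"black": 1, "b": 1, "caucasian": 2, "c": 2}

def recode(value, mapping):
    """Return the code for value, or None when the entry is not recognized."""
    return mapping.get(value.strip().lower())

# The text entries from the example above.
messy = [
    (1, "Yes", "M", "Black"),
    (2, "No", "M", "Black"),
    (3, "yes", "F", "C"),
    (4, "None", "f", "B"),
    (5, "N", "m", "Caucasian"),
]

coded = [
    (subject, recode(will, YES_NO), recode(gender, GENDER), recode(race, RACE))
    for subject, will, gender, race in messy
]
for row in coded:
    print(row)
```

Subject 4’s entry of ‘None’ is left as None: the script cannot tell whether it means “no living will” or “no answer”, and only the investigator can decide.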
Tip 7. Always create a code sheet (i.e., key, codebook).
This code sheet (or key) shows how the column name links to the data element you collected, and
describes how code numbers represent various categories. This can be hand-written on the paper data
collection tool or included as a separate worksheet within the same Excel workbook.
For this spreadsheet …
ID run walk bike swim other other_text has_living_will gender race
1 1 1 1 1 1 yoga 1 1 1
2 1 0 1 0 1 karate 0 1 1
3 0 0 0 0 1 aerobics 1 2 2
The code sheet might look like this …
ID               arbitrarily assigned ID number
run              subject runs                                1=yes, 0=no
walk             subject walks                               1=yes, 0=no
bike             subject bikes                               1=yes, 0=no
swim             subject swims                               1=yes, 0=no
other            subject indicated other form of exercise    1=yes, 0=no
other_text       write-in the other exercise
has_living_will  subject has a living will                   1=yes, 0=no
gender           gender                                      1=Male, 2=Female
race             race                                        1=Black, 2=Caucasian
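A code sheet also makes it possible to check the data automatically: any value that the code sheet does not define is probably a data entry error. The following Python sketch represents the code sheet above as a dictionary and lists undocumented values. The names CODEBOOK and undocumented_codes are assumptions for this example, and the free-text column other_text is left out because it has no codes.

```python
# Code sheet from the example above: column -> (description, allowed codes).
YES_NO = {1: "yes", 0: "no"}
CODEBOOK = {
    "run": ("subject runs", YES_NO),
    "walk": ("subject walks", YES_NO),
    "bike": ("subject bikes", YES_NO),
    "swim": ("subject swims", YES_NO),
    "other": ("subject indicated other form of exercise", YES_NO),
    "has_living_will": ("subject has a living will", YES_NO),
    "gender": ("gender", {1: "Male", 2: "Female"}),
    "race": ("race", {1: "Black", 2: "Caucasian"}),
}

def undocumented_codes(rows, codebook=CODEBOOK):
    """List (ID, column, value) for every value the code sheet does not define."""
    found = []
    for row in rows:
        for column, (_description, codes) in codebook.items():
            if row[column] not in codes:
                found.append((row["ID"], column, row[column]))
    return found

rows = [
    {"ID": 1, "run": 1, "walk": 1, "bike": 1, "swim": 1, "other": 1,
     "has_living_will": 1, "gender": 1, "race": 1},
    {"ID": 2, "run": 1, "walk": 0, "bike": 1, "swim": 0, "other": 1,
     "has_living_will": 0, "gender": 1, "race": 1},
    {"ID": 3, "run": 0, "walk": 0, "bike": 0, "swim": 0, "other": 1,
     "has_living_will": 1, "gender": 2, "race": 2},
]

print(undocumented_codes(rows))   # an empty list means every value is documented
```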
Tip 8. Carefully consider how to handle missing data and be consistent
throughout.
Missing data can be handled in a variety of ways. If data are missing, it is simplest - though not always
best - to leave the cell completely empty. Don’t type anything in the cell - not even a space. In some
cases it is necessary to distinguish a missing value by assigning it a code number. Some conventions for
missing values are to use 9, 9999, or 88. But if the data type is numeric (e.g., age), consider using -99 as the
missing code or simply leaving the cell blank to avoid the potential that the missing code will be included
in the analysis as a real number.
Thumbs up – Do it like this:
• Leave cell empty
• Assign special code
Thumbs down – Don’t do this:
• ‘NA’
• ‘N/A’
• ‘?’
• ‘Missing’
• ‘Don’t know’
• ‘---’
• ‘(space)’
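The risk of a missing code being analyzed as a real number is easy to demonstrate. In the Python sketch below, the ages are hypothetical, None stands for an empty cell, and -99 is the missing code suggested above. The function name clean is an assumption for this example.

```python
from statistics import mean

MISSING_CODE = -99

# Hypothetical ages as entered: None is an empty cell, -99 is the missing code.
ages_as_entered = [34, 51, None, 47, MISSING_CODE]

def clean(values, missing_code=MISSING_CODE):
    """Drop empty cells and cells holding the missing code."""
    return [v for v in values if v is not None and v != missing_code]

# Wrong: the missing code is treated as a real age.
naive = mean([v for v in ages_as_entered if v is not None])

# Right: the missing code is removed before the calculation.
correct = mean(clean(ages_as_entered))

print(naive)     # 8.25, clearly impossible as an average adult age
print(correct)   # 44
```

An impossible result like 8.25 is easy to spot. A missing code such as 9 or 88 in an age column is not, because it could pass for a real value, which is why the choice of code and the code sheet matter.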
Tip 9. The spreadsheet that the statistician receives should simply have the raw
data in rows and columns with simple column labels.
Of course, Excel is much more powerful than just a place to put raw data. It has formulas, summary
statistics, graphs, etc. But if the intent of the spreadsheet is to have it imported into a software
package, then do not include these other features in the spreadsheet. If the researcher desires to
calculate summary statistics or make graphs, this can be done on another worksheet (another tab)
separate from the data worksheet.
Use of color is often helpful for the data entry person, but keep in mind that color itself does not provide
any meaningful information from a data analysis standpoint. It’s OK to use color, but if there is
important information related to the color (e.g., rows highlighted in blue received the intervention and
those in yellow did not), then create a column(s) to indicate that information (e.g., a column labeled
GROUP where 1=intervention, 0=no intervention).
Tip 10. Consult with your statistician.
When possible, investigators should consider showing their spreadsheet to their statistician before
entering all of their data. This is a valuable exercise in many respects: the analyst can spot
inconsistencies between the spreadsheet and the data collection tool, identify troublesome fields, and
ensure that the outcomes the investigator intended to measure are captured on the spreadsheet in such
a way as to allow for the appropriate statistical analysis. The statistician can test-run the transfer of the
data from Excel to the statistical analysis software. This small investment in time up-front can save
hours in the end.
Software referenced in this document is as follows:
Microsoft Excel, Microsoft Corporation
SAS – Statistical Analysis System, SAS Institute
SPSS – IBM SPSS
I hope you have found this document helpful. I welcome your suggestions and comments.
Contact me at nancy@budererdrug.com. Thank you.

Contenu connexe

Tendances

Tendances (19)

Spss
SpssSpss
Spss
 
Spss basics tutorial
Spss basics tutorialSpss basics tutorial
Spss basics tutorial
 
spss teaching
spss teachingspss teaching
spss teaching
 
SPSS introduction Presentation
SPSS introduction Presentation SPSS introduction Presentation
SPSS introduction Presentation
 
An introduction to spss
An introduction to spssAn introduction to spss
An introduction to spss
 
Dataanalysis
DataanalysisDataanalysis
Dataanalysis
 
introduction to spss
introduction to spssintroduction to spss
introduction to spss
 
Spss by vijay ambast
Spss by vijay ambastSpss by vijay ambast
Spss by vijay ambast
 
SPSS an intro...
SPSS an intro...SPSS an intro...
SPSS an intro...
 
Spss and software Application
Spss and software ApplicationSpss and software Application
Spss and software Application
 
Spss
SpssSpss
Spss
 
Introduction to spss 18
Introduction to spss 18Introduction to spss 18
Introduction to spss 18
 
Spss
SpssSpss
Spss
 
Uses of SPSS and Excel to analyze data
Uses of SPSS and Excel   to analyze dataUses of SPSS and Excel   to analyze data
Uses of SPSS and Excel to analyze data
 
Introduction to spss
Introduction to spssIntroduction to spss
Introduction to spss
 
Introduction to spss 1
Introduction to spss 1Introduction to spss 1
Introduction to spss 1
 
Evaluation Spss
Evaluation SpssEvaluation Spss
Evaluation Spss
 
Spss beginners
Spss beginnersSpss beginners
Spss beginners
 
اىفناىنمفبانفيا
اىفناىنمفبانفيااىفناىنمفبانفيا
اىفناىنمفبانفيا
 

En vedette

Capitulo1 conceptosbasicos
Capitulo1 conceptosbasicosCapitulo1 conceptosbasicos
Capitulo1 conceptosbasicosRocio Saenz
 
YV BKI CH20 Denunciation of Youth
YV BKI CH20 Denunciation of YouthYV BKI CH20 Denunciation of Youth
YV BKI CH20 Denunciation of YouthPardeep Sehgal
 
Modelos teóricos de liderazgo
Modelos teóricos de liderazgoModelos teóricos de liderazgo
Modelos teóricos de liderazgoAl Cougar
 
Branded content: o que é? Como fazer? Para que serve?
Branded content: o que é? Como fazer? Para que serve?Branded content: o que é? Como fazer? Para que serve?
Branded content: o que é? Como fazer? Para que serve?Soraia Lima
 
Trenger vi en norsk nettverksoperatør gruppe nix 2016-02-04
Trenger vi en norsk nettverksoperatør gruppe   nix 2016-02-04Trenger vi en norsk nettverksoperatør gruppe   nix 2016-02-04
Trenger vi en norsk nettverksoperatør gruppe nix 2016-02-04Hans Petter Holen
 

En vedette (11)

Capitulo1 conceptosbasicos
Capitulo1 conceptosbasicosCapitulo1 conceptosbasicos
Capitulo1 conceptosbasicos
 
66666666666666 letter
66666666666666   letter66666666666666   letter
66666666666666 letter
 
Principios de la ley de contrataciones
Principios de la ley de contratacionesPrincipios de la ley de contrataciones
Principios de la ley de contrataciones
 
Kohti kiertotaloutta
Kohti kiertotaloutta Kohti kiertotaloutta
Kohti kiertotaloutta
 
YV BKI CH20 Denunciation of Youth
YV BKI CH20 Denunciation of YouthYV BKI CH20 Denunciation of Youth
YV BKI CH20 Denunciation of Youth
 
Trafico aereo
Trafico aereoTrafico aereo
Trafico aereo
 
Bioetica direito a informação
Bioetica   direito a informaçãoBioetica   direito a informação
Bioetica direito a informação
 
Modelos teóricos de liderazgo
Modelos teóricos de liderazgoModelos teóricos de liderazgo
Modelos teóricos de liderazgo
 
Branded content: o que é? Como fazer? Para que serve?
Branded content: o que é? Como fazer? Para que serve?Branded content: o que é? Como fazer? Para que serve?
Branded content: o que é? Como fazer? Para que serve?
 
2 leyes del liderazgo
2 leyes del liderazgo2 leyes del liderazgo
2 leyes del liderazgo
 
Trenger vi en norsk nettverksoperatør gruppe nix 2016-02-04
Trenger vi en norsk nettverksoperatør gruppe   nix 2016-02-04Trenger vi en norsk nettverksoperatør gruppe   nix 2016-02-04
Trenger vi en norsk nettverksoperatør gruppe nix 2016-02-04
 

Similaire à Tips on Setting Up Excel Spreadsheets - Nancy Buderer

PUH 6301, Public Health Research 1 Course Learning Ou
 PUH 6301, Public Health Research 1 Course Learning Ou PUH 6301, Public Health Research 1 Course Learning Ou
PUH 6301, Public Health Research 1 Course Learning OuTatianaMajor22
 
Running head INSERT TITLE HERE1INSERT TITLE HERE3.docx
Running head INSERT TITLE HERE1INSERT TITLE HERE3.docxRunning head INSERT TITLE HERE1INSERT TITLE HERE3.docx
Running head INSERT TITLE HERE1INSERT TITLE HERE3.docxjeanettehully
 
I need help with the below assignment.Attached are 3 docs-1. EXA.docx
I need help with the below assignment.Attached are 3 docs-1. EXA.docxI need help with the below assignment.Attached are 3 docs-1. EXA.docx
I need help with the below assignment.Attached are 3 docs-1. EXA.docxsamirapdcosden
 
Biology statistics made_simple_using_excel
Biology statistics made_simple_using_excelBiology statistics made_simple_using_excel
Biology statistics made_simple_using_excelharamaya university
 
Beginners SPSS.ppt
Beginners SPSS.pptBeginners SPSS.ppt
Beginners SPSS.pptsayahuwaina
 
Use of-Excel
Use of-ExcelUse of-Excel
Use of-ExcelBrisbane
 
Btm8107 8 week2 activity understanding and exploring assumptions a+ work
Btm8107 8 week2 activity understanding and exploring assumptions a+ workBtm8107 8 week2 activity understanding and exploring assumptions a+ work
Btm8107 8 week2 activity understanding and exploring assumptions a+ workcoursesexams1
 
Microsoft Excel | Master Excel | Advance Excel | Excel
Microsoft Excel | Master Excel | Advance Excel | Excel Microsoft Excel | Master Excel | Advance Excel | Excel
Microsoft Excel | Master Excel | Advance Excel | Excel devbhargav1
 
Splitter Student version Tutorial June 2020 - English
Splitter Student version Tutorial June 2020 - EnglishSplitter Student version Tutorial June 2020 - English
Splitter Student version Tutorial June 2020 - EnglishAdhi Wikantyoso
 
Empowerment Technologies - Module 5
Empowerment Technologies - Module 5Empowerment Technologies - Module 5
Empowerment Technologies - Module 5Jesus Rances
 
lecture 2 Centeral Tendency.pptx
lecture 2 Centeral Tendency.pptxlecture 2 Centeral Tendency.pptx
lecture 2 Centeral Tendency.pptxssuser378d7c
 
Sheet1Number of Visits Per DayNumber of Status Changes Per WeekAge.docx
Sheet1Number of Visits Per DayNumber of Status Changes Per WeekAge.docxSheet1Number of Visits Per DayNumber of Status Changes Per WeekAge.docx
Sheet1Number of Visits Per DayNumber of Status Changes Per WeekAge.docxlesleyryder69361
 
Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...
Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...
Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...4934bk
 
Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...
Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...
Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...bkbk37
 
Microsoft Excel | Master Excel | Advance Excel | Excel
Microsoft Excel | Master Excel | Advance Excel | Excel Microsoft Excel | Master Excel | Advance Excel | Excel
Microsoft Excel | Master Excel | Advance Excel | Excel devbhargav1
 
Central Tendency and Probability.docx
Central Tendency and Probability.docxCentral Tendency and Probability.docx
Central Tendency and Probability.docxwrite31
 
Psyc 354 Massive Success / snaptutorial.com
Psyc 354 Massive Success / snaptutorial.comPsyc 354 Massive Success / snaptutorial.com
Psyc 354 Massive Success / snaptutorial.comReynolds45
 

Similaire à Tips on Setting Up Excel Spreadsheets - Nancy Buderer (20)

PUH 6301, Public Health Research 1 Course Learning Ou
 PUH 6301, Public Health Research 1 Course Learning Ou PUH 6301, Public Health Research 1 Course Learning Ou
PUH 6301, Public Health Research 1 Course Learning Ou
 
Running head INSERT TITLE HERE1INSERT TITLE HERE3.docx
Running head INSERT TITLE HERE1INSERT TITLE HERE3.docxRunning head INSERT TITLE HERE1INSERT TITLE HERE3.docx
Running head INSERT TITLE HERE1INSERT TITLE HERE3.docx
 
I need help with the below assignment.Attached are 3 docs-1. EXA.docx
I need help with the below assignment.Attached are 3 docs-1. EXA.docxI need help with the below assignment.Attached are 3 docs-1. EXA.docx
I need help with the below assignment.Attached are 3 docs-1. EXA.docx
 
Biology statistics made_simple_using_excel
Biology statistics made_simple_using_excelBiology statistics made_simple_using_excel
Biology statistics made_simple_using_excel
 
Beginners SPSS.ppt
Beginners SPSS.pptBeginners SPSS.ppt
Beginners SPSS.ppt
 
Use of-Excel
Use of-ExcelUse of-Excel
Use of-Excel
 
UNIT 4.pptx
UNIT 4.pptxUNIT 4.pptx
UNIT 4.pptx
 
Btm8107 8 week2 activity understanding and exploring assumptions a+ work
Btm8107 8 week2 activity understanding and exploring assumptions a+ workBtm8107 8 week2 activity understanding and exploring assumptions a+ work
Btm8107 8 week2 activity understanding and exploring assumptions a+ work
 
Microsoft Excel | Master Excel | Advance Excel | Excel
Microsoft Excel | Master Excel | Advance Excel | Excel Microsoft Excel | Master Excel | Advance Excel | Excel
Microsoft Excel | Master Excel | Advance Excel | Excel
 
Splitter Student version Tutorial June 2020 - English
Splitter Student version Tutorial June 2020 - EnglishSplitter Student version Tutorial June 2020 - English
Splitter Student version Tutorial June 2020 - English
 
Data analysis.pptx
Data analysis.pptxData analysis.pptx
Data analysis.pptx
 
Empowerment Technologies - Module 5
Empowerment Technologies - Module 5Empowerment Technologies - Module 5
Empowerment Technologies - Module 5
 
lecture 2 Centeral Tendency.pptx
lecture 2 Centeral Tendency.pptxlecture 2 Centeral Tendency.pptx
lecture 2 Centeral Tendency.pptx
 
Sheet1Number of Visits Per DayNumber of Status Changes Per WeekAge.docx
Sheet1Number of Visits Per DayNumber of Status Changes Per WeekAge.docxSheet1Number of Visits Per DayNumber of Status Changes Per WeekAge.docx
Sheet1Number of Visits Per DayNumber of Status Changes Per WeekAge.docx
 
Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...
Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...
Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...
 
Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...
Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...
Instructions Descriptive Statistics Analysis Describe the Sun Coast data usin...
 
SPSS FINAL.pdf
SPSS FINAL.pdfSPSS FINAL.pdf
SPSS FINAL.pdf
 
Microsoft Excel | Master Excel | Advance Excel | Excel
Microsoft Excel | Master Excel | Advance Excel | Excel Microsoft Excel | Master Excel | Advance Excel | Excel
Microsoft Excel | Master Excel | Advance Excel | Excel
 
Central Tendency and Probability.docx
Central Tendency and Probability.docxCentral Tendency and Probability.docx
Central Tendency and Probability.docx
 
Psyc 354 Massive Success / snaptutorial.com
Psyc 354 Massive Success / snaptutorial.comPsyc 354 Massive Success / snaptutorial.com
Psyc 354 Massive Success / snaptutorial.com
 

Tips on Setting Up Excel Spreadsheets - Nancy Buderer

  • 1. TIPS FOR SETTING-UP AN EXCEL SPREADSHEET FOR RESEARCH PROJECTS Nancy Buderer, MS Consulting Biostatistician, Program Evaluator, Research Consultant nancy@budererdrug.com ABSTRACT Objective Often the investigator of investigator-initiated research studies is responsible for entering his/her data into a database or spreadsheet before sending it to a statistician for analysis. This is common in teaching hospitals particularly for medical resident and student research. Typically Microsoft Excel is chosen because it is readily available and easy to use. This document assists investigators in designing a spreadsheet for their research data that can readily be exported from spreadsheet software (e.g., Microsoft Excel) into a statistical analysis software package (e.g., SAS or SPSS). Methods Best practices in spreadsheet design for research are provided along with examples in Microsoft Excel. Common pitfalls for data entry are described. Results Attention to detail is critical not only when collecting research data, but also when entering it into a spreadsheet. The following are basic concepts for designing a spreadsheet: 1. One row of data per subject 2. One column for each variable 3. First column is a unique identifier 4. Column labels follow SAS or SPSS naming conventions 5. Columns formatted according to their data type (numeric, mm/dd/yyyy, military time) 6. Data entered as numbers, not text 7. Coding system with documentation When possible, investigators should show their spreadsheet to their statistician before entering all of their data. This is valuable in many respects: the analyst can spot inconsistencies between the spreadsheet and the data collection tool, identify troublesome fields, and ensure that the outcomes the investigator intended to measure are captured on the spreadsheet in such a way as to allow for the appropriate statistical analysis. 
Conclusion A well-designed spreadsheet for entering research data may improve the accuracy of the data entered and save time in the data analysis phase. 1 | Nancy Buderer, MS nancy@budererdrug.com rev. 3-1-2016
  • 2. TIPS FOR SETTING-UP AN EXCEL SPREADSHEET FOR RESEARCH PROJECTS TIPS Tip 1. Each study subject has one ROW of data. It is simplest to keep everything for each research subject on one row, even if a subject has data at multiple time points. Below is an example where subjects have their blood pressure taken at two different times. Subject #1’s first blood pressure was 120 over 80. His second blood pressure was 115 over 80. The column labeled SBP_1 is the subject’s systolic blood pressure at time 1 (120); the column labeled DBP_1 is his diastolic blood pressure at time 1 (80); the column labeled SBP_2 is the subject’s systolic blood pressure at time 2 (115); and DBP_2 is his diastolic blood pressure at time 2. All of subject #1’s data is on one row of data. ID SBP_1 DBP_1 SBP_2 DBP_2 1 120 80 115 80 2 130 80 120 75 In the spreadsheet below, the subject has a row for time 1 and another row for time 2, but there is no way to distinguish between time points. This is much harder for analyses. ID SBP DBP 1 120 80 1 115 80 2 130 80 2 120 75 SYMBOLS Thumbs up – Do it like this Thumbs down – Don’t do this 2 | Nancy Buderer, MS nancy@budererdrug.com rev. 3-1-2016
  • 3. TIPS FOR SETTING-UP AN EXCEL SPREADSHEET FOR RESEARCH PROJECTS Tip 2. Variables that are measured on the subjects are represented in COLUMNS. Use one column for each variable. For variables that only have one value (like age, gender, race) this is straightforward. If a particular variable has a “check all that apply” kind of response, then use one column for each possible response choice. Enter the number 1 if the subject chose that response and enter a 0 if they did not check that response. For example, a survey asks: “What kinds of exercise have you done in the last week (check all that apply)”? Run Walk Bike Swim Subject #1 did all of them. Enter 1 for RUN, 1 for WALK, 1 for BIKE, and 1 for SWIM. Subject #3 didn’t do any of these exercises, so enter 0 in each of the columns. ID RUN WALK BIKE SWIM 1 1 1 1 1 2 1 0 1 0 3 0 0 0 0 The example below tries to put all the response choices in one column. To the computer, this just looks like a string of characters. ID exercise 1 1,2,3,4 2 1,3 3 none ID exercise 1 run, walk, bike, swim 2 run, bike 3 none 3 | Nancy Buderer, MS nancy@budererdrug.com rev. 3-1-2016
  • 4. TIPS FOR SETTING-UP AN EXCEL SPREADSHEET FOR RESEARCH PROJECTS Tip 3. The first column should be a unique identifier for the subject. Name, medical record number, social security number, etc., are convenient and unique identifiers, but they do not protect the subject’s privacy. Develop an arbitrary ID system. If it is necessary to keep a subject’s name or other private identifier, then keep a separate list that links name to the arbitrary ID number. Ideally, the ID should be a number, not a series of characters. In the example below, subjects 1 and 2 were enrolled in 2011 and subjects 3 and 4 were enrolled in 2012. Here are 3 ways to incorporate year into the ID. ID Year ID ID 1 2011 1001 1.2011 2 2011 1002 2.2011 3 2012 2001 1.2012 4 2012 2002 2.2012 Do not use private identifiers in spreadsheets that are designed for research purposes. • name • medical record number • date of birth • social security number Protect the subject’s privacy throughout the spreadsheet. It is rarely necessary for the statistician to receive identifying information. • Enter age in years rather than the actual date of birth. • If the actual date of an event is not necessary, then consider entering only the minimum necessary information (e.g., month and year). 4 | Nancy Buderer, MS nancy@budererdrug.com rev. 3-1-2016
  • 5. TIPS FOR SETTING-UP AN EXCEL SPREADSHEET FOR RESEARCH PROJECTS
    Tip 4. Use column labels that follow standard SAS or SPSS variable naming rules.
    Typically a statistician will import an Excel spreadsheet into a statistical analysis software program like SAS or SPSS. By using column labels that already follow standard SAS or SPSS naming conventions, the researcher can save the statistician time (and money). [The terms “column label” and “variable name” are used interchangeably.]
    Column Label Guidelines
        • Unique name for each column; no two columns with the same name
        • Up to 32 characters in length (if using older versions of SPSS or SAS, keep this to 8 characters)
        • Starts with a letter
        • May be a combination of letters and numbers
        • No spaces
        • Underscore is OK, but other special characters are not
        • Upper or lower case; they’re treated the same
    Examples:
        Subject_ID   Age_T1   med_costs   date_first_use
    If you want more description in the column labels so that the data entry person has a more complete description of each column, add it as the first row and keep the column labels as the second row. When the statistician does the analysis, this first row can be easily deleted, leaving the clean set of column labels. (But don’t merge cells in this first upper row. Keep column widths the same throughout the column.)
             age at time 1   Medication costs   1st day used medication
        ID   Age_T1          med_costs          date_first_use
    If the same variable is measured at multiple time points, like age at time 1 and again at time 2, then keep the first part of the column label the same and add an underscore and a number after it to indicate the time point (e.g., AGE_T1, AGE_T2).
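The column label guidelines in Tip 4 are mechanical enough to check automatically before the spreadsheet is sent on. Below is a minimal Python sketch that encodes only the rules as listed in this document (it is not a full statement of SAS or SPSS naming rules); the function name check_column_labels is hypothetical.

```python
import re

# Starts with a letter, then letters, digits, or underscores, 32 characters max.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,31}")


def check_column_labels(labels):
    """Return a list of (label, problem) pairs; an empty list means no problems."""
    problems = []
    seen = set()
    for label in labels:
        if not NAME_PATTERN.fullmatch(label):
            problems.append((label, "invalid name"))
        # Upper and lower case are treated the same, so compare in lower case.
        key = label.lower()
        if key in seen:
            problems.append((label, "duplicate name"))
        seen.add(key)
    return problems


good = check_column_labels(["Subject_ID", "Age_T1", "med_costs", "date_first_use"])
bad = check_column_labels(["1st_visit", "med costs", "AGE_T1", "age_t1"])
```

In this sketch, "1st_visit" fails because it starts with a digit, "med costs" fails because it contains a space, and "age_t1" is flagged as a duplicate of "AGE_T1".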
  • 6. TIPS FOR SETTING-UP AN EXCEL SPREADSHEET FOR RESEARCH PROJECTS
    Tip 5. Format columns according to their respective data type.
    If a cell in Excel is not specifically formatted, it defaults to “GENERAL”. This is the most dangerous format for researchers because just about anything can be typed into the cell. SAS or SPSS look at those values as strings of characters, which can lead to meaningless data. Take the time to properly format columns according to their respective data type.
    Use the FORMAT CELLS menu to specify the data type of each column of data: numbers are numeric, dates are mm/dd/yyyy, etc. In Excel, do this by right-clicking the column (it is highlighted and a menu pops up); choose “Format Cells…”; choose the “Number” tab; and select from the list of formats. Another way to get to the FORMAT menu is from the FORMAT icon on the toolbar.
    (Screenshot omitted; screen copied from Microsoft Excel, Microsoft Corporation, Microsoft Office 2007)
  • 7. TIPS FOR SETTING-UP AN EXCEL SPREADSHEET FOR RESEARCH PROJECTS
    Formats preferred for research data are as follows:
        • Number
          Use this for any type of data that is numeric: continuous, interval, ordered, or categorical if a number can be assigned to the category (e.g., 1=yes, 0=no). Excel’s default is 2 decimal places, but it can be changed as needed.
        • Date
          The simplest date format is mm/dd/yyyy. On the menu the example to choose reads “*3/14/2001”. When dates are entered into this column, enter them using the slash marks as shown below. Notice that leading zeros on January through September are automatically dropped.
              date_first_use
              5/1/2012
              5/2/2012
              10/2/2011
        • Text
          If a data type has to be letters, then specify it as such and limit the column width to the maximum number of characters anticipated. Common uses for text fields are fill-in-the-blank responses in surveys. In the example from Tip 2, “Other” might be a choice. A text column (other_text) allows for a fill-in-the-blank response.
              Run   Walk   Bike   Swim   Other _____________________________________

              ID   run   walk   bike   swim   other   other_text
              1    1     1      1      1      1       yoga
              2    1     0      1      0      1       karate
              3    0     0      0      0      1       aerobics
        • Time
          Choose military time; it avoids having to distinguish between AM and PM. The example on the menu is “13:30”.
    GENERAL format is not desirable for spreadsheets that are intended to be imported into a statistical package.
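To see why consistent date and time formats matter downstream, here is a small Python sketch of how mm/dd/yyyy dates and military times can be read unambiguously with the standard datetime module. It is illustrative only; the helper names parse_date and to_military are hypothetical, and the AM/PM conversion assumes an English-language locale.

```python
from datetime import datetime


def parse_date(text):
    """Parse a date entered as mm/dd/yyyy (leading zeros optional)."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def to_military(text):
    """Convert a 12-hour time such as '1:30 PM' to 24-hour 'HH:MM'."""
    return datetime.strptime(text, "%I:%M %p").strftime("%H:%M")


first_use = parse_date("5/1/2012")
afternoon = to_military("1:30 PM")
just_after_midnight = to_military("12:05 AM")
```

A value that does not match the expected format (for example, a date typed as text in another layout) raises a ValueError in this sketch, which is the kind of problem a properly formatted column helps prevent at entry time.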
  • 8. TIPS FOR SETTING-UP AN EXCEL SPREADSHEET FOR RESEARCH PROJECTS
    Setting NUMBER formats (screenshot omitted)
    Setting DATE format (screenshot omitted)
    (screens copied from Microsoft Excel, Microsoft Corporation, Microsoft Office 2007)
  • 9. TIPS FOR SETTING-UP AN EXCEL SPREADSHEET FOR RESEARCH PROJECTS
    Tip 6. Data should be entered as NUMBERS wherever possible, not text, not special characters.
    Obviously, data that are numeric in nature (e.g., age, pain scores) will be entered as numbers. The problem comes when data are categorical (e.g., yes/no, gender, race). Letters are problematic in analyzing research data. Letters are case-sensitive (e.g., ‘N’ is different from ‘n’). Differently spelled words are interpreted as different categories even though they mean the same thing (e.g., ‘Yes’ is different from ‘YES’ and ‘Y’). Numbers are typically faster to enter and less prone to errors.
    For categorical data, it is recommended that code numbers be entered rather than words or letters. In the example below, use 1 for yes and 0 for no; use 1 for male and 2 for female; use 1 for Black and 2 for Caucasian.
        ID   HAS_LIVING_WILL   GENDER   RACE
        1    1                 1        1
        2    0                 1        1
        3    1                 2        2
        4    0                 2        1
        5    0                 1        2
    (*Using 1 for YES (or present) and 0 for NO (absent) is best for some data analyses such as logistic regression.)
    By contrast, the entries below mix spellings, cases, and abbreviations for the same categories:
        ID   HAS_LIVING_WILL   GENDER   RACE
        1    Yes               M        Black
        2    No                M        Black
        3    yes               F        C
        4    None              f        B
        5    N                 m        Caucasian
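When text entries like those in Tip 6 already exist, they have to be recoded before analysis. The following Python sketch shows one way this might be done; the lookup tables and the recode function are hypothetical, and the set of spellings they accept is an assumption based on the example entries above. Entries that are not recognized (such as ‘None’ for a yes/no question) are returned as None so that a person can review them rather than having the program guess.

```python
# Hypothetical lookup tables using the codes from Tip 6.
YES_NO = {"yes": 1, "y": 1, "no": 0, "n": 0}
GENDER = {"m": 1, "male": 1, "f": 2, "female": 2}
RACE = {"black": 1, "b": 1, "caucasian": 2, "c": 2}


def recode(value, mapping):
    """Return the code number for a text entry, or None if unrecognized."""
    return mapping.get(value.strip().lower())


living_will_codes = [recode(v, YES_NO) for v in ["Yes", "No", "yes", "None", "N"]]
gender_codes = [recode(v, GENDER) for v in ["M", "M", "F", "f", "m"]]
race_codes = [recode(v, RACE) for v in ["Black", "Black", "C", "B", "Caucasian"]]
```

Entering code numbers from the start avoids this clean-up step entirely, which is the point of the tip.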
  • 10. TIPS FOR SETTING-UP AN EXCEL SPREADSHEET FOR RESEARCH PROJECTS
    Tip 7. Always create a code sheet (i.e., key, codebook).
    This code sheet (or key) shows how the column name links to the data element you collected, and describes how code numbers represent various categories. This can be hand-written on the paper data collection tool or included as a separate worksheet within the same Excel workbook.
    For this spreadsheet …
        ID   run   walk   bike   swim   other   other_text   has_living_will   gender   race
        1    1     1      1      1      1       yoga         1                 1        1
        2    1     0      1      0      1       karate       0                 1        1
        3    0     0      0      0      1       aerobics     1                 2        2
    The code sheet might look like this …
        Column            Description                                  Codes
        ID                arbitrarily assigned ID number
        run               subject runs                                 1=yes, 0=no
        walk              subject walks                                1=yes, 0=no
        bike              subject bikes                                1=yes, 0=no
        swim              subject swims                                1=yes, 0=no
        other             subject indicated other form of exercise     1=yes, 0=no
        other_text        write-in the other exercise
        has_living_will   subject has a living will                    1=yes, 0=no
        gender            gender                                       1=Male, 2=Female
        race              race                                         1=Black, 2=Caucasian
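A code sheet like the one in Tip 7 is what lets an analyst turn code numbers back into readable labels. As a rough illustration, the Python sketch below stores part of the example code sheet as a dictionary and decodes one row; the names CODE_SHEET and decode_row are hypothetical and not part of the original document.

```python
# Part of the example code sheet from Tip 7, stored as column -> {code: label}.
CODE_SHEET = {
    "has_living_will": {1: "yes", 0: "no"},
    "gender": {1: "Male", 2: "Female"},
    "race": {1: "Black", 2: "Caucasian"},
}


def decode_row(row):
    """Replace code numbers with their labels; leave uncoded columns as-is."""
    decoded = {}
    for column, value in row.items():
        labels = CODE_SHEET.get(column)
        decoded[column] = labels.get(value, value) if labels else value
    return decoded


subject_3 = decode_row({"ID": 3, "has_living_will": 1, "gender": 2, "race": 2})
print(subject_3)
```

Without the code sheet, the value 2 in the race column would have no recoverable meaning, which is why the tip asks for one every time.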
  • 11. TIPS FOR SETTING-UP AN EXCEL SPREADSHEET FOR RESEARCH PROJECTS
    Tip 8. Carefully consider how to handle missing data and be consistent throughout.
    Missing data can be handled in a variety of ways. If data are missing, it is simplest, though not always best, to leave the cell completely empty. Don’t type anything in the cell, not even a space.
    In some cases it is necessary to distinguish a missing value by assigning it a code number. Some conventions for missing are to use 9, 9999, or 88. But if the data type is numeric (e.g., age), consider using -99 as the missing code or simply leaving the cell blank to avoid the potential that the missing code will be included in the analysis as a real number.
    Acceptable ways to record a missing value:
        • Leave cell empty
        • Assign special code
    Entries to avoid:
        • ‘NA’
        • ‘N/A’
        • ‘?’
        • ‘Missing’
        • ‘Don’t know’
        • ‘---’
        • ‘(space)’
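The risk described in Tip 8, a missing code being averaged in as if it were a real value, can be shown with a few lines of Python. This is a sketch under assumptions: -99 is taken as the missing code for a numeric column, and the names MISSING_CODES, clean, and mean_of_valid are hypothetical.

```python
# Assumed missing code for a numeric column such as age.
MISSING_CODES = {-99}


def clean(values):
    """Turn empty cells and missing codes into None; keep real values as floats."""
    cleaned = []
    for value in values:
        if value is None or (isinstance(value, str) and value.strip() == ""):
            cleaned.append(None)
        elif float(value) in MISSING_CODES:
            cleaned.append(None)
        else:
            cleaned.append(float(value))
    return cleaned


def mean_of_valid(values):
    """Mean of the non-missing values, or None if every value is missing."""
    valid = [v for v in clean(values) if v is not None]
    return sum(valid) / len(valid) if valid else None


ages = [34, -99, "", 51]
correct_mean = mean_of_valid(ages)
# If -99 were wrongly treated as a real age, the mean of 34, -99, and 51
# would be negative, which is clearly not a plausible average age.
naive_mean = (34 + -99 + 51) / 3
```

Whichever convention is chosen, recording it on the code sheet lets the statistician exclude the missing code in the same way.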
  • 12. TIPS FOR SETTING-UP AN EXCEL SPREADSHEET FOR RESEARCH PROJECTS
    Tip 9. The spreadsheet that the statistician receives should simply have the raw data in rows and columns with simple column labels.
    Of course, Excel is much more powerful than just a place to put raw data. It has formulas, summary statistics, graphs, etc. But if the intent of the spreadsheet is to have it imported into a software package, then do not include these other features in the spreadsheet. If the researcher wants to calculate summary statistics or make graphs, this can be done on another worksheet (another tab) separate from the data worksheet.
    Use of color is often helpful for the data entry person, but keep in mind that color itself does not provide any meaningful information from a data analysis standpoint. It’s OK to use color, but if there is important information related to the color (e.g., rows highlighted in blue received the intervention and those in yellow did not), then create a column(s) to indicate that information (e.g., a column labeled GROUP where 1=intervention, 0=no intervention).
    Tip 10. Consult with your statistician.
    When possible, investigators should consider showing their spreadsheet to their statistician before entering all of their data. This is a valuable exercise in many respects: the analyst can spot inconsistencies between the spreadsheet and the data collection tool, identify troublesome fields, and ensure that the outcomes the investigator intended to measure are captured on the spreadsheet in such a way as to allow for the appropriate statistical analysis. The statistician can test-run the transfer of the data from Excel to the statistical analysis software. This small investment of time up front can save hours in the end.
    Software referenced in this document:
        Microsoft Excel, Microsoft Corporation
        SAS – Statistical Analysis Software, SAS Institute
        SPSS – IBM SPSS
    I hope you have found this document helpful. I welcome your suggestions and comments. Contact me at nancy@budererdrug.com. Thank you.