Best Practices
Creating Research Data




                         Sherry Lake
                         July 31, 2012 University of Florida Data Management Workshop
WHY?

Following these Best Practices…
• Will improve the usability of the data by you
  or by others
• Your data will be “computer ready”
• Your data will be ready to share with others
Spreadsheet Examples
Spreadsheet Problems?
Problems

• Dates are not
  stored
  consistently
• Values are labeled inconsistently
• Data coding is inconsistent
• Values are ordered differently
Problems

• Confusion
  between
  numbers and
  text
• Different types of data are stored in the
  same columns
• The spreadsheet loses interpretability if it
  is sorted
Best Practices Data Organization
• Lines or rows of data should be complete
   – Designed to be machine readable, not human
     readable (sort)
Best Practices Data Organization


• Include a header line as the 1st line (or record)
• Label each Column with a short but
  descriptive name
  – Names should be unique
  – Use letters, numbers, or “_” (underscore)
  – Do not include blank spaces or symbols (+ - & ^ *)
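The naming rules above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `clean_column_names` and the sample headers are hypothetical, and the choice to number repeated names is an assumption.

```python
import re

def clean_column_names(names):
    """Return unique names using only letters, digits, and underscores."""
    cleaned = []
    seen = {}
    for name in names:
        # Replace each run of disallowed characters with a single underscore.
        base = re.sub(r"[^A-Za-z0-9]+", "_", name.strip()).strip("_")
        if not base:
            base = "column"
        # Append a counter when a cleaned name repeats, so names stay unique.
        count = seen.get(base, 0) + 1
        seen[base] = count
        cleaned.append(base if count == 1 else f"{base}_{count}")
    return cleaned

header = clean_column_names(["Site Name", "Temp (C)", "Temp (C)", "pH"])
print(header)
```

The sketch does not guard against a numbered name colliding with a name already present in the input, so check the result by eye for real data.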
Best Practices Data Organization


• Columns of data should be consistent
  – Use the same naming convention for text data
• Columns should include only a single kind of
  data
  – Text or “string” data
  – Integer numbers
  – Floating point or real numbers
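One way to check that a column holds a single kind of data is to classify every cell and see how many kinds turn up. The sketch below assumes rows have been read as dictionaries of strings (for example with `csv.DictReader`); the helper names and the `depth_m` column are hypothetical.

```python
def classify(value):
    """Classify one cell as 'integer', 'float', or 'text'."""
    text = value.strip()
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "text"

def column_kinds(rows, column):
    """Return the set of kinds found in one column of a list of dict rows."""
    return {classify(row[column]) for row in rows}

rows = [{"depth_m": "1.5"}, {"depth_m": "2"}, {"depth_m": "deep"}]
kinds = column_kinds(rows, "depth_m")
print(kinds)  # more than one kind means the column mixes data types
```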
Use Standardized Formats

• ISO 8601 Standard for Date and Time
  – YYYY-MM-DDThh:mm:ss.sTZD
       2009-10-13T09:12:34.9Z
       2009-10-13T09:12:34.9+05:00
• Spatial Coordinates for Latitude/Longitude
  – +/- DD.DDDDD
        -78.476 (longitude)
        +38.029 (latitude)
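As a sketch of how these formats might be produced in a script, Python's standard `datetime` module writes the extended ISO 8601 form (with hyphens and colons). It writes UTC as `+00:00` rather than `Z`; both are valid designators. The helper `format_coordinate` is a hypothetical name introduced here for illustration.

```python
from datetime import datetime, timedelta, timezone

# An observation time in UTC, and the same instant at a +05:00 offset.
utc_time = datetime(2009, 10, 13, 9, 12, 34, 900000, tzinfo=timezone.utc)
local_time = utc_time.astimezone(timezone(timedelta(hours=5)))

utc_text = utc_time.isoformat(timespec="milliseconds")
local_text = local_time.isoformat(timespec="milliseconds")

def format_coordinate(value, places=5):
    """Format a latitude or longitude as signed decimal degrees."""
    return f"{value:+.{places}f}"

print(utc_text)
print(local_text)
print(format_coordinate(-78.476, 3), format_coordinate(38.029, 3))
```

Text written this way can be read back with `datetime.fromisoformat`, which keeps the offset attached to the value.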
File Names
File Names
• Use descriptive names
• Not too long
• Don’t use spaces
• Try to include time,
  place & theme
• May use “-” or “_”
File Names

• String words together with
  Caps (VegBiodiv_2007)
• Think about using version
  numbers
• Don’t change default
  extensions (txt, jpg, csv,…)
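The file-naming advice can be turned into a small helper so every file in a project follows the same pattern. This is only a sketch: the function `build_file_name`, the theme_place_year_version ordering, and the place "Blue Ridge" are assumptions made for illustration.

```python
import re

def build_file_name(theme, place, year, version, extension):
    """Join theme, place, year, and version into a space-free file name."""
    parts = [theme, place, str(year), f"v{version:02d}"]
    # Keep only letters, digits, and hyphens inside each part;
    # underscores are reserved as the separator between parts.
    safe = [re.sub(r"[^A-Za-z0-9-]+", "", part) for part in parts]
    return "_".join(safe) + "." + extension.lstrip(".")

name = build_file_name("VegBiodiv", "Blue Ridge", 2007, 2, "csv")
print(name)
```

Zero-padding the version number keeps files in order when a directory listing is sorted by name.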
Quality Assurance/Quality Control
Dataset Creation & Integrity Errors
   • Use a data entry program
      – Program to catch typing errors
      – Program pull-down menu options
   • Perform double entry of the data
   • Manually check 5 – 10% of data records
   • Check for out-of-range values (plotting)
   • Check for missing or impossible values
   • Perform statistical summaries (random samples)
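Several of these checks are easy to script. The sketch below flags missing, non-numeric, and out-of-range values in one field and compares two independent entries of the same records. The missing-value code `-9999`, the field name `temp_c`, and the range limits are hypothetical choices, not values from the slides.

```python
MISSING_CODE = "-9999"  # hypothetical missing-value code

def check_records(records, field, low, high):
    """Return (row_number, problem) pairs for one numeric field."""
    problems = []
    for number, record in enumerate(records, start=1):
        raw = record.get(field, "").strip()
        if raw == "" or raw == MISSING_CODE:
            problems.append((number, "missing"))
            continue
        try:
            value = float(raw)
        except ValueError:
            problems.append((number, "not a number"))
            continue
        if not low <= value <= high:
            problems.append((number, "out of range"))
    return problems

def compare_entries(first, second):
    """Return row numbers where two entries of the same data differ."""
    return [n for n, (a, b) in enumerate(zip(first, second), start=1) if a != b]

records = [{"temp_c": "21.5"}, {"temp_c": ""}, {"temp_c": "215"}, {"temp_c": "n/a"}]
problems = check_records(records, "temp_c", -40.0, 60.0)
print(problems)
```

`compare_entries` stops at the shorter of the two lists, so compare the record counts separately before relying on it.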
Analyzing Data - Notes
• Keep Original File
  – Uncorrected copy
  – Make “read-only”
• Make notes on transformations
• Any changes, save as a new file
• Use scripted code to transform and correct
  data
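Keeping the original read-only and saving changes to a new file can also be scripted. This is a minimal sketch using Python's standard library; the function names are hypothetical, and on Windows `os.chmod` can only toggle the read-only flag.

```python
import os
import shutil
import stat

def protect_original(path):
    """Remove write permission from the original data file."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

def working_copy(original, new_path):
    """Copy the original to a new file and make sure the copy is writable."""
    # copyfile copies contents only, so the read-only mode is not inherited.
    shutil.copyfile(original, new_path)
    os.chmod(new_path, os.stat(new_path).st_mode | stat.S_IWUSR)
    return new_path
```

A typical use would be `protect_original("raw.csv")` once after data entry, then `working_copy("raw.csv", "raw_v02.csv")` before each round of corrections (both file names are placeholders).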
Analyzing Data
• Use a scripted program (R, SAS, SPSS, Matlab)
  – Steps are recorded in textual format
  – Can be easily revised and re-executed
  – Helps sharing and repetition
  – Easy to document
• GUI-based analysis may be easier, but is harder
  to reproduce
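To show what a recorded, re-executable step might look like, here is a small sketch in Python (the slides name R, SAS, SPSS, and Matlab; Python is used here only as an example). The column names, the unit conversion, and the log list are all hypothetical.

```python
import csv
import io

def fahrenheit_to_celsius(value):
    """Convert degrees Fahrenheit to degrees Celsius, rounded to 0.1."""
    return round((value - 32.0) * 5.0 / 9.0, 1)

def transform(rows, log):
    """Replace the temp_f column with temp_c and note the change in log."""
    out = []
    for row in rows:
        new = dict(row)
        new["temp_c"] = fahrenheit_to_celsius(float(row["temp_f"]))
        del new["temp_f"]
        out.append(new)
    log.append("Converted temp_f (Fahrenheit) to temp_c (Celsius), rounded to 0.1")
    return out

raw = "site,temp_f\nA,68.0\nB,50.0\n"  # stands in for the original data file
log = []
rows = transform(list(csv.DictReader(io.StringIO(raw))), log)
print(rows)
print(log)
```

Because the conversion lives in code, it can be rerun unchanged if the raw data is corrected, and the log doubles as the notes on transformations.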
Document EVERYTHING!

• Create a Project Document File
  – More than a Lab Notebook
  – Data Management Plan
• Start at the beginning of the project and
  continue throughout data collection & analysis
  – Why you are collecting data
  – Exact details of methods of collecting & analyzing
Document EVERYTHING!
• Details such as:
  – Names of data & analysis files associated with
    study
  – Definitions for data and codes (include missing
    value codes, names)
  – Units of measure (accuracy and precision)
  – Standards or instrument calibrations
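These details can be kept in a machine-readable data dictionary stored next to the data. The sketch below is one possible layout, not a standard: the keys, file name, column names, and missing-value code are all hypothetical.

```python
import json

data_dictionary = {
    "file": "VegBiodiv_2007.csv",
    "missing_value_code": "-9999",
    "columns": [
        {"name": "site_id", "type": "text", "definition": "Site identifier"},
        {"name": "temp_c", "type": "float", "units": "degrees Celsius",
         "precision": 0.1},
    ],
}

def undocumented_columns(header, dictionary):
    """Return header names that have no entry in the data dictionary."""
    documented = {column["name"] for column in dictionary["columns"]}
    return [name for name in header if name not in documented]

text = json.dumps(data_dictionary, indent=2)
print(text)
print(undocumented_columns(["site_id", "temp_c", "ph"], data_dictionary))
```

Running `undocumented_columns` against each data file's header is a quick way to notice columns that were added without being defined.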
Choosing File Formats

• Accessible Data (in the future)
  – Non-proprietary (software formats)
  – Open, documented standard
  – Common, used by the research community
  – Standard representation (ASCII, Unicode)
  – Unencrypted & Uncompressed
  – Media formats (hardware formats)
Preferred Format Choices
•   PDF, not Word
•   ASCII, not Excel
•   MPEG-4, not Quicktime
•   TIFF or JPEG2000, not GIF or JPG
•   XML or RDF, not RDBMS

Good if not software specific
Best Practices

1. Use Consistent Data Organization
2. Use Standardized Formats
3. Assign Descriptive File Names
4. Perform Basic Quality Assurance/Quality Control
5. Use Scripted Program for Analysis and Keep Notes
6. Document EVERYTHING! (Define Contents of Data
   Files)
7. Use Consistent, Stable and Open File Formats
Best Practices Bibliography
Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some
   simple guidelines for effective data management. Bulletin of the Ecological
   Society of America, 90(2), 205-214.
Hook, L. A., Santhana Vannan, S.K., Beaty, T. W., Cook, R. B. and Wilson, B.E.
  (2010). Best Practices for Preparing Environmental Data Sets to Share and
  Archive. Available online (http://daac.ornl.gov/PI/BestPractices-2010.pdf)
  from Oak Ridge National Laboratory Distributed Active Archive Center, Oak
  Ridge, Tennessee, U.S.A. doi:10.3334/ORNLDAAC/BestPractices-2010.
Inter-university Consortium for Political and Social Research (ICPSR). (2012).
    Guide to social science data preparation and archiving: Best practices
    throughout the data cycle (5th ed.). Ann Arbor, MI. Retrieved
    05/31/2012, from
    http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf.
Data Observation Network for Earth (DataONE). (2012). DataONE Best
   Practices database. Retrieved 07/21/12, from
   http://www.dataone.org/best-practices.
Questions? Discussion?

• Sherry Lake
  Senior Scientific Data Consultant, UVA Library
• shlake@virginia.edu
• Twitter: shlakeuva
• Slideshare: http://www.slideshare.net/shlake
• Web: http://www.lib.virginia.edu/brown/data




Best practices data collection

  • 1. Best Practices Creating Research Data Sherry Lake July 31, 2012 University of Florida Data Management Workshop
  • 2. WHY? Following these Best Practices……. • Will improve the usability of the data by you or by others • Your data will be “computer ready” • Your data will be ready to share with others
  • 5. Problems • Dates are not stored consistently • Values are labeled inconsistently • Data coding is inconsistent • Order of values are different
  • 6. Problems • Confusion between numbers and text • Different types of data are stored in the same columns • The spreadsheet loses interpretability if it is sorted
  • 7. Best Practices Data Organization • Lines or rows of data should be complete – Designed to be machine readable, not human readable (sort)
  • 8. Best Practices Data Organization • Include a Header Line 1st line (or record) • Label each Column with a short but descriptive name – Names should be unique – Use letters, numbers, or “_” (underscore) – Do not include blank spaces or symbols (+ - & ^ *)
  • 9. Best Practices Data Organization • Columns of data should be consistent – Use the same naming convention for text data • Columns should include only a single kind of data – Text or “string” data – Integer numbers – Floating point or real numbers
  • 10. Use Standardized Formats • ISO 8601 Standard for Date and Time – YYYYMMDDThh:mmss.sTZD 20091013T09:1234.9Z 20091013T09:1234.9+05:00 • Spatial Coordinates for Latitute/Longitude – +/- DD.DDDDD -78.476 (longitude) +38.029 (latitude)
  • 12. File Names • Use descriptive names • Not too long • Don’t use spaces • Try to include time, place & theme • May use “-” or “_”
  • 13. File Names • String words together with Caps (VegBiodiv_2007) • Think about using version numbers • Don’t change default extensions (txt, jpg, csv,…)
  • 14. Quality Assurance/Control Dataset Creation & Integrity Errors • Use a data entry program – Program to catch typing errors – Program pull-down menu options • Perform double entry of the data • Manually check 5 – 10% of data records • Check for out-of-range values (plotting) • Check for missing or impossible values • Perform statistical summaries (random samples)
  • 15. Analyzing Data - Notes • Keep Original File – Uncorrected copy – Make “read-only” • Make notes on transformations • Any changes, save as a new file • Use scripted code to transform and correct data
  • 16. Analyzing Data • Use a scripted program (R, SAS, SPSS, Matlab) – Steps are recorded in textual format – Can be easily revised and re-executed – Helps sharing and repetition – Easy to document • GUI-based analysis may be easier, but harder to reproduce
  • 17. Document EVERYTHING! • Create a Project Document File – More than a Lab Notebook – Data Management Plan • Start at the beginning of the project and continue throughout data collection & analysis – Why you are collecting data – Exact details of methods of collecting & analyzing
  • 18. Document EVERYTHING! • Details such as: – Names of data & analysis files associated with study – Definitions for data and codes (include missing value codes, names) example – Units of measure (accuracy and precision) – Standards or instrument calibrations
  • 19. Choosing File Formats • Accessible Data (in the future) – Non-proprietary (software formats) – Open, documented standard – Common, used by the research community – Standard representation (ASCII, Unicode) – Unencrypted & Uncompressed – Media formats (hardware formats)
  • 20. Preferred Format Choices • PDF, not Word • ASCII, not Excel • MPEG-4, not Quicktime • TIFF or JPEG2000, not GIF or JPG • XML or RDF, not RDBMS Good if not software specific
  • 21. Best Practices 1. Use Consistent Data Organization 2. Use Standardized Formats 3. Assign Descriptive File Names 4. Perform Basic Quality Assurance/ Quality Control 5. Use Scripted Program for Analysis and Keep Notes 6. Document EVERYTHING! (Define Contents of Data Files ) 7. Use Consistent, Stable and Open File Formats
  • 22. Best Practices Bibliography Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some simple guidelines for effective data management. Bulletin of the Ecological Society of America, 90(2), 205-214. Hook, L. A., Santhana Vannan, S.K., Beaty, T. W., Cook, R. B. and Wilson, B.E. (2010). Best Practices for Preparing Environmental Data Sets to Share and Archive. Available online (http://daac.ornl.gov/PI/BestPractices-2010.pdf) from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. doi:10.3334/ORNLDAAC/BestPractices-2010. Inter-university Consortium for Political and Social Research (ICPSR). (2012). Guide to social science data preparation and archiving: Best practices throughout the data cycle (5th ed.). Ann Arbor, MI. Retrieved 05/31/2012, from http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf. Data Observation Network for Earth (DataONE). (2012). DataONE Best Practices database. Retrieved 07/21/12, from http://www.dataone.org/best-practices.
  • 23. Questions? Discussion? • Sherry Lake Senior Scientific Data Consultant, UVA Library • shlake@virginia.edu • Twitter: shlakeuva • Slideshare: http://www.slideshare.net/shlake • Web: http://www.lib.virginia.edu/brown/data
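
The column-naming and date-format guidelines on slides 8 and 10 can be sketched in a few lines of Python. This is a minimal illustration and not part of the original presentation: the helper names `clean_column_name` and `to_iso8601` are hypothetical, and the input dates are assumed to be written month/day/year, as in the spreadsheet example.

```python
import re
from datetime import datetime


def clean_column_name(label: str) -> str:
    """Reduce a free-form label to letters, digits, and underscores.

    Lower-casing is an extra assumption made here so that labels such as
    "Conductivity Top" and "conductivity_top" end up identical.
    """
    name = re.sub(r"[^0-9A-Za-z]+", "_", label.strip())
    return name.strip("_").lower()


def to_iso8601(text: str, fmt: str = "%m/%d/%Y") -> str:
    """Parse a date written in `fmt` (assumed month/day/year) and
    return it in ISO 8601 form, YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()


print(clean_column_name("Conductivity Top"))  # conductivity_top
print(to_iso8601("5/23/2005"))                # 2005-05-23
```

Applying one such function to every header and every date keeps labels and dates consistent across the whole file, which is what makes sorting and automated processing safe.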

Editor's notes

  1. Have you ever collected data and had trouble remembering what you did at the start? Or tried to share your data with someone, only to find that they (or you) couldn’t understand it? Using “Best Practices” when you collect and record your data will improve future usability and may save time. Following these best practices (guidelines) will improve the usability of the data by you or by others, including when it is used with other data.
  2. Spreadsheets are widely used for simple analyses. They are easy to use, BUT they allow (and encourage) users to structure data in ways that are hard to use with other software. You can use them like Word, with columns. Spreadsheets in this format are good for “human” interpretation, not for computers; since you will probably need to either write a program or use a software package, the “human” format is not the best choice. These formats are good for presenting your findings, such as in a publication, but they will be harder to use with other software later on (if you need to do any analysis). It is better to store the data so that it can be used in automated ways, with minimal human intervention.
  3. These are well data measurements, where a salinity meter was used to measure the salinity (top and bottom) and the conductivity (top and bottom). Take a look at this spreadsheet… What’s wrong with it? Could this be easily automated? Sorted?
  4. Dates are not stored consistently: sometimes the date is stored with a label (e.g., “Date:5/23/2005”), sometimes in its own cell (10/2/2005). Values are labeled inconsistently: sometimes “Conductivity Top”, other times “conductivity_top”. For salinity, sometimes two cells are used for top and bottom; in other cases they are combined in one cell. Data coding is inconsistent: sometimes YSI_Model_30, sometimes “YSI Model 30”, so you can’t quite tell whether it is a label or a data value. Tide State is sometimes a text description, sometimes a number. The order of values in the “mini-table” for a given sampling date differs: “Meter Type” comes first in the 5/23 table and second in the 10/2 table.
  5. Confusion between numbers and text: for most software, 39% or <30 are considered TEXT, not numbers (what is the average of 349 and <30?). Different types of data are stored in the same columns: many software products require that a single column contain either TEXT or NUMBERS (but not both!). The spreadsheet loses interpretability if it is sorted: dates are related to a set of attributes only by their position in the file. Once sorted, that relationship is lost. Not sure why you would sort this.
  6. The spreadsheet loses interpretability if it is sorted: dates are related to a set of attributes only by their position in the file. Once sorted, that relationship is lost. Look what happens when we sort this… Look at the difference in this one… sort it: https://docs.google.com/spreadsheet/ccc?key=0Att-cHR6O7gCdEZ2NzRhUWFLYy1nM2FMcDhaNGRVeWc https://docs.google.com/spreadsheet/ccc?key=0Att-cHR6O7gCdHpTMC1kdWREbTNlanBwM3J5WVE3ZFE
  7. The standard convention for many software programs (usually a yes/no “check” option) is for the 1st line (record) to be a header line that lists the names of the variables in the file. The rest of the records (lines) are data. Don’t make names too long: some software programs may not work with long variable names.
  8. We’ve seen that a spreadsheet or word processor can create datasets that can only be interpreted with human intervention. The “ugly spreadsheet” example would be hard to analyze even in a spreadsheet, except with lots of case-by-case human decisions. But what are some principles that characterize good archival data? Keep in mind that good formats for archiving and sharing data may not be the ones you prefer for viewing or analysis! Use the same naming convention for text data: pick a vocabulary and keep it the same, e.g. “slack-high”, not “slack high”.
  9. There are already standards for certain types of data (like date/time and spatial coordinates). Use them; don’t invent your own. Can you think of others? For date and time: am/pm is NOT allowed, and T appears literally in the string. The minimum for a date is YYYY. YYYY = four-digit year; MM = two-digit month (01 = January, etc.); DD = two-digit day of month (01 through 31); hh = two digits of hour (00 through 23); mm = two digits of minute (00 through 59); ss = two digits of second (00 through 59); s = one or more digits representing a decimal fraction of a second; TZD = time zone designator (Z or +hh:mm or -hh:mm). For coordinates: decimal degrees vs. DMS (degrees, minutes, seconds); this is important when a data field could have more than one type of unit.
  10. Guidelines for file names will only help you with your own files/research. Once files are “archived” they will get new names that fit the archive’s systems, usually a permanent name that the computer uses to locate the file. Look at the file names: Context.txt, DataFile1.txt, DataFile2.txt, word6doc.zip. Long ones… Safari, Ray… good date, place. Note “_” and “-”. Think about how the name will look in a directory with lots of other files; you want to be able to “pick it out”.
  11. File names are the easiest way to indicate the contents of a file; keep them terse but indicative of the content. You want to uniquely identify the data file. Don’t make them too long: some scripting programs have a file name length limit for file importing (reading). Don’t use blanks: some software may not be able to read file names with blanks. Think about how the name will look in a directory with lots of other files; you want to be able to “pick it out”.
  12. Maybe use version numbers… Don’t forget the extension (3 characters), which is used to tell the file type.
  13. Data quality control takes place at various stages: during data collection, data entry, and data checking. The quality of the data collection methods has a significant bearing on the quality of the data. Quality includes equipment calibration (use instrument calibration to check precision), which allows other researchers to look at your data and compare it to theirs. You need to validate transcription. Train coders (different people may be doing this) and create a handbook. You can create (program) data entry interfaces that verify data entry and offer lists to choose from. Verification: out-of-range values, random samples, double-checking entries. Minimize manual entry. Visual Basic can create forms for Excel; Access has form creation. Check a random sample of the data. Run consistency checks. In double entry, each record is keyed in and then re-keyed against the original. Several standard packages offer this feature. In the re-entry process, the program catches discrepancies immediately. Start before data collection: define standards and document them in a handbook.
  14. You don’t want to change (or delete) something that could be important later. If you use a scripted language, you can re-run analyses.
  15. “Scripted” analysis software: R, SAS, SPSS, Matlab. Analysis scripts are written records of the various steps involved in processing and analyzing data (a sort of “analytical metadata”). They are easily revised and re-executed at any time if you need to modify the analysis. A GUI, by contrast, is easier but does not leave a clear accounting of exactly what you have done. Document scripted code with comments on why data is being changed.
  16. Important to repeat! More documentation: documentation can also be called metadata. Describe the data file names (especially if using acronyms and abbreviations). Record why you are collecting data, details of the methods of analysis, names of all data and analysis files, definitions for data (include coding keys), missing value codes, and units of measure. Structured metadata (XML) format standards exist for some disciplines (e.g., Ecological Metadata Language, EML).
  17. Documentation can also be called metadata. Describe the data file names (especially if using acronyms and abbreviations). Record why you are collecting data, details of the methods of analysis, names of all data and analysis files, definitions for data (include coding keys), missing value codes, and units of measure. Record calibrations so others can compare their results with yours. Structured metadata (XML) format standards exist for some disciplines (e.g., Ecological Metadata Language, EML).
  18. Spreadsheets are widely used for simple analyses, but they have poor archival qualities: different versions over time are not compatible, and formulas are hard to capture or display. Plan what type of data you will be collecting. You want to choose a file format that can be read well into the future and is independent of software changes. These are formats more likely to be accessible in the future. The alternative is replacing old media and maintaining devices that can still read the proprietary formats or media types. The format of the file is a major factor in the ability to use the data in the future. As technology changes, plan for software and hardware obsolescence. System files (SAS, SPSS) are compact and efficient, but not very portable. Use software to “export” data to a portable (or transport) file. Convert proprietary formats to non-proprietary ones. Check for data errors in conversion.
  19. Examples of preferred format choices: formats for long-term digital preservation (open). Don’t expect that you (you won’t have time) or the archive will be able to convert older formats to new ones.
  20. 1. Remember to create the spreadsheet so it can be automated. 2. Date/time standards, geospatial coordinates, species, and other standards from your discipline. 3. Descriptive file names: file names can help identify what’s inside. 4. Quality assurance: when planning data entry you can “program” data checks in forms (Access and Excel), create pick lists (codes), and define missing data values. 5. Scripted analysis makes it easier to replicate data transformations, and it can be documented. 6. Document EVERYTHING: dataset details, database details, collection notes (conditions). You will not remember everything 20 years from now! Record what someone would need to know about your data to use it. 7. Stable file formats: it is easier if all files are in the same format; it also helps to know which formats are better in the long term.
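
The quality checks described in the notes (missing values, text mixed into numeric columns, out-of-range values) can be scripted rather than done by eye. The following is a minimal sketch under stated assumptions: the sample data, the column name `salinity_top`, the valid range of 0 to 50, and the function name `check_column` are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical sample data; column names and values are assumptions.
SAMPLE = """date,site,salinity_top
2005-05-23,well_1,12.5
2005-05-23,well_2,<30
2005-10-02,well_1,
2005-10-02,well_2,250.0
"""


def check_column(rows, column, low, high):
    """Return (line_number, problem) pairs for values in `column` that are
    missing, not numeric, or outside the closed range [low, high]."""
    problems = []
    for line_number, row in enumerate(rows, start=2):  # line 1 is the header
        raw = (row.get(column) or "").strip()
        if raw == "":
            problems.append((line_number, "missing"))
            continue
        try:
            value = float(raw)
        except ValueError:
            problems.append((line_number, f"not a number: {raw!r}"))
            continue
        if not low <= value <= high:
            problems.append((line_number, f"out of range: {value}"))
    return problems


reader = csv.DictReader(io.StringIO(SAMPLE))
issues = check_column(reader, "salinity_top", low=0.0, high=50.0)
for line_number, problem in issues:
    print(line_number, problem)
```

Because the check is a script, it can be re-run unchanged on every new version of the data file, and the script itself documents which checks were applied.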