Data
Organization
C. Tobin Magle, PhD
Feb. 28, 2017
10:00-11:30 a.m.
Morgan Library Computer
Classroom 175
*inspired by content from Data Carpentry
The research cycle
[Figure: cycle diagram with the labels Hypothesis, Experimental design, Data, Results, Article, and Data Management Plans]
Main topics
• Hierarchical organization
• Folders in folders
• Open Science Framework
• File naming
• Human readability
• Machine readability
• “Tidy” data in spreadsheets
Folder systems
• Organize your data
hierarchically
• Identify ways to divide your data
into categories (Attributes)
• Top level organization is the
most important attribute
Hierarchical Organization
Putting your files into a folder system
my_project
    Data
    Notes
    protocols
    manuscripts
        Paper1
            Figures
            Text
            References
        Paper2
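A folder system like the one on this slide can also be created by a short script, which makes it easy to start every project with the same layout. The sketch below is only an illustration: the folder names come from the slide, but the nesting (Paper1 and Paper2 under manuscripts) is an assumption read off the diagram, and `make_project` is a hypothetical helper name.

```python
from pathlib import Path
import tempfile

# Folder names from the slide; the nesting shown here is an assumption.
LAYOUT = [
    "Data",
    "Notes",
    "protocols",
    "manuscripts/Paper1/Figures",
    "manuscripts/Paper1/Text",
    "manuscripts/Paper1/References",
    "manuscripts/Paper2",
]


def make_project(root: Path) -> list[Path]:
    """Create every folder in LAYOUT under root and return the created paths."""
    created = []
    for relative in LAYOUT:
        folder = root / relative
        # parents=True builds intermediate folders such as manuscripts/Paper1.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demo in a throwaway directory so nothing is written to a real project.
with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp) / "my_project"
    make_project(project_root)
    tree = sorted(
        p.relative_to(project_root).as_posix() for p in project_root.rglob("*")
    )
    print("\n".join(tree))
```

To use it on a real project, call `make_project(Path("my_project"))` from the directory where the project should live.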
Questions to ask
• What kinds of files are there? (See data inventory)
• How could you group them?
• Project?
• Time?
• Location?
• File type?
• What are the most important attributes?
Example: Lou the first year
Lou is a first-year graduate student working on a project in a
biomedical research laboratory. He’s trying to decipher data
left by a former postdoc as a start for his thesis project. For
one year, the postdoc recorded weight daily and cytokine
levels monthly from 16 mice. Half were infected with a
parasite, half were treated with saline.
• What are the attributes of his project?
• How would you rank these attributes?
Attributes
• Time
• Infection Status
• Data Type
Exercise: Organize files
• Download Lou’s files (look in the README file for insight)
• http://tinyurl.com/hvna4mg
• Create a hierarchical folder structure for Lou
• Drag his files into the correct folders
• Fix Lou’s README
• Bonus: think about how you’d organize your data.
Tool: Open Science Framework
• Components
• Add-ons
• Contributors
• Wiki
http://help.osf.io/m/collaborating/l/524109-using-the-wiki
http://www.slideshare.net/DuraSpace/121014-slides-roadmap-to-the-future-of-share
Organization tips
• Be consistent
• One directory per project
• Separate components for
• Raw data
• Processed data
• Code
• Output
• Make raw data read-only
• Make README files
http://help.osf.io/m/60347/l/611391-organizing-files
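Two of these tips, making raw data read-only and writing a README, can be scripted. The following is a minimal sketch under stated assumptions: the folder names mirror the four components listed on the slide, the file name and its header row are made-up placeholders, and `make_read_only` is a hypothetical helper. It only clears the write permission bits, which is a guard against accidental edits rather than a real protection mechanism.

```python
import stat
import tempfile
from pathlib import Path


def make_read_only(folder: Path) -> None:
    """Clear the write permission bits on every file under folder."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    for path in folder.rglob("*"):
        if path.is_file():
            path.chmod(path.stat().st_mode & ~write_bits)


# Demo in a throwaway directory; all names and contents are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my_project"
    for name in ("raw_data", "processed_data", "code", "output"):
        (project / name).mkdir(parents=True)

    raw_file = project / "raw_data" / "2016-05-07-mouse-weight.tsv"
    raw_file.write_text("mouse_id\tweight_g\n")  # hypothetical header row

    (project / "README.txt").write_text(
        "raw_data/        original measurements, read-only\n"
        "processed_data/  cleaned tables derived from raw_data\n"
        "code/            scripts that produce processed_data and output\n"
        "output/          figures and tables\n"
    )

    make_read_only(project / "raw_data")
    is_writable = bool(raw_file.stat().st_mode & stat.S_IWUSR)
    print("raw file writable by owner:", is_writable)

    # Restore write permission so the temporary directory can be cleaned up
    # on every platform.
    raw_file.chmod(raw_file.stat().st_mode | stat.S_IWUSR)
```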
Components
• “Subprojects”
• Separate privacy settings,
contributors, wiki, add-ons, and
files.
• Examples:
• Different projects:
https://osf.io/82fba/
• Clinical: https://osf.io/gq4mz/
• Manuscript: https://osf.io/if7ug/
• Collaboration: https://osf.io/ezcuj/
Demo: Getting started with OSF
1. Create a project
2. Add components
3. Add files
Don’t panic!
• Just try something
• There’s no right answer
• Be consistent
• Write a README.txt file
http://4vector.com/i/free-vector-don-t-panic-clip-art_103946_Dont_Panic_clip_art_hight.png
File naming conventions
Make file names both human- and machine-readable.
Use descriptive names
• Bad name: file.txt
• OK name: 05-07-2016-mouse-data.txt
• Good name: 2016-05-07-mouse-weight.tsv
• Human readability: name contains information about content
Go from general to specific
• Bad name: rep1-5-7-2016-gene-expression.csv
• Good name: 2016-05-07-gene-expression-rep1.csv
• Machine readability: can be sorted meaningfully
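Why the year-first name sorts meaningfully can be checked with a plain alphabetical sort, which is what most file browsers and scripts apply. In the sketch below the first name in each list comes from the slide; the other two dates and replicate numbers are invented for illustration.

```python
# The first name in each list is from the slide; the rest are made-up examples.
iso_names = [
    "2016-05-07-gene-expression-rep1.csv",
    "2015-12-01-gene-expression-rep1.csv",
    "2016-01-15-gene-expression-rep2.csv",
]
mdy_names = [
    "rep1-5-7-2016-gene-expression.csv",
    "rep1-12-1-2015-gene-expression.csv",
    "rep2-1-15-2016-gene-expression.csv",
]

# sorted() compares names character by character, like a file listing does.
sorted_iso = sorted(iso_names)
sorted_mdy = sorted(mdy_names)

print("Year-first names:")
for name in sorted_iso:
    print("  ", name)

print("Replicate-first names:")
for name in sorted_mdy:
    print("  ", name)
```

With year-month-day at the front and zero-padded numbers, alphabetical order and chronological order coincide. With the replicate first and unpadded month and day, the December 2015 file is followed by May 2016 and then January 2016.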
Avoid abbreviations
• Bad name: “sprlbgp1”
• Good name: “spencer_lab_group_1”
• Human readability: no one understands your acronyms
Avoid spaces
• Alternatives
• Dashes-are-cool.txt
• I_also_like_underscores.txt
• CamelCaseIsNeatToo.txt
• Machine readability: shells and many scripting tools treat spaces as delimiters between arguments
• Human readability: dashes, underscores, and capital letters still separate the words
Avoid special characters
• Bad characters: ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ' "
• Machine readability: these characters can have special meanings in shells and scripting languages
• Example: in Unix shells, ~ expands to your home directory
• Alternatives: underscore (_), dash (-), dot (.)
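The rules about spaces and special characters can be applied automatically. The sketch below shows one possible cleanup function; `safe_name` is a hypothetical helper, and lowercasing the name is an extra choice made here for consistency, not something the slides require.

```python
import re


def safe_name(name: str) -> str:
    """Return name with spaces replaced and special characters removed.

    Only letters, digits, underscore, dash, and dot survive.
    """
    cleaned = name.strip().lower()              # lowercasing is this sketch's choice
    cleaned = re.sub(r"\s+", "_", cleaned)      # each run of whitespace becomes one underscore
    cleaned = re.sub(r"[^a-z0-9._-]", "", cleaned)  # drop everything outside the safe set
    return cleaned


# Made-up file names for illustration.
for original in ("Mouse Weights (final)!.txt", "my~data file.csv"):
    print(f"{original!r} -> {safe_name(original)!r}")
```

Removing characters can make two different names collide, so it is worth checking the cleaned names for duplicates before renaming anything.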
Be consistent
• Establishing standards makes data more findable
• Extending standards to everyone who works on a project is
even better
Renaming files
• Ways to automate file renaming
• Bulk Rename Utility (Windows, free)
• Renamer 5 (Mac)
• PSRenamer (Linux, Mac, or Windows, free)
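Besides the dedicated tools listed above, a short script can rename files in bulk. The sketch below is a hedged example, not a replacement for those tools: it assumes the old names start with a MM-DD-YYYY date, as in the "OK name" example from the earlier slide, and moves the date into year-month-day order. `plan_renames` and `apply_renames` are hypothetical helper names, and the demo files are empty placeholders in a temporary folder.

```python
import re
import tempfile
from pathlib import Path

# Assumption: old names begin with MM-DD-YYYY followed by a dash.
OLD_DATE = re.compile(r"^(\d{2})-(\d{2})-(\d{4})-(.+)$")


def plan_renames(folder: Path) -> dict[Path, Path]:
    """Map each matching file to its new year-first name without touching disk."""
    plan = {}
    for path in sorted(folder.iterdir()):
        match = OLD_DATE.match(path.name)
        if path.is_file() and match:
            month, day, year, rest = match.groups()
            plan[path] = path.with_name(f"{year}-{month}-{day}-{rest}")
    return plan


def apply_renames(plan: dict[Path, Path]) -> None:
    """Rename files according to plan, refusing to overwrite existing files."""
    for old, new in plan.items():
        if new.exists():
            raise FileExistsError(f"{new} already exists")
        old.rename(new)


# Demo on empty placeholder files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("05-07-2016-mouse-data.txt", "05-08-2016-mouse-data.txt", "README.txt"):
        (folder / name).touch()

    plan = plan_renames(folder)
    for old, new in plan.items():
        print(old.name, "->", new.name)  # review the plan before applying it

    apply_renames(plan)
    renamed = sorted(p.name for p in folder.iterdir())
    print(renamed)
```

Separating the plan from the rename step gives a chance to inspect the proposed names first, which matters because a bulk rename is hard to undo.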
Exercise: Rename Lou’s files
• Use descriptive names
• General to specific
• Avoid abbreviations, spaces and special characters
• Be consistent
Tidy data
How to organize your data efficiently in spreadsheets
Spreadsheets as lab notebook
• Color coding
• Formatting
• Notes
• Calculations
• Graphs/Tables
Downsides
• Computers don’t understand
notes/formatting/color coding
• Calculations/Graphs/Tables in
spreadsheets are inefficient
• “Tidy data” + automation =
saved time
Using spreadsheets wisely
• Don’t put multiple tables in one sheet
• Don’t use multiple sheets
• Use descriptive field names
• Don’t mix notes and data
Tidy Data
1. Columns as variables
• Don’t combine multiple
pieces of info in one column
2. Rows as observations
• One measured value
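As an illustration of these two rules, the sketch below reshapes a "wide" table (one column per day) into a tidy one where each row holds a single measured value. Everything in it is hypothetical: the mouse IDs, dates, and weights are invented, the column names are assumptions, and `to_tidy` is a made-up helper. It uses only the standard library; a data-frame library would do the same reshaping in one call.

```python
# Hypothetical "wide" sheet: one row per mouse, one column per measurement day.
# All values are invented for illustration.
wide_rows = [
    {"mouse_id": "m01", "status": "infected", "2016-05-07": "21.3", "2016-05-08": "21.1"},
    {"mouse_id": "m02", "status": "saline", "2016-05-07": "22.0", "2016-05-08": "22.4"},
]

# Columns that describe the mouse rather than hold a measurement.
ID_COLUMNS = ("mouse_id", "status")


def to_tidy(rows):
    """Return one record per mouse per day: columns are variables, rows are observations."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            tidy.append(
                {
                    "mouse_id": row["mouse_id"],
                    "status": row["status"],
                    "date": column,            # the old column header becomes a variable
                    "weight_g": float(value),  # exactly one measured value per row
                }
            )
    return tidy


tidy_rows = to_tidy(wide_rows)
for record in tidy_rows:
    print(record)
```

In the tidy version the date is a value in its own column rather than a header, so adding another day of measurements adds rows instead of changing the table's shape.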
Exercise: Tidy Lou’s data
• Open MouseInventory.xls
• Is he using spreadsheets wisely?
• Is each column a variable?
• Is each row an observation?
• Open the January files for both weight and cytokines
• What variables are being measured (i.e., what columns should we have)?
• Can we combine some of these tables?
Exercise: Data carpentry ecology
• Lesson: http://www.datacarpentry.org/spreadsheet-ecology-lesson/
• File: https://ndownloader.figshare.com/files/2252083
• Goal: combine data from first 2 tabs into one table
• Make a new tab, don’t edit the raw data!
Example: Supplemental_data_1_xls
• https://figshare.com/articles/Supplemental_data_1_xls/4055544
• Description: “Table of the results given by HPLC analysis of
the samples. Key: Rt, retention time; +, presence of peak; -,
absence of peak.”
Example: cck8_xls
• https://figshare.com/articles/cck8_xls/3505772
• Description: “This data are from CCK-8 assay and ELISA.”
Need help?
• Email: tobin.magle@colostate.edu
• Data Management Services website:
http://lib.colostate.edu/services/data-management
• Data Carpentry: http://www.datacarpentry.org/
• Software Carpentry: http://software-carpentry.org/

Data and Donuts: Data organization
