2. Scientific Data
“The recorded information (regardless of the
form or the media in which they may exist)
necessary to support or validate a research
project’s observations, findings or outputs.”
-University of Oxford
BExIS Hands-On Workshop, Feb. 2016, Tehran, Iran
3. A Usual Research Cycle
5. Considerations of DIR
• Data Acquisition
• Data Processing/Analysis
• Result reproduction
• Availability of data
• Teamwork and data sharing
• Digital rights
• Referencing and citation
Data Management is needed
6. Need for Data Management
“Data management refers to all aspects of
creating, housing, delivering, maintaining, and
archiving and preserving data. It is one of the
essential areas of responsible conduct of
research.”
-MANTRA 2013
7. DM is done through a Lifecycle
• Boston University
• The University of Alabama
• The University of Virginia
• DataONE
• The U.S. Geological Survey
9. • Data and File Formats
• Data Standards
• Data Access Policies
• Data Management Plan
• Data Preservation Plan
• Data Retirement
• Quality Level
• Hardware
• Software
• Cost/ Funding
• Technical Staff
• Tools:
– https://dmptool.org/
– https://dmponline.dcc.ac.uk
10.
• By means of:
– Collecting new data
– Updating existing data
– Converting/Transforming existing data
– Purchasing/Obtaining data
• Either manually or automatically
• In the laboratory, in the field, or by computation
• Following methodologies, standards, and recommendations
• Satisfying constraints such as access policies
11. • To prepare data for subsequent use
– Verify
– Organize
– Transform
– Integrate and extract
• Tools:
– OpenRefine (formerly Google Refine)
– Statistical software: R, SAS
– Modeling Tools: ….
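The preparation steps listed above (verify, organize, transform, integrate) can be sketched in a few lines of Python. This is only an illustration: the station names, depth values, and the region lookup table are made-up assumptions, not workshop data.

```python
# Hypothetical raw records; station names and values are invented for illustration.
raw_rows = [
    {"station": "C3", "depth_km": "7"},
    {"station": "B2", "depth_km": "n/a"},   # fails verification
    {"station": "A1", "depth_km": "12.5"},
]
# Hypothetical lookup table to integrate with the measurements.
regions = {"A1": "north", "C3": "south"}


def verify(row):
    """Return True if depth_km parses as a non-negative number."""
    try:
        return float(row["depth_km"]) >= 0
    except ValueError:
        return False


def transform(row):
    """Convert the depth from a string in km to a float in metres."""
    return {"station": row["station"], "depth_m": float(row["depth_km"]) * 1000}


def integrate(row):
    """Attach the region from the lookup table (None if the station is unknown)."""
    return {**row, "region": regions.get(row["station"])}


# Verify first, then transform and integrate, then organize by station name.
clean = sorted(
    (integrate(transform(r)) for r in raw_rows if verify(r)),
    key=lambda r: r["station"],
)
```

Here `clean` holds only the rows that passed verification, ordered by station. A tool such as OpenRefine or R would typically do the same work interactively or at larger scale.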
12. • Describe facts
• Detect patterns
• Develop explanations
• Test hypotheses
• This includes:
– Data quality assurance
– Statistical data analysis
– Modeling
– Interpretation of analysis results
13. • The need for:
– Supporting research publications with associated, accessible datasets
– Reusability by others
• Actions and procedures to:
– Keep data for some period of time
– Set data aside for future use
– Archive data in a data repository
• Considering:
– Discovery
– Identification
– Reproduction/ Presentation
– Policies
14. • Disseminate quality data to the public and to
other agencies
• Medium- and agent-independent
• Via automated or non-automated mechanisms
• Shared, but with controls
• Useful metadata
15. • What to publish:
– The research result citing the data
– A data paper describing the data
– The data itself
• Where to Publish:
– Catalogs
– Portals
– Repositories
– National Archives
• Considerations
– Licensing and rights
– Cost
– Sensitive data
– Anonymization
16. “Metadata is information about the context,
content, quality, provenance, and/or
accessibility of a set of data.”
-Digital Curation at the University of Wisconsin-Madison
But why is it needed?
17.
[Figure: loss of data details over time (Michener et al. 1997). Axes: data details against time, starting at the time of data development. Annotations along the curve:
– Specific details about problems with individual items or specific dates are lost relatively rapidly
– General details about datasets are lost through time
– Accident or technology change may make data unusable
– Retirement or career change makes access to “mental storage” difficult or unlikely
– Loss of data developer leads to loss of remaining information]
18. • Formally describes various key attributes of
each data element or collection of elements
• Helps maintain data quality
• Makes use of the data possible or easier
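As a rough illustration of "formally describing key attributes", a metadata record could be modeled as below. This is a minimal sketch, not the BExIS metadata schema: the class names, the fields, and the sample values are all assumptions chosen for the example.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class VariableMetadata:
    """Describes one data element (hypothetical fields)."""
    name: str
    unit: str
    description: str


@dataclass
class DatasetMetadata:
    """Describes a collection of data elements (hypothetical fields)."""
    title: str
    creator: str
    created: str          # ISO 8601 date string
    provenance: str       # where the data came from and how it was produced
    variables: list = field(default_factory=list)


# Sample values are invented for illustration only.
meta = DatasetMetadata(
    title="Example focal mechanism dataset",
    creator="Example Researcher",
    created="2016-02-01",
    provenance="Compiled from a hypothetical field campaign",
    variables=[VariableMetadata("depth", "km", "Depth of the event")],
)

# asdict() converts nested dataclasses too, so the record serializes directly.
meta_json = json.dumps(asdict(meta), indent=2)
```

Storing such a record next to the data keeps units and provenance available even when the person who collected the data is no longer reachable.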
19. • QA focuses on building in quality to prevent defects
– Setting the Quality Level
– Setting standards
– Proper protocols and methods for:
• Data collection
• Data processing and usage
• Maintenance
• QC focuses on testing for quality (Defect detection)
– Acceptance Criteria
– Automatic QC upon data manipulation
– Configuring/testing instruments
– Unit of measurement, accuracy, conversion errors, …
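An automatic QC step with acceptance criteria might look like the following sketch. The `CRITERIA` table, the variable name `depth`, and its unit and range are hypothetical values picked for the example, not limits taken from the workshop.

```python
# Hypothetical acceptance criteria: expected unit and allowed range per variable.
CRITERIA = {"depth": {"unit": "km", "min": 0.0, "max": 700.0}}


def qc_check(variable, value, unit):
    """Return a list of defect messages; an empty list means the value passes."""
    rule = CRITERIA.get(variable)
    if rule is None:
        return [f"no acceptance criteria for {variable!r}"]
    defects = []
    # A wrong unit often signals a conversion error upstream.
    if unit != rule["unit"]:
        defects.append(f"unit {unit!r} differs from expected {rule['unit']!r}")
    if not rule["min"] <= value <= rule["max"]:
        defects.append(f"value {value} outside [{rule['min']}, {rule['max']}]")
    return defects
```

Calling `qc_check` whenever a value is entered or modified is one way to realize "automatic QC upon data manipulation"; QA would decide beforehand what goes into the criteria table.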
20. • Protect data from:
– Loss
– Corruption
– Unauthorized access
• Regular backups
• Regular restores
• Proper structure and naming
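One common way to guard a backup against silent corruption is to compare checksums of the original and the copy. The sketch below assumes a simple file-copy backup; the function names `sha256_of` and `backup_and_verify` are illustrative and not part of any particular tool.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dir):
    """Copy source into backup_dir and confirm the copy's checksum matches."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} does not match the original")
    return target
```

The same `sha256_of` comparison can be reused after a test restore, which is the point of the "regular restores" item: a backup is only useful if it can be read back intact.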
21. Feedback
The first part is over
Thank you
22. Note!
There are some suggestions for cooperation at
the end of the workshop
23. The Workshop
• BExIS
– Data Lifecycle Management
– Generic
– Extensible
– Portable
– Scalable
• Flexible Data Structures
• Data Submission
• Validation
• Preserving
• Metadata Management
• Versioning
27. The Scenario
• Registration/ Logging in
• Seeing the data and metadata structures
• Downloading a template
• Filling in the Excel data (sample datasets)
• Uploading the datasets
• Providing metadata
• Checking validations
• Seeing the dataset in the system
• Searching, etc.
30. Example Datasets
– Tectonic Stress Fields on BExIS website
– International Seismological Centre
– Data type: focal mechanism
31. Creating a Data Structure
32. Creating a Data Structure
49.
Thank You:
Workshop Participants
Martin Hohmuth
Nafiseh Navabpour
Roman Gerlach
Contact:
javad.chamanara@uni-jena.de
http://bexis2.uni-jena.de
BEXIS Tech Talk #2: The Conceptual Model
Acknowledgment
50. Suggestions
• Data Lifecycle survey
– List of lifecycles
– Their features/domain of application
– Strengths/ weaknesses
• GSI data lifecycle
– Best of all
– Customizable
– …