TWC 
Why Data Science Matters 
Xiaogang (Marshall) Ma 
Tetherless World Constellation 
Rensselaer Polytechnic Institute 
Email: max7@rpi.edu; Twitter: @MarshallXMa 
ICSU-WDS Data Stewardship Award Lecture 
SciDataCon 2014, New Delhi, India, Nov. 02-05
Acknowledgements
• Dr. Mustapha Mokrane and Dr. Simon Hodson 
• Colleagues at TWC/RPI, CODATA-ECDP, ESIP, CGI-IUGS, 
AGU/ESSI, ICSU-WDS, RDA, ITC, and more 
• My mentor Prof. Peter Fox 
• My family 
• All of you
Outline
• Technical trends 
– Data management, publication & citation 
• Methodology 
– Interoperability & Provenance 
• Data management is just a start 
– Data analysis 
– Semantic eScience 
Data Management
data work 
Image courtesy Randy Glasbergen
Data Management Plan
• Data Management Plan 
– A formal document that outlines what you will do with your data 
during and after you complete your research 
• Resources/Tools help create DMPs: 
– NSF Data Management Plan Requirements: 
http://www.nsf.gov/eng/general/dmp.jsp 
– DCC Data Management Plans: 
http://www.dcc.ac.uk/resources/data-management-plans 
– DMPTool: https://dmptool.org 
– DCC DMPOnline: https://dmponline.dcc.ac.uk 
Data Publication
• Data as first class products of research 
– e.g., NSF bio-sketches can include data publications 
See: http://www.nsf.gov/pubs/2013/nsf13004/nsf13004.jsp 
Image from j4h.net
“All data necessary to understand, assess, and extend the conclusions of 
the manuscript must be available to any reader of Science.” 
“…authors are required to make materials, data and associated protocols 
promptly available to readers without undue qualifications.” 
“…authors must make materials, data, and associated protocols available 
to readers.” 
“…it is a condition of publication that authors make available the data and 
research materials supporting the results in the article.” 
“…require authors to make all data underlying the findings described in 
their manuscript fully available without restriction…” 
“Earth and space science data should be widely accessible in multiple 
formats and long‐term preservation of data is an integral responsibility of 
scientists and sponsoring institutions.” 
“…support the principle that research data should be made freely 
available to all researchers…” 
“…recommends depositing data that correspond to journal articles in 
reliable data repositories…”
• Ways of data publication 
– Data as supplemental material of a paper 
– Standalone data 
– Data paper: data in a repository + descriptive ‘data paper’ 
Examples: 
• Standalone data journals: Nature Scientific Data, Geoscience Data 
Journal, Ecological Archives, Data in Brief … 
• Journals that publish data papers: Earth and Space Science, 
GigaScience, F1000 Research, Internet Archaeology … 
Strasser, GeoData 2014 Workshop Presentation (2014)
An isolated data island?! 
Image from nature.com
Data Citation
• Data Citation Index 
– Indexes the world's leading data repositories 
– Connects datasets to related refereed literature indexed in 
the Web of Science™ 
– Efficient access to data across subjects and regions 
Image courtesy http://wokinfo.com
Data interoperability
Interoperability: 
“Data should be discoverable, accessible, decodable, 
understandable and usable, and data sharing should be 
legal and ethical for all participants.” 
Ma et al., Nature Geoscience (2011) 
Original image from: http://ehna.org
Provenance of research
Provenance documentation 
“Linking a range of observations and model outputs, research 
activities, people and organizations involved in the production of 
scientific findings with the supporting data sets and methods 
used to generate them” 
Image from nature.com 
Ma et al., Nature Climate Change (2014) 
http://data.globalchange.gov
• IPython Notebook: 
A web-based interactive computational environment 
Codes, APIs, 
datasets, text… 
PDF document 
• We extended the IPython Notebook 
environment to enable automatic provenance 
capture during a scientific workflow 
Di Stefano et al., ESIP 2014 Summer Meeting Presentation (2014) 
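To make the idea of automatic provenance capture concrete, here is a minimal sketch in plain Python. It is not the notebook extension cited above; the decorator `capture_provenance`, the in-memory `PROVENANCE_LOG`, the agent address and the temperature values are all hypothetical, chosen only to show how each workflow step can record who ran it, what it used and what it generated.

```python
import hashlib
import json
from datetime import datetime, timezone
from functools import wraps

# Hypothetical in-memory store; a real system would persist these records.
PROVENANCE_LOG = []


def fingerprint(value):
    """Return a short, stable hash of a JSON-serializable value."""
    encoded = json.dumps(value, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def capture_provenance(agent):
    """Decorator that logs activity, agent, inputs and output for each call."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            started = datetime.now(timezone.utc).isoformat()
            result = func(*args, **kwargs)
            PROVENANCE_LOG.append({
                "activity": func.__name__,
                "agent": agent,
                # Only positional arguments are fingerprinted in this sketch.
                "used": [fingerprint(a) for a in args],
                "generated": fingerprint(result),
                "started_at": started,
            })
            return result
        return wrapper
    return decorator


@capture_provenance(agent="analyst@example.org")
def to_anomalies(temperatures):
    """Subtract the series mean from each value."""
    mean = sum(temperatures) / len(temperatures)
    return [round(t - mean, 2) for t in temperatures]


@capture_provenance(agent="analyst@example.org")
def largest_anomaly(anomalies):
    """Return the anomaly with the largest absolute value."""
    return max(anomalies, key=abs)


series = [14.1, 14.3, 14.0, 14.8]  # made-up values for illustration
anomalies = to_anomalies(series)
peak = largest_anomaly(anomalies)

for record in PROVENANCE_LOG:
    print(record["activity"], record["used"], "->", record["generated"])
```

Because the fingerprint that the first step "generated" is the same one the second step "used", the log can be read as a chain from the final result back to the input series.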
Semantic eScience
• Artificial Intelligence accelerates scientific discovery 
– Data search, synthesis and hypothesis representation 
– Data analysis: reasoning with models of the data 
Gil et al., Science (2014) 
Image from science.com 
A state-of-the-art example: 
Hanalyzer (high-throughput analyzer) 
• Uses natural language processing to 
automatically extract a semantic network from 
all PubMed papers relevant to a scientist 
• Uses Semantic Web technology to integrate 
assertions from other biomedical sources 
• Reasons about the network to find new 
correlations that suggest new genes to 
investigate 
Leach et al., PLoS Comput Bio (2009) 
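The three steps above (extract a network, integrate other sources, reason over it) can be illustrated with a toy sketch. This is not Hanalyzer's algorithm: the gene names, concepts and the shared-neighbour score are placeholders assumed for illustration, and the "reasoning" is reduced to counting concepts that a candidate gene has in common with a gene of interest.

```python
from collections import defaultdict

# Hypothetical assertions of the form (gene, concept).
# In practice the first list would come from text mining and the
# second from curated biomedical databases.
literature_edges = [
    ("GENE_A", "muscle development"),
    ("GENE_B", "muscle development"),
    ("GENE_C", "cell cycle"),
]
database_edges = [
    ("GENE_A", "pathway_1"),
    ("GENE_B", "pathway_1"),
    ("GENE_C", "pathway_2"),
    ("GENE_D", "pathway_1"),
]


def build_network(*edge_lists):
    """Merge assertions from several sources into one gene -> concepts map."""
    neighbours = defaultdict(set)
    for edges in edge_lists:
        for gene, concept in edges:
            neighbours[gene].add(concept)
    return neighbours


def rank_candidates(network, seed_gene):
    """Rank other genes by how many concepts they share with the seed gene."""
    seed_concepts = network[seed_gene]
    scored = [
        (gene, len(seed_concepts & concepts))
        for gene, concepts in network.items()
        if gene != seed_gene
    ]
    # Keep genes with at least one shared concept; highest score first.
    return sorted(
        [(gene, score) for gene, score in scored if score > 0],
        key=lambda pair: (-pair[1], pair[0]),
    )


network = build_network(literature_edges, database_edges)
ranking = rank_candidates(network, "GENE_A")
print(ranking)  # [('GENE_B', 2), ('GENE_D', 1)]
```

GENE_D is linked to the seed gene only through the database source, which is the point of integrating assertions from more than one source before reasoning over the network.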
Deep Carbon Virtual Observatory 
Fox, RDA Fourth Plenary Meeting Presentation (2014) 
A cyber-enabled 
platform for linked 
science 
http://deepcarbon.net
Summary
• Data as first class products of research 
• eScience: the digital or electronic facilitation of science 
• Semantic eScience 
– A virtuous circle between science and semantic technologies 
– Data driven + Knowledge driven? 
Image courtesy @WileyExchanges 
More information: 
Marshall X Ma 
max7@rpi.edu 
Thank you!

 

Recently uploaded (20)

PODOCARPUS...........................pptx
PODOCARPUS...........................pptxPODOCARPUS...........................pptx
PODOCARPUS...........................pptx
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Genome organization in virus,bacteria and eukaryotes.pptx
Genome organization in virus,bacteria and eukaryotes.pptxGenome organization in virus,bacteria and eukaryotes.pptx
Genome organization in virus,bacteria and eukaryotes.pptx
 
Site specific recombination and transposition.........pdf
Site specific recombination and transposition.........pdfSite specific recombination and transposition.........pdf
Site specific recombination and transposition.........pdf
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Concept of gene and Complementation test.pdf
Concept of gene and Complementation test.pdfConcept of gene and Complementation test.pdf
Concept of gene and Complementation test.pdf
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditions
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 

Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture

  • 1. Why Data Science Matters. Xiaogang (Marshall) Ma, Tetherless World Constellation, Rensselaer Polytechnic Institute. Email: max7@rpi.edu; Twitter: @MarshallXMa. ICSU-WDS Data Stewardship Award Lecture, SciDataCon 2014, New Delhi, India, Nov. 02-05
  • 2. Acknowledgements • Dr. Mustapha Mokrane and Dr. Simon Hodson • Colleagues at TWC/RPI, CODATA-ECDP, ESIP, CGI-IUGS, AGU/ESSI, ICSU-WDS, RDA, ITC, and more • My mentor Prof. Peter Fox • My family • All of you
  • 3. Outline • Technical trends – Data management, publication & citation • Methodology – Interoperability & Provenance • Data management is just a start – Data analysis – Semantic eScience
  • 4. Data Management: data work (image courtesy Randy Glasbergen)
  • 5. Data Management Plan • Data Management Plan – A formal document that outlines what you will do with your data during and after you complete your research • Resources/tools that help create DMPs: – NSF Data Management Plan Requirements: http://www.nsf.gov/eng/general/dmp.jsp – DCC Data Management Plans: http://www.dcc.ac.uk/resources/data-management-plans – DMPTool: https://dmptool.org – DCC DMPOnline: https://dmponline.dcc.ac.uk
  • 6. Data Publication • Data as first-class products of research – e.g., NSF bio-sketches can include data publications. See: http://www.nsf.gov/pubs/2013/nsf13004/nsf13004.jsp (image from j4h.net)
  • 7. “All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.” “…authors are required to make materials, data and associated protocols promptly available to readers without undue qualifications.” “…authors must make materials, data, and associated protocols available to readers.” “…it is a condition of publication that authors make available the data and research materials supporting the results in the article.” “…require authors to make all data underlying the findings described in their manuscript fully available without restriction…” “Earth and space science data should be widely accessible in multiple formats and long-term preservation of data is an integral responsibility of scientists and sponsoring institutions.” “…support the principle that research data should be made freely available to all researchers…” “…recommends depositing data that correspond to journal articles in reliable data repositories…”
  • 8. Ways of data publication – Data as supplemental material of a paper – Standalone data – Data paper: data in a repository + descriptive ‘data paper’. Examples: • Standalone data journals: Nature Scientific Data, Geoscience Data Journal, Ecological Archives, Data in Brief … • Journals that publish data papers: Earth and Space Science, GigaScience, F1000 Research, Internet Archaeology … Strasser, GeoData 2014 Workshop Presentation (2014)
  • 9. An isolated data island?! (image from nature.com)
  • 10. Data Citation • Data Citation Index – Indexes the world's leading data repositories – Connects datasets to related refereed literature indexed in the Web of Science™ – Efficient access to data across subjects and regions (image courtesy http://wokinfo.com)
  • 11. Data Interoperability. Interoperability: “Data should be discoverable, accessible, decodable, understandable and usable, and data sharing should be legal and ethical for all participants.” Ma et al., Nature Geoscience (2011). Original image from: http://ehna.org
  • 12. Provenance of research. Provenance documentation: “Linking a range of observations and model outputs, research activities, people and organizations involved in the production of scientific findings with the supporting data sets and methods used to generate them”. Ma et al., Nature Climate Change (2014); http://data.globalchange.gov (image from nature.com)
  • 13. • IPython Notebook: a web-based interactive computational environment (codes, APIs, datasets, text… PDF document) • We extended the IPython Notebook environment to enable automatic provenance capture during a scientific workflow. Di Stefano et al., ESIP 2014 Summer Meeting Presentation (2014)
  • 15. Semantic eScience • Artificial Intelligence accelerates scientific discovery – Data search, synthesis and hypothesis representation – Data analysis: reasoning with models of the data. Gil et al., Science (2014) (image from science.com). A state-of-the-art example: Hanalyzer (high-throughput analyzer) • Uses natural language processing to automatically extract a semantic network from all PubMed papers relevant to a scientist • Uses Semantic Web technology to integrate assertions from other biomedical sources • Reasons about the network to find new correlations that suggest new genes to investigate. Leach et al., PLoS Comput Bio (2009)
  • 16. Deep Carbon Virtual Observatory: a cyber-enabled platform for linked science. http://deepcarbon.net. Fox, RDA Fourth Plenary Meeting Presentation (2014)
  • 17. Summary • Data as first-class products of research • eScience: the digital or electronic facilitation of science • Semantic eScience – A virtuous circle between science and semantic technologies – Data driven + Knowledge driven? (image courtesy @WileyExchanges)
  • 18. More information: Marshall X Ma, max7@rpi.edu. Thank you!