
Towards Knowledge Graphs of Reusable Research Software Metadata


Research software is a key asset for understanding, reusing and reproducing results in computational sciences. An increasing amount of software is stored in code repositories, which usually contain human-readable instructions indicating how to use it and set it up. However, developers and researchers often need to spend a significant amount of time to understand how to invoke a software component, prepare data in the required format, and use it in combination with other software. This time investment also makes it challenging to discover and compare software with similar functionality. In this talk I will describe our efforts to address these issues by creating and using Open Knowledge Graphs that describe research software in a machine-readable manner. Our work includes: 1) an ontology that extends schema.org and codemeta, designed to describe software and the specific data formats it uses; 2) an approach to publish software metadata as an open knowledge graph, linked to other Web of Data objects; 3) a framework for automatically extracting metadata from software repositories; and 4) a framework to curate, query, explore and compare research software metadata in a collaborative manner. The talk will illustrate our approach with real-world examples, including a domain application for inspecting and discovering hydrology, agriculture, and economic software models, and the results of our framework when enriching the research software entries in Zenodo.org.



  1. Towards Knowledge Graphs of Reusable Research Software Metadata. Daniel Garijo, Yolanda Gil, Maximiliano Osorio, Varun Ratnakar, Deborah Khider, Hernan Vargas. Information Sciences Institute, University of Southern California. @dgarijov, dgarijo@isi.edu
  2. Is there a reproducibility crisis? [Nature, 2016] Source: https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
  3. Reproducibility in Computational Sciences: Open Research Data, Software and Methods. Diagram labels: Scientific publication, Research Data, Research Software, Research Methods.
  4. Challenges for Finding, Understanding, (Re)Using and Sharing Research Software
     Software user:
     • What does the software component do? Which of its methods should I use?
     • How to transform my data to use the software component?
     • How to interpret the results produced by the software component?
     • How to invoke the software component?
     • How to configure the software component with the right parameters?
     • How to compare the software with similar software?
     Software designer:
     • How to ease capturing the dependencies and installation instructions of my software?
     • How to encapsulate my software so it can be used with other data?
     • How to describe my software so it can be used by others?
     • How to test if my software is ready to be used by others?
     • How can my component be found by others?
  5. How are we addressing these challenges?
     1. Describe Research Software in a machine-readable manner
     2. Link and connect Research Software in Knowledge Graphs
     3. Build applications that help find, understand and reuse Research Software using those Knowledge Graphs
  6. Part 1: Describing Research Software metadata in a machine-readable manner
  7. Representing Software Metadata: OntoSoft, a Crowdsourced Software Metadata Registry
     • Complements code repositories to make them understandable
     • Software metadata designed for scientists
     • Metadata is curated by decentralized communities of users
     • Training scientists on best practices
     http://ontosoft.org
     Figure label: Finding Software
     OntoSoft: Capturing scientific software metadata. Gil, Y.; Ratnakar, V.; and Garijo, D. In Proceedings of the 8th International Conference on Knowledge Capture, pages 32, 2015. ACM
  8. Adding Structure to Software Metadata: OKG-Soft
     Figure labels: Explore input/output variables; Explore Software I/O files
     Knowledge Graph with machine-readable Software Metadata:
     • (From OntoSoft) Attribution, license, funding, usage examples...
     • Executable software components
     • Software invocation
     • Input & output files, variables and units
     • Containers used to encapsulate and run software components
     [Garijo et al 2019]: OKG-Soft: An Open Knowledge Graph with Machine Readable Scientific Software Metadata. International Conference on eScience, San Diego, USA. 2019
  9. Evolving OntoSoft: Software Description Ontology
     https://w3id.org/okn/o/sd#
     Extensions:
     • Schema.org/Codemeta (software metadata)
     • W3C Data Cubes (contents of inputs and outputs)
     • NASA QUDT (units)
     • DockerPedia (software images)
     • Scientific Variables Ontology (standard variables)
  10. Outline:
     1. Describing Research Software Metadata
     2. Creating Knowledge Graphs with Research Software Metadata
        • Automatically
  11. Automated Software Metadata Annotation: SoMEF (Software Metadata Extraction Framework)
     Example: the software repository whimian/pyGeoPressure, as processed by SoMEF:
     • Description: A Python package for pore pressure prediction...
     • Installation: pip install pygeopressure
     • Invocation: import pygeopressure as ppp
     • Citation: Yu, (2018). PyGeoPressure: Geopressure Prediction in Python. Journal of Open Source Software, 3(30), 992, https://doi.org/10.21105/joss.00992
     Metadata fields (17 metadata categories): description, installation instructions, invocation, citation, usage notes, requirements, contact, contributors, FAQ, support, license, keywords...
     https://somef.readthedocs.io/en/latest/
     https://github.com/KnowledgeCaptureAndDiscovery/somef
     [Mao et al 2019]: SoMEF: A Framework for Capturing Software Metadata from its Documentation. 2019 IEEE BigData REU Symposium. Los Angeles, 2019
  12. SOSEN-KG: integrating Zenodo and GitHub
     https://github.com/KnowledgeCaptureAndDiscovery/sosen
     Prototype with > 13K entries of research software metadata
     • Integrating metadata from Zenodo and GitHub (versions, authors, etc.)
     • Expanding it with Wikidata (future work)
  13. Outline:
     1. Describing Research Software Metadata
     2. Creating Knowledge Graphs with Research Software Metadata
        • Automatically
        • Crowdsourcing
  14. OKG-SOFT Software Model Catalog contains:
     • Models from hydrology, agriculture and economics, their versions and model configurations
     • More than 200 variables mapped to SVO
     • All models are executable through scientific workflows
     • Most contents are added manually and collaboratively by expert users
     • Automated unit transformations
     • Automated software image description
     • Semi-automated Wikidata linking
     OKG-Soft: An Open Knowledge Graph with Machine Readable Scientific Software Metadata. Garijo, D.; Osorio, M.; Khider, D.; Ratnakar, V.; and Gil, Y. In 2019 15th International Conference on eScience (eScience), pages 349–358, San Diego, CA, USA, September 2019. IEEE
  15. Outline:
     1. Describing Research Software Metadata
     2. Creating Knowledge Graphs with Research Software Metadata
        • Automatically
        • Crowdsourcing
     3. Using KGs to Find, Understand and Reuse Research Software
  16. OntoSoft: Comparing Software Metadata. Software compared in the figure: PIHM, PIHMgis, DrEICH, TauDEM, WBMsed
  17. OKG-SOFT Framework: Exploring Research Software Model Metadata
     • Explore variables of inputs and outputs
     • Explore software I/O
     • Find, compare and configure software models
     http://models.mint.isi.edu
  18. Research Software Reuse: Encapsulating & Testing
     Diagram labels: Machine-readable component specification; Assistants + Guidelines; Tests; Portable Component; Software Metadata Registry; OKG-SOFT
     https://mic-cli.readthedocs.io/en/latest/
     https://dame-cli.readthedocs.io/en/latest/
  19. Summing up...
  20. Overcoming the reproducibility crisis (partly)
     • Research software is a critical asset for reproducible computational experiments
     • We need to improve the findability, (re)usability and understanding of research software:
        – Wider adoption
        – Better comparison of similar computational methods
        – Better understanding of data products
     • In this presentation we covered:
        – How to describe research software and its metadata
           • OntoSoft, Software Description Ontology
        – How to build Knowledge Graphs with research software metadata
           • OntoSoft, OKG-Soft, SOSEN-KG
        – How we are using KGs to help find, compare, understand and reuse research software
  21. Knowledge Capture and Discovery Group: Yolanda Gil, Varun Ratnakar, Daniel Garijo, Deborah Khider, Maximiliano Osorio, Hernan Vargas. https://knowledgecaptureanddiscovery.github.io/
  22. Towards Knowledge Graphs of Reusable Research Software Metadata. Daniel Garijo, Yolanda Gil, Maximiliano Osorio, Varun Ratnakar, Deborah Khider, Hernan Vargas. Information Sciences Institute, University of Southern California. @dgarijov, dgarijo@isi.edu
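
Slide 9 gives the namespace of the Software Description Ontology and the vocabularies it extends. As a rough sketch of what a machine-readable description might look like, the Python snippet below builds a small JSON-LD document with the standard library only. The `schema:` terms are ordinary schema.org terms. The `sd:` class and property names and the `ex:` base IRI are illustrative assumptions that have not been checked against the published ontology. The name, description and installation command come from the pyGeoPressure example on slide 11; the input specification is invented for illustration.

```python
import json

# Prefixes used in the description. The sd: namespace is the one on slide 9.
CONTEXT = {
    "schema": "https://schema.org/",
    "sd": "https://w3id.org/okn/o/sd#",
    "ex": "https://example.org/software/",  # hypothetical base IRI
}


def describe_software(identifier, name, description, install,
                      input_label, input_format):
    """Return a JSON-LD dict describing one software component.

    The sd: terms below are assumed names, used only to show the shape
    of such a description.
    """
    return {
        "@context": CONTEXT,
        "@id": f"ex:{identifier}",
        "@type": ["schema:SoftwareSourceCode", "sd:Software"],
        "schema:name": name,
        "schema:description": description,
        "sd:hasInstallationInstructions": install,
        "sd:hasInput": [
            {
                "@type": "sd:DatasetSpecification",
                "schema:name": input_label,
                "sd:hasFormat": input_format,
            }
        ],
    }


doc = describe_software(
    identifier="pygeopressure",
    name="pyGeoPressure",
    description="A Python package for pore pressure prediction",
    install="pip install pygeopressure",
    input_label="example input file",  # hypothetical input
    input_format="csv",                # hypothetical format
)

serialized = json.dumps(doc, indent=2)
print(serialized)
```

Keeping the description as plain JSON-LD means it can be loaded by any RDF toolkit later without changing the code that produces it.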
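
Slide 11 shows SoMEF turning repository documentation into metadata categories such as description, installation, invocation and citation. The sketch below is a deliberately naive stand-in for that idea, not SoMEF's implementation or API: it assigns each README section to a category by matching keywords in the Markdown header. The keyword table is an assumption, and the README is a toy text assembled from the values on the slide rather than the real pyGeoPressure file.

```python
import re

# Hypothetical keyword table: category -> substrings that suggest it.
CATEGORY_KEYWORDS = {
    "description": ("about", "description", "overview"),
    "installation": ("install", "setup"),
    "invocation": ("usage", "quick start", "getting started"),
    "citation": ("cite", "citation"),
}

# Matches Markdown ATX headers such as "## Installation".
HEADER = re.compile(r"^#{1,6}\s+(.*)$")


def classify_header(header):
    """Return the first category whose keywords occur in the header."""
    lowered = header.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


def extract_metadata(readme):
    """Collect the non-empty lines under each recognised header."""
    found = {}
    current = None
    for line in readme.splitlines():
        match = HEADER.match(line)
        if match:
            current = classify_header(match.group(1))
            continue
        if current and line.strip():
            found.setdefault(current, []).append(line.strip())
    return {category: " ".join(lines) for category, lines in found.items()}


TOY_README = """# pyGeoPressure

## About
A Python package for pore pressure prediction

## Installation
pip install pygeopressure

## Usage
import pygeopressure as ppp

## Citation
Yu, (2018). PyGeoPressure: Geopressure Prediction in Python.
"""

metadata = extract_metadata(TOY_README)
for category, text in metadata.items():
    print(f"{category}: {text}")
```

A header-keyword heuristic like this misses READMEs without conventional section names, which is one reason a real extractor would combine several techniques.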
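
Slide 12 describes SOSEN-KG as integrating metadata from Zenodo and GitHub (versions, authors, etc.). One simple way to picture that integration is to merge records from both sources under a shared key. The sketch below assumes the key is a normalised repository URL. The record fields (`repo_url`, `contributors`, `releases`, `related_repo`, `creators`, `version`) and the sample data are hypothetical and do not reflect the actual Zenodo or GitHub API schemas, nor how SOSEN itself links entries.

```python
def normalize_repo_url(url):
    """Lower-case the URL and drop a trailing '/' or '.git' suffix."""
    cleaned = url.strip().lower().rstrip("/")
    if cleaned.endswith(".git"):
        cleaned = cleaned[:-4]
    return cleaned


def merge_records(github_records, zenodo_records):
    """Merge both sources into one entry per repository URL."""
    merged = {}

    def entry_for(url):
        key = normalize_repo_url(url)
        return merged.setdefault(
            key,
            {"repo_url": key, "authors": set(), "versions": set(), "sources": set()},
        )

    for record in github_records:
        entry = entry_for(record["repo_url"])
        entry["authors"].update(record.get("contributors", []))
        entry["versions"].update(record.get("releases", []))
        entry["sources"].add("github")

    for record in zenodo_records:
        entry = entry_for(record["related_repo"])
        entry["authors"].update(record.get("creators", []))
        entry["versions"].add(record["version"])
        entry["sources"].add("zenodo")

    return merged


# Hypothetical sample data, for illustration only.
github_records = [
    {
        "repo_url": "https://github.com/example-org/example-tool.git",
        "contributors": ["A. Author"],
        "releases": ["1.0.0", "1.1.0"],
    }
]
zenodo_records = [
    {
        "related_repo": "https://GitHub.com/example-org/example-tool/",
        "creators": ["A. Author", "B. Builder"],
        "version": "1.1.0",
    },
    {
        "related_repo": "https://github.com/example-org/other-tool",
        "creators": ["C. Coder"],
        "version": "0.3",
    },
]

catalog = merge_records(github_records, zenodo_records)
for url, entry in sorted(catalog.items()):
    print(url, sorted(entry["versions"]), sorted(entry["sources"]))
```

Using sets for authors and versions keeps the merge idempotent, so re-importing the same record does not create duplicates.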
