SlideShare une entreprise Scribd logo
1  sur  24
Eureka Research Workbench:
An Open Source eScience
Laboratory Notebook
Stuart J. Chalk
Department of Chemistry
University of North Florida
schalk@unf.edu
2014 Spring ACS Meeting – CINF Paper 38
 Big Data
 Electronic Notebooks
 The Eureka Research Workbench
 Experiment Markup Language
 ExptML Schema and Files
 Semantic Data and Ontologies
 File Storage
 Eureka Interface
 Web Interface
 Conclusion
Outline
 Current buzz word for “this bring together lots of data and
build tools on top to extract knowledge”
 This is great, except…
 …how do we do that for science?
 Platform, data structures, and exchange protocols to
capture, identify, and disseminate scientific information
 Research Data Alliance (https://rd-alliance.org/)
 “Research Data Sharing without barriers”
 Fran Berman at RPI is NSF funded co-chair of RDA
Big Data
 Scientists need to move to
digital notebooks…
 ...and record not just the data
but the flow and context
 How science is done
is important for searching,
aggregation, meta-analysis
 We need more than an electronic version of a notebook
 We need a science version of “Second Life” (SciLife?)
Electronic Notebooks
 Started in 2006 after getting involved in the
Analytical Information Markup Language (AnIML) project
 Store all research notes/data in a digital format
 Capture the workflow of scientists
 Writing in a lab notebook is equivalent to
“multi-type” blogging in the digital world
 How to capture information? Many data types! (ExptML)
 How to store files “online”? (Fedora-Commons)
 How to access files in the browser? (CakePHP)
 How to represent laboratory resources? (ExptML)
 How to link data together? RDF (in Fedora-Commons)
Eureka Research Workbench
 A specification (written in XML) that describes different
types of information recorded during the scientific process
(http://exptml.sourceforge.net)
Experiment Markup Language (ExptML)
 Sample
 Solution
 Space
 Specimen
 Substance
 Task
 Template
 Timeline
 User
 Vendor
 Annotation
 Api
 Calculation
 Chemical
 Citation
 Customer
 Data
 Dataset
 Definition
 Element
 Equipment
 Event
 Experiment
 Group
 Message
 Project
 Protocol
 Quote
 Report
 Result
ExptML Chemical Schema
ExptML
Chemical
Schema
ExptML Chemical (Instance)
 Data are connected to other data – ‘Linked Data’
(http://www.w3.org/standards/semanticweb/data)
 The ‘Semantic Web’ approach to contextualize data
 Proposed storage of ‘relationships’ between data is the
Resource Description Format (RDF - http://www.w3.org/RDF/)
Semantic Data
 Digital repository software http://fedora-commons.org/
 Creation and management of online digital libraries
 Fedora ‘Digital Object’ consists of metadata + streams
 Metadata stored as Dublin Core (DC stream)
 ExptML file stored as EXPTML stream
 Other files (PDFs, Images, Word etc.) stored as streams
 Relationships stored as RDF (RELS-EXT stream)
 Features: Version control, Checksumming, Archiving
 Built-in search of objects and relationships
 Add-on for file content search (Fedora GSearch)
Fedora Commons
 Fedora-Commons defines and works on digital objects
 In the definition of a Fedora object an ExptML file is just
one stream of many. By default each object also has a
“DC” stream of metadata and an “RELS-EXT” stream of
relationships
 Each Fedora object can have any number of additional
streams for
 Paper PDFs, product/sample pictures,
binary file formats (if a conversion has been done)
 Video, audio, RDF, anything…
 You can export individual streams or the whole Fedora
object with streams binary encoded (Sharing/archiving)
Fedora for File Storage
Fedora Object Storage
 Web interface written in PHP using the CakePHP Framework
 Communicates with Fedora-Commons API to create,
retrieve, update and delete (CRUD) ExptML and other files
 Representational State Transfer (REST) format for URLs
 E.g. http://example.com/chemicals/view/exptml:chm1
 Creation of ExptML via interface
 Provides search via Fedora and Gsearch
 Can extract data out of XML files
 Can gather data from other websites (via API controller)
and integrate into ExptML files
Eureka Web Application
Eureka Website – Group View
Only data types related to the research group show up on left
Eureka Website – Bench View
Clicking on the “Add” menu on the right
allows you add a comment or link to data
Eureka Website – Notebook View
Eureka Website – Laboratory View
The “Rel” menu shows you the information related to this instrument
Eureka Website – Library View
You can add the PDF of the paper to the citation. The contents of the PDF are searchable in the system
Eureka Website – Stockroom View
 Web Application
 Server: Fedora 4, JSON-LD, ElasticSearch
 Client: CakePHP 3/HTML5, Recline.js, Annotator, JQuery
 Standards
 Linked Data Platform (http://www.w3.org/TR/ldp/)
 Datapackage/Simple Data Format (http://dataprotocols.org/)
 Markup Languages: AnIML, UnitsML, CML
 Other Molecular File Formats: MOL/SDF/CDX/CIF/PDB etc.
 Open Framework for Laboratory Data (Allotrope Foundation)
 Datasources
 ChemSpider, CIR, PubChem, Google Scholar, CrossRef, VIVO
 ExchangeNetwork (EPA), NIST, SDBS (no API’s yet)
 Tools
 Marvin for JS, JSXGraph, JSpecView, Chemicalize.org
Eureka Technology Stack
 Implement ingest of all data types, file (if appropriate) and web based
 In browser processing of data -> dataset -> result, report writing
 Extraction of file based legacy data -> ExptML format data
 Open access to data/spectra, ‘available data’ page (browser only)
 Access to data/spectra via linked data server (discovery/indexing)
 Publishing of packaged datasets with authenticated download option
 Automated ingestion of data from instruments/sensors
 Collaborative research: authentication and data exchange
 Timeframe? Depends on securing funding
Eureka Roadmap
 Eureka: Web application to create ExptML files
 Built on ExptML to capture data/resources/workflows
 Reliable storage/archiving system for ExptML files (Fedora)
 Storage of relationships between data (RDF)
 TODO
 Provide mechanism for sharing of data (different levels)
 Add tools to find, visualize and work on science data
 Integration into the RDA model for sharing research data
 Get the word out and test system with many users
Conclusion
References
 Eureka – http://sourceforge.net/projects/eureka
 Fedora-Commons – http://fedora-commons.org
 XML – http://www.w3.org/standards/xml
 AnIML – http://animl.sourceforge.net
 ExptML – http://exptml.sourceforge.net/
 UnitsML – http://unitsml.nist.gov/
 CML – http://www.xml-cml.org/
 JSON-LD – http://www.w3.org/TR/json-ld/
 RDF – http://www.w3.org/RDF/
 CIR – http://cactus.nci.nih.gov/chemical/structure
 RDA – http://rd-alliance.org
 ChemSpider – http://www.chemspider.com/
 Allotrope Foundation – http://allotrope.org

Contenu connexe

Tendances

ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationStuart Chalk
 
Cornell20080516
Cornell20080516Cornell20080516
Cornell20080516charper
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynoteCarole Goble
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...Carole Goble
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...Carole Goble
 
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v12016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1Bruce Kozuma
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectStuart Chalk
 
Crediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCrediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCarole Goble
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod GmodJun Zhao
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout Carole Goble
 
Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Sean Ekins
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Carole Goble
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research ObjectsCarole Goble
 
Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...FAIRDOM
 
ACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility DataACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility DataStuart Chalk
 

Tendances (20)

The Chemtools LaBLog
The Chemtools LaBLogThe Chemtools LaBLog
The Chemtools LaBLog
 
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
 
Cornell20080516
Cornell20080516Cornell20080516
Cornell20080516
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynote
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v12016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP Project
 
Crediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCrediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teams
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...
 
Hosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry dataHosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry data
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
Crosslinks
Crosslinks Crosslinks
Crosslinks
 
ROHub
ROHubROHub
ROHub
 
Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...
 
ACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility DataACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility Data
 

Similaire à 247th ACS Meeting: The Eureka Research Workbench

Liberating Laboratory Data - Eureka
Liberating Laboratory Data - EurekaLiberating Laboratory Data - Eureka
Liberating Laboratory Data - EurekaStuart Chalk
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesIan Foster
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsCarole Goble
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
Apache Tika: 1 point Oh!
Apache Tika: 1 point Oh!Apache Tika: 1 point Oh!
Apache Tika: 1 point Oh!Chris Mattmann
 
A Look into the Apache OODT Ecosystem
A Look into the Apache OODT EcosystemA Look into the Apache OODT Ecosystem
A Look into the Apache OODT EcosystemChris Mattmann
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the partsCarole Goble
 
ACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationStuart Chalk
 
2008 Jun Zhao Eswc
2008 Jun Zhao Eswc2008 Jun Zhao Eswc
2008 Jun Zhao EswcJun Zhao
 
Structured Dynamics' Semantic Technologies Product Stack
Structured Dynamics' Semantic Technologies Product StackStructured Dynamics' Semantic Technologies Product Stack
Structured Dynamics' Semantic Technologies Product StackMike Bergman
 
Big data & hadoop framework
Big data & hadoop frameworkBig data & hadoop framework
Big data & hadoop frameworkTu Pham
 
Flexible Resources In 3 6 And E4
Flexible Resources In 3 6 And E4Flexible Resources In 3 6 And E4
Flexible Resources In 3 6 And E4szbra
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsCarole Goble
 
Blogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsBlogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsJeremy Frey
 
Searching Heterogenous E Learning Resources
Searching Heterogenous E Learning ResourcesSearching Heterogenous E Learning Resources
Searching Heterogenous E Learning Resourcesimranlatif
 
Rbms 2011 edwards
Rbms 2011 edwardsRbms 2011 edwards
Rbms 2011 edwardsglynnedw
 

Similaire à 247th ACS Meeting: The Eureka Research Workbench (20)

Liberating Laboratory Data - Eureka
Liberating Laboratory Data - EurekaLiberating Laboratory Data - Eureka
Liberating Laboratory Data - Eureka
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital Objects
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
dotte.ppt
dotte.pptdotte.ppt
dotte.ppt
 
Apache Tika: 1 point Oh!
Apache Tika: 1 point Oh!Apache Tika: 1 point Oh!
Apache Tika: 1 point Oh!
 
Fedora
FedoraFedora
Fedora
 
A Look into the Apache OODT Ecosystem
A Look into the Apache OODT EcosystemA Look into the Apache OODT Ecosystem
A Look into the Apache OODT Ecosystem
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
 
ACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka Collaboration
 
2008 Jun Zhao Eswc
2008 Jun Zhao Eswc2008 Jun Zhao Eswc
2008 Jun Zhao Eswc
 
Structured Dynamics' Semantic Technologies Product Stack
Structured Dynamics' Semantic Technologies Product StackStructured Dynamics' Semantic Technologies Product Stack
Structured Dynamics' Semantic Technologies Product Stack
 
Big data & hadoop framework
Big data & hadoop frameworkBig data & hadoop framework
Big data & hadoop framework
 
Flexible Resources In 3 6 And E4
Flexible Resources In 3 6 And E4Flexible Resources In 3 6 And E4
Flexible Resources In 3 6 And E4
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research Objects
 
Dspace
DspaceDspace
Dspace
 
Dspace
DspaceDspace
Dspace
 
Blogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsBlogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart Labs
 
Searching Heterogenous E Learning Resources
Searching Heterogenous E Learning ResourcesSearching Heterogenous E Learning Resources
Searching Heterogenous E Learning Resources
 
Rbms 2011 edwards
Rbms 2011 edwardsRbms 2011 edwards
Rbms 2011 edwards
 

Plus de Stuart Chalk

Semantic properties and units
Semantic properties and unitsSemantic properties and units
Semantic properties and unitsStuart Chalk
 
Open semantic chemical structures
Open semantic chemical structuresOpen semantic chemical structures
Open semantic chemical structuresStuart Chalk
 
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...Stuart Chalk
 
AnIML: A New Analytical Data Standard
AnIML: A New Analytical Data StandardAnIML: A New Analytical Data Standard
AnIML: A New Analytical Data StandardStuart Chalk
 
The Electronic Notebook Ontology
The Electronic Notebook OntologyThe Electronic Notebook Ontology
The Electronic Notebook OntologyStuart Chalk
 
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataSharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataStuart Chalk
 
Bringing Flow injection Analysis to the Semantic Web
Bringing Flow injection Analysis to the Semantic WebBringing Flow injection Analysis to the Semantic Web
Bringing Flow injection Analysis to the Semantic WebStuart Chalk
 
Reactions to the Open Spectral Database
Reactions to the Open Spectral DatabaseReactions to the Open Spectral Database
Reactions to the Open Spectral DatabaseStuart Chalk
 
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015Stuart Chalk
 
Building a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectBuilding a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectStuart Chalk
 
A Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSXA Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSXStuart Chalk
 
Overview of the Analytical Information Markup Language (AnIML)
Overview of the Analytical Information Markup Language (AnIML)Overview of the Analytical Information Markup Language (AnIML)
Overview of the Analytical Information Markup Language (AnIML)Stuart Chalk
 
ACS 248th Paper 104 ChemData Project
ACS 248th Paper 104 ChemData ProjectACS 248th Paper 104 ChemData Project
ACS 248th Paper 104 ChemData ProjectStuart Chalk
 
247th ACS Meeting: Experiment Markup Language (ExptML)
247th ACS Meeting: Experiment Markup Language (ExptML)247th ACS Meeting: Experiment Markup Language (ExptML)
247th ACS Meeting: Experiment Markup Language (ExptML)Stuart Chalk
 
Liberating Laboratory Data - AnIML
Liberating Laboratory Data - AnIMLLiberating Laboratory Data - AnIML
Liberating Laboratory Data - AnIMLStuart Chalk
 

Plus de Stuart Chalk (15)

Semantic properties and units
Semantic properties and unitsSemantic properties and units
Semantic properties and units
 
Open semantic chemical structures
Open semantic chemical structuresOpen semantic chemical structures
Open semantic chemical structures
 
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
 
AnIML: A New Analytical Data Standard
AnIML: A New Analytical Data StandardAnIML: A New Analytical Data Standard
AnIML: A New Analytical Data Standard
 
The Electronic Notebook Ontology
The Electronic Notebook OntologyThe Electronic Notebook Ontology
The Electronic Notebook Ontology
 
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataSharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
 
Bringing Flow injection Analysis to the Semantic Web
Bringing Flow injection Analysis to the Semantic WebBringing Flow injection Analysis to the Semantic Web
Bringing Flow injection Analysis to the Semantic Web
 
Reactions to the Open Spectral Database
Reactions to the Open Spectral DatabaseReactions to the Open Spectral Database
Reactions to the Open Spectral Database
 
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
 
Building a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectBuilding a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP Project
 
A Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSXA Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSX
 
Overview of the Analytical Information Markup Language (AnIML)
Overview of the Analytical Information Markup Language (AnIML)Overview of the Analytical Information Markup Language (AnIML)
Overview of the Analytical Information Markup Language (AnIML)
 
ACS 248th Paper 104 ChemData Project
ACS 248th Paper 104 ChemData ProjectACS 248th Paper 104 ChemData Project
ACS 248th Paper 104 ChemData Project
 
247th ACS Meeting: Experiment Markup Language (ExptML)
247th ACS Meeting: Experiment Markup Language (ExptML)247th ACS Meeting: Experiment Markup Language (ExptML)
247th ACS Meeting: Experiment Markup Language (ExptML)
 
Liberating Laboratory Data - AnIML
Liberating Laboratory Data - AnIMLLiberating Laboratory Data - AnIML
Liberating Laboratory Data - AnIML
 

Dernier

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Dernier (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

247th ACS Meeting: The Eureka Research Workbench

  • 1. Eureka Research Workbench: An Open Source eScience Laboratory Notebook Stuart J. Chalk Department of Chemistry University of North Florida schalk@unf.edu 2014 Spring ACS Meeting – CINF Paper 38
  • 2.  Big Data  Electronic Notebooks  The Eureka Research Workbench  Experiment Markup Language  ExptML Schema and Files  Semantic Data and Ontologies  File Storage  Eureka Interface  Web Interface  Conclusion Outline
  • 3.  Current buzz word for “this bring together lots of data and build tools on top to extract knowledge”  This is great, except…  …how do we do that for science?  Platform, data structures, and exchange protocols to capture, identify, and disseminate scientific information  Research Data Alliance (https://rd-alliance.org/)  “Research Data Sharing without barriers”  Fran Berman at RPI is NSF funded co-chair of RDA Big Data
  • 4.  Scientists need to move to digital notebooks…  ...and record not just the data but the flow and context  How science is done is important for searching, aggregation, meta-analysis  We need more than an electronic version of a notebook  We need a science version of “Second Life” (SciLife?) Electronic Notebooks
  • 5.  Started in 2006 after getting involved in the Analytical Information Markup Language (AnIML) project  Store all research notes/data in a digital format  Capture the workflow of scientists  Writing in a lab notebook is equivalent to “multi-type” blogging in the digital world  How to capture information? Many data types! (ExptML)  How to store files “online”? (Fedora-Commons)  How to access files in the browser? (CakePHP)  How to represent laboratory resources? (ExptML)  How to link data together? RDF (in Fedora-Commons) Eureka Research Workbench
  • 6.  A specification (written in XML) that describes different types of information recorded during the scientific process (http://exptml.sourceforge.net) Experiment Markup Language (ExptML)  Sample  Solution  Space  Specimen  Substance  Task  Template  Timeline  User  Vendor  Annotation  Api  Calculation  Chemical  Citation  Customer  Data  Dataset  Definition  Element  Equipment  Event  Experiment  Group  Message  Project  Protocol  Quote  Report  Result
  • 10.  Data are connected to other data – ‘Linked Data’ (http://www.w3.org/standards/semanticweb/data)  The ‘Semantic Web’ approach to contextualize data  Proposed storage of ‘relationships’ between data is the Resource Description Format (RDF - http://www.w3.org/RDF/) Semantic Data
  • 11.  Digital repository software http://fedora-commons.org/  Creation and management of online digital libraries  Fedora ‘Digital Object’ consists of metadata + streams  Metadata stored as Dublin Core (DC stream)  ExptML file stored as EXPTML stream  Other files (PDFs, Images, Word etc.) stored as streams  Relationships stored as RDF (RELS-EXT stream)  Features: Version control, Checksumming, Archiving  Built-in search of objects and relationships  Add-on for file content search (Fedora GSearch) Fedora Commons
  • 12.  Fedora-Commons defines and works on digital objects  In the definition of a Fedora object an ExptML file is just one stream of many. By default each object also has a “DC” stream of metadata and an “RELS-EXT” stream of relationships  Each Fedora object can have any number of additional streams for  Paper PDFs, product/sample pictures, binary file formats (if a conversion has been done)  Video, audio, RDF, anything…  You can export individual streams or the whole Fedora object with streams binary encoded (Sharing/archiving) Fedora for File Storage
  • 14.  Web interface written in PHP using the CakePHP Framework  Communicates with Fedora-Commons API to create, retrieve, update and delete (CRUD) ExptML and other files  Representational State Transfer (REST) format for URLs  E.g. http://example.com/chemicals/view/exptml:chm1  Creation of ExptML via interface  Provides search via Fedora and Gsearch  Can extract data out of XML files  Can gather data from other websites (via API controller) and integrate into ExptML files Eureka Web Application
  • 15. Eureka Website – Group View Only data types related to the research group show up on left
  • 16. Eureka Website – Bench View Clicking on the “Add” menu on the right allows you add a comment or link to data
  • 17. Eureka Website – Notebook View
  • 18. Eureka Website – Laboratory View The “Rel” menu shows you the information related to this instrument
  • 19. Eureka Website – Library View You can add the PDF of the paper to the citation. The contents of the PDF are searchable in the system
  • 20. Eureka Website – Stockroom View
  • 21.  Web Application  Server: Fedora 4, JSON-LD, ElasticSearch  Client: CakePHP 3/HTML5, Recline.js, Annotator, JQuery  Standards  Linked Data Platform (http://www.w3.org/TR/ldp/)  Datapackage/Simple Data Format (http://dataprotocols.org/)  Markup Languages: AnIML, UnitsML, CML  Other Molecular File Formats: MOL/SDF/CDX/CIF/PDB etc.  Open Framework for Laboratory Data (Allotrope Foundation)  Datasources  ChemSpider, CIR, PubChem, Google Scholar, CrossRef, VIVO  ExchangeNetwork (EPA), NIST, SDBS (no API’s yet)  Tools  Marvin for JS, JSXGraph, JSpecView, Chemicalize.org Eureka Technology Stack
  • 22.  Implement ingest of all data types, file (if appropriate) and web based  In browser processing of data -> dataset -> result, report writing  Extraction of file based legacy data -> ExptML format data  Open access to data/spectra, ‘available data’ page (browser only)  Access to data/spectra via linked data server (discovery/indexing)  Publishing of packaged datasets with authenticated download option  Automated ingestion of data from instruments/sensors  Collaborative research: authentication and data exchange  Timeframe? Depends on securing funding Eureka Roadmap
  • 23.  Eureka: Web application to create ExptML files  Built on ExptML to capture data/resources/workflows  Reliable storage/archiving system for ExptML files (Fedora)  Storage of relationships between data (RDF)  TODO  Provide mechanism for sharing of data (different levels)  Add tools to find, visualize and work on science data  Integration into the RDA model for sharing research data  Get the word out and test system with many users Conclusion
  • 24. References  Eureka – http://sourceforge.net/projects/eureka  Fedora-Commons – http://fedora-commons.org  XML – http://www.w3.org/standards/xml  AnIML – http://animl.sourceforge.net  ExptML – http://exptml.sourceforge.net/  UnitsML – http://unitsml.nist.gov/  CML – http://www.xml-cml.org/  JSON-LD – http://www.w3.org/TR/json-ld/  RDF – http://www.w3.org/RDF/  CIR – http://cactus.nci.nih.gov/chemical/structure  RDA – http://rd-alliance.org  ChemSpider – http://www.chemspider.com/  Allotrope Foundation – http://allotrope.org