Integration of oreChem with the eCrystals repository for crystal structures
Mark Borkum, Simon Coles and Jeremy Frey
15 September 2010
Overview
- Motivation
- Implementation
- Discussion and Summary
Current Practice in Crystallography
- Crystallography data is highly structured.
- The de facto standard adopted by the community is the CIF (Crystallographic Information File).
- Relatively few crystal structures are openly published.
See: http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs
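Because CIF is a plain-text tag/value format, its simplest items can be read with very little code. The sketch below is a minimal Python illustration under a stated assumption: only single-line "_tag value" items are collected, grouped by "data_" block. It deliberately skips loop_ tables, multi-line ";" text fields and save frames (and would misread a "_"-prefixed line inside a text field), so it is not a conforming CIF parser. The function name parse_simple_cif is hypothetical.

```python
def parse_simple_cif(text: str) -> dict:
    """Collect single-line `_tag value` items from CIF text, keyed by data block.

    A deliberately partial sketch: loop_ tables, multi-line ';' text fields and
    save frames are skipped, so it is not a conforming CIF parser.
    """
    blocks: dict = {}
    current = None  # items of the data block currently being read
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.lower().startswith("data_"):
            current = blocks.setdefault(line[5:], {})
            continue
        if current is None or not line.startswith("_"):
            continue  # loop_ keyword, loop rows, text fields, or text before any block
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue  # bare tag: a loop header, or a value on the following line
        tag, value = parts[0], parts[1].strip()
        if len(value) >= 2 and value[0] in "'\"" and value[-1] == value[0]:
            value = value[1:-1]  # drop matching quote delimiters
        current[tag] = value
    return blocks
```

Applied to a CIF containing data_example, _cell_length_a 10.123(4) and a loop_ of atom sites, it returns the cell length under the "example" block and ignores the loop. For real files a dedicated CIF library is the safer choice.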
Open Access Journals
- Advantages: rapid publication; highly cited; data is available to download.
- Disadvantages: electronic only.
- Not all data is of primary importance to the underlying chemistry: by-products, unexpected results, tracking reactions, etc.
Crystallography and Fraud
The eCrystals Federation
- A JISC project to establish a network of crystallography resources on the Internet, with metadata that is harvested by a number of aggregation services.
- Led by the UK National Crystallography Service (NCS).
- Core partners: UKOLN, the Digital Curation Centre, and the Unilever Centre for Molecular Science Informatics.
eCrystals: University of Southampton
- Located at http://ecrystals.chem.soton.ac.uk
- Archive for crystal structures generated by the Southampton Chemical Crystallography Group and the UK National Crystallography Service (NCS)
- Modified version of EPrints 3.1
- OAI-PMH compliant
- Extensible platform (plug-in architecture)
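OAI-PMH compliance means aggregators harvest the repository's metadata with plain HTTP GET requests carrying a "verb" argument. The Python sketch below only builds such request URLs; it does not contact the server. The base URL is an assumption (EPrints 3.x repositories conventionally expose OAI-PMH at /cgi/oai2, but the real endpoint should be confirmed with an Identify request), and the helper name oai_request_url is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# The six verbs defined by the OAI-PMH 2.0 protocol.
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}

# Assumption: conventional EPrints 3.x endpoint path; verify before harvesting.
BASE_URL = "http://ecrystals.chem.soton.ac.uk/cgi/oai2"


def oai_request_url(base_url: str, verb: str, params: Optional[dict] = None) -> str:
    """Build an OAI-PMH GET request URL (no network access happens here)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    query = {"verb": verb}
    # A plain dict is used because 'from' is a Python keyword and cannot be a kwarg.
    query.update(params or {})
    return f"{base_url}?{urlencode(query)}"
```

For example, oai_request_url(BASE_URL, "ListRecords", {"metadataPrefix": "oai_dc", "from": "2010-01-01"}) asks for Dublin Core records changed since the start of 2010; a harvester would then follow any resumptionToken in the response.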
What is an eCrystal?
- “all the fundamental and derived data resulting from a single crystal X-ray structure determination”
- “the information supplied should enable any reader to check the reliability and validity”
Image: http://www.ukoln.ac.uk/projects/ebank-uk/images/collage-web.gif
The Scientific Web
The Data Deluge
In haiku:
  Lots of producers;
  Generating more data
  than ever before.
40 years ago, a PhD student would determine 3 structures over the entire course of their study!
Image: The Great Wave off Kanagawa by Katsushika Hokusai
Provenance
- The 7 W’s [Goble 2002]: Who, What, Where, Why, When, Which, and (W)How.
- The Why aspect is usually ignored: rationale, intent, hypothesis, protocol, methodology, workflow, etc.
“Diana and Actaeon by Titian has a full provenance covering its passage through several owners and four countries since it was painted for Philip II of Spain in the 1550s.”
Source: http://en.wikipedia.org/wiki/Diana_and_Actaeon_%28Titian%29
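To make the point about the neglected Why concrete, the seven W's can be treated as fields of a provenance record, and a record can be checked for the dimensions it leaves empty. This Python sketch is illustrative only: the class ProvenanceRecord, the function missing_ws and the use of free-text fields are assumptions, not part of oreChem or of [Goble 2002].

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProvenanceRecord:
    """One field per W; None means that dimension was not recorded."""
    who: Optional[str] = None
    what: Optional[str] = None
    where: Optional[str] = None
    why: Optional[str] = None  # rationale, intent, hypothesis, protocol, ...
    when: Optional[str] = None
    which: Optional[str] = None
    how: Optional[str] = None


def missing_ws(record: ProvenanceRecord) -> List[str]:
    """Return the names of the W's that the record leaves empty, in field order."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]
```

A record that fills in everything except the rationale reports ["why"], which is exactly the gap the slide describes.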
“In theory, there is no difference between theory and practice. But, in practice, there is.”
Unknown (possibly Yogi Berra)
Why “Why” Matters
- It is the reason for the data’s existence.
- It gives us the ability to interpret the data in the correct context.
- It allows us to align the data with the big picture.
See: http://www.myexperiment.org/workflows/16.html
The oreChem Core Ontology
Describes three concepts:
- the methodology (planned method) of a scientific experiment;
- the enactment of methodologies;
- the provenance of realised artefacts.
Methodology (Planned Method)
- The “plan” is modelled as a directed graph.
- Two node types:
  - Plan Stage: description of an activity that will be enacted
  - Plan Object: description of an artefact that will be realised
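A plan of this shape can be sketched as a directed graph whose nodes are either plan stages or plan objects. The Python below is a minimal sketch, not the oreChem ontology itself; the class names and, in particular, the edge semantics (stage to object meaning "produces", object to stage meaning "is used by", so edges always join the two node types) are assumptions made for illustration. The stage names in the usage note are hypothetical; HKL and CIF echo the plan objects queried on the SPARQL slide.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class PlanStage:
    """Description of an activity that will be enacted."""
    name: str


@dataclass(frozen=True)
class PlanObject:
    """Description of an artefact that will be realised."""
    name: str


PlanNode = Union[PlanStage, PlanObject]


@dataclass
class Plan:
    """A plan as a directed graph over the two node types."""
    edges: Set[Tuple[PlanNode, PlanNode]] = field(default_factory=set)

    def add_edge(self, source: PlanNode, target: PlanNode) -> None:
        # Assumption: edges alternate between node types (stage -> object means
        # "produces", object -> stage means "is used by").
        if isinstance(source, PlanStage) == isinstance(target, PlanStage):
            raise ValueError("an edge must join a stage and an object")
        self.edges.add((source, target))

    def successors(self, node: PlanNode) -> Set[PlanNode]:
        """Nodes reachable from `node` by a single edge."""
        return {t for (s, t) in self.edges if s == node}
```

For example, adding the edges collect data to HKL, HKL to solve and refine, and solve and refine to CIF gives a three-edge plan in which successors(solve and refine) is the CIF plan object.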
Enactment (of a Methodology)
- Each “run” (of a plan) is modelled as a directed graph.
- Two node types:
  - Stage: description of an activity that has been enacted
  - Object: description of an artefact that has been realised
Provenance
- Prospective: the plan describes a scientific experiment that will be enacted.
- Retrospective: the run describes a scientific experiment that has been enacted.
- Every ‘run thing’ is linked to exactly one ‘plan thing’.
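The rule that every run thing is linked to exactly one plan thing can be checked mechanically. The Python sketch below assumes nodes are identified by strings, each tagged as "stage" or "object", and that links arrive as (run node, plan node) pairs; it also assumes, as an extra illustrative constraint not stated on the slide, that a run stage should link to a plan stage and a run object to a plan object. The function name validate_run and the ids in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def validate_run(run_kinds: Dict[str, str],
                 plan_kinds: Dict[str, str],
                 links: Iterable[Tuple[str, str]]) -> List[str]:
    """Check that every run node links to exactly one plan node of the same kind.

    run_kinds / plan_kinds map node ids to "stage" or "object";
    links holds (run_node_id, plan_node_id) pairs.
    """
    link_set = set(links)  # ignore duplicate pairs
    problems = []
    for run_id, kind in sorted(run_kinds.items()):
        targets = [plan_id for (r, plan_id) in link_set if r == run_id]
        if len(targets) != 1:
            problems.append(f"{run_id}: linked to {len(targets)} plan nodes")
        elif plan_kinds.get(targets[0]) != kind:
            problems.append(f"{run_id}: kind differs from {targets[0]}")
    return problems
```

An empty list means the run satisfies the rule. Note that the check is one-directional: a single plan thing may be realised by many run things, for example across different runs of the same plan.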
oreChem Plug-in for eCrystals
Three components:
- orechem:Plan (the eCrystals methodology)
- “eCrystal → orechem:Run” mapping
- “orechem:Run → provenance graph” pipeline
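The eCrystal to orechem:Run mapping can be pictured as a function from a repository record to RDF triples. The Python sketch below is hypothetical throughout: the record layout (an id plus a list of files, each with an optional plan object and an optional source file), the ex: namespace for run and file URIs, and the function name are all assumptions. Only the orechem: and ecrystals: terms follow the prefixes used in the SPARQL request, and triples are written as prefixed names rather than full IRIs.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def ecrystal_to_run(record: Dict) -> List[Triple]:
    """Map a hypothetical eCrystal record to orechem:Run triples (prefixed names)."""
    run = f"ex:run/{record['id']}"  # 'ex:' is a made-up namespace
    triples: List[Triple] = [
        (run, "rdf:type", "orechem:Run"),
        (run, "orechem:hasPlan", "ecrystals:Ecrystals"),
    ]
    for f in record["files"]:
        node = f"ex:file/{record['id']}/{f['name']}"
        triples.append((run, "orechem:containsObject", node))
        triples.append((node, "rdf:type", "orechem:File"))
        if f.get("plan_object"):
            # Link the realised file to the plan object it realises.
            triples.append((node, "orechem:hasPlanObject", f"ecrystals:{f['plan_object']}"))
        if f.get("derived_from"):
            # Retrospective provenance: which file this one was derived from.
            source = f"ex:file/{record['id']}/{f['derived_from']}"
            triples.append((node, "orechem:derivedFrom", source))
    return triples
```

For a record with a raw HKL file, an intermediate file derived from it, and a CIF derived from the intermediate file, this yields twelve triples, which is the shape the SPARQL request matches.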
The eCrystals Methodology
[Figure: panels “Before” and “After”]
Example: eCrystal #643
[Figure: panels “Before” and “After”]
SPARQL Request

    PREFIX orechem:   <http://www.openarchives.org/2010/05/24-orechem-ns#>
    PREFIX ecrystals: <http://ecrystals.chem.soton.ac.uk/plan.rdf#>

    SELECT ?run ?raw ?derived ?reported
    WHERE {
      # A run of the eCrystals plan and three of the files it contains
      ?run a orechem:Run ;
           orechem:hasPlan ecrystals:Ecrystals ;
           orechem:containsObject ?raw ;
           orechem:containsObject ?derived ;
           orechem:containsObject ?reported .
      # The raw data: a file realising the HKL plan object
      ?raw a orechem:File ;
           orechem:hasPlanObject ecrystals:HKL .
      # Any file derived from the raw data
      ?derived a orechem:File ;
           orechem:derivedFrom ?raw .
      # The reported CIF, derived from that intermediate file
      ?reported a orechem:File ;
           orechem:hasPlanObject ecrystals:CIF ;
           orechem:derivedFrom ?derived .
    }
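The WHERE clause above is a single basic graph pattern: a solution is any assignment of the four variables that makes every triple pattern present in the data. To make those semantics concrete, here is a naive backtracking matcher in Python, evaluated over a small made-up graph. It is a sketch for illustration only, not how a SPARQL engine or the eCrystals plug-in evaluates queries; the ex: namespace, the file names and the helper names are hypothetical, and "a" is written as rdf:type.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Made-up example data in prefixed-name form.
TRIPLES: List[Triple] = [
    ("ex:run1", "rdf:type", "orechem:Run"),
    ("ex:run1", "orechem:hasPlan", "ecrystals:Ecrystals"),
    ("ex:run1", "orechem:containsObject", "ex:hkl"),
    ("ex:run1", "orechem:containsObject", "ex:res"),
    ("ex:run1", "orechem:containsObject", "ex:cif"),
    ("ex:hkl", "rdf:type", "orechem:File"),
    ("ex:hkl", "orechem:hasPlanObject", "ecrystals:HKL"),
    ("ex:res", "rdf:type", "orechem:File"),
    ("ex:res", "orechem:derivedFrom", "ex:hkl"),
    ("ex:cif", "rdf:type", "orechem:File"),
    ("ex:cif", "orechem:hasPlanObject", "ecrystals:CIF"),
    ("ex:cif", "orechem:derivedFrom", "ex:res"),
]

# The query's WHERE clause, one triple pattern per entry; '?x' marks a variable.
QUERY: List[Triple] = [
    ("?run", "rdf:type", "orechem:Run"),
    ("?run", "orechem:hasPlan", "ecrystals:Ecrystals"),
    ("?run", "orechem:containsObject", "?raw"),
    ("?run", "orechem:containsObject", "?derived"),
    ("?run", "orechem:containsObject", "?reported"),
    ("?raw", "rdf:type", "orechem:File"),
    ("?raw", "orechem:hasPlanObject", "ecrystals:HKL"),
    ("?derived", "rdf:type", "orechem:File"),
    ("?derived", "orechem:derivedFrom", "?raw"),
    ("?reported", "rdf:type", "orechem:File"),
    ("?reported", "orechem:hasPlanObject", "ecrystals:CIF"),
    ("?reported", "orechem:derivedFrom", "?derived"),
]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant term does not match
    return extended


def match_bgp(triples: List[Triple], patterns: List[Triple],
              binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns (depth-first backtracking)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for triple in triples:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from match_bgp(triples, patterns[1:], extended)


for row in match_bgp(TRIPLES, QUERY):
    print(row["?run"], row["?reported"], row["?derived"], row["?raw"])
    # prints: ex:run1 ex:cif ex:res ex:hkl
```

The single solution is listed in the same column order as the response slide (?run, ?reported, ?derived, ?raw). Removing either derivedFrom triple leaves no solution, since the query requires an unbroken HKL to intermediate file to CIF chain.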
SPARQL Response (for eCrystal #643)
[Table: result bindings with columns ?run, ?reported, ?derived, ?raw]
Summary
<summary/>
Acknowledgments
- oreChem is funded by Microsoft External Research.
- eCrystals is funded by both EPSRC and JISC.
- The oreChem project team: Nico Adams, Mark Borkum, William Brouwer, Rameswara Sashi Kiran Challa, Simon Coles, Nick Day, Jim Downing, Jeremy Frey, C. Lee Giles, Carl Lagoze (PI), Na Li, Prasenjit Mitra, Karl Meuller, Peter Murray-Rust, Marlon Pierce, Joe Townsend, and Theresa Velden.
Tags: #ahm2010 #ahm #ahm10 #pch2010
http://pegasus.chem.soton.ac.uk (#ahm2010, until 11am Wed 15 Sept 2010)
