3. Current Practice in Crystallography
  Crystallography data is highly structured
  The de facto standard adopted by the community is the CIF (Crystallographic Information File)
  Relatively few crystal structures are openly published
  Source: http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs
4. Open Access Journals
  Advantages:
    Rapid publication
    Highly cited
    Data is available to download
  Disadvantages:
    Electronic only
    Not all data is of primary importance to the underlying chemistry
      By-products, unexpected results, tracking reactions, etc.
6. The eCrystals Federation
  JISC project to establish a network of crystallography resources on the Internet, with metadata that is harvested by a number of aggregation services
  Led by the UK National Crystallography Service (NCS)
  With core partners at UKOLN, the Digital Curation Centre, and the Unilever Centre for Molecular Science Informatics
7. eCrystals – University of Southampton
  Located at http://ecrystals.chem.soton.ac.uk
  Archive for crystal structures that are generated by:
    Southampton Chemical Crystallography Group
    UK National Crystallography Service (NCS)
  Modified version of EPrints 3.1
    OAI-PMH compliant
    Extensible platform (with plug-in architecture)
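Because the repository is OAI-PMH compliant, an aggregation service can harvest its metadata with plain HTTP requests. The following is a minimal sketch of that exchange, not the project's own harvester. The endpoint path `/cgi/oai2` is an assumption based on where EPrints installations commonly expose OAI-PMH, and the sample response and its identifiers are made up for illustration. The verb `ListIdentifiers`, the `metadataPrefix` argument, and the XML namespace come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Assumption: the OAI-PMH endpoint lives at /cgi/oai2, as is common for EPrints.
BASE_URL = "http://ecrystals.chem.soton.ac.uk/cgi/oai2"


def list_identifiers_url(base_url, metadata_prefix="oai_dc"):
    """Build the request URL for the OAI-PMH ListIdentifiers verb."""
    query = urlencode({"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_identifiers(xml_text):
    """Extract record identifiers from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]


# Made-up response body, used so the sketch runs without network access.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:1</identifier><datestamp>2010-01-01</datestamp></header>
    <header><identifier>oai:example.org:2</identifier><datestamp>2010-01-02</datestamp></header>
  </ListIdentifiers>
</OAI-PMH>"""

print(list_identifiers_url(BASE_URL))
print(parse_identifiers(SAMPLE_RESPONSE))
```

A real harvester would fetch the URL, follow any `resumptionToken` in the response, and then request full records with the `GetRecord` or `ListRecords` verbs.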
8. What is an eCrystal?
  “all the fundamental and derived data resulting from a single crystal X-ray structure determination”
  “the information supplied should enable any reader to check the reliability and validity”
  Image: http://www.ukoln.ac.uk/projects/ebank-uk/images/collage-web.gif
10. The Data Deluge
  In Haiku:
    Lots of producers;
    Generating more data
    than ever before.
  40 years ago, a PhD student would determine 3 structures over the entire course of their study!
  Image: The Great Wave off Kanagawa by Katsushika Hokusai
11. Provenance
  The 7 W’s [Goble 2002]
    Who, What, Where, Why, When, Which, & (W)How
  The Why aspect is usually ignored
    Rationale, intent, hypothesis, protocol, methodology, workflow, etc.
  “Diana and Actaeon by Titian has a full provenance covering its passage through several owners and four countries since it was painted for Philip II of Spain in the 1550s.”
  Source: http://en.wikipedia.org/wiki/Diana_and_Actaeon_%28Titian%29
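One way to make the point about the ignored Why concrete is to treat the 7 W's as fields of a record and report which ones are empty. This is only an illustrative sketch: the class name, the helper, and the example values are hypothetical and do not come from the slides or from any oreChem or eCrystals schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical record with one optional field per W, in the slide's order."""
    who: Optional[str] = None
    what: Optional[str] = None
    where: Optional[str] = None
    why: Optional[str] = None
    when: Optional[str] = None
    which: Optional[str] = None
    how: Optional[str] = None


def missing_aspects(record: ProvenanceRecord) -> List[str]:
    """Return the names of the W's that have not been recorded."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Made-up example: everything is captured except the rationale.
record = ProvenanceRecord(
    who="a crystallographer",
    what="a crystal structure",
    where="a diffraction laboratory",
    when="2010-01-01",
    which="a diffractometer",
    how="single crystal X-ray diffraction",
)

print(missing_aspects(record))
```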
12. “In theory, there is no difference between theory and practice. But, in practice, there is.”
  Unknown (possibly Yogi Berra)
13. Why “Why” Matters
  It is the reason for the data’s existence
  It gives us the ability to interpret the data in the correct context
  It allows us to align the data with the big picture
  Image: http://www.myexperiment.org/workflows/16.html
14. The oreChem Core Ontology
  Describes three concepts:
    The methodology (planned method) of a scientific experiment
    The enactment of methodologies
    The provenance of realised artefacts
15. Methodology (Planned Method)
  The “plan” is modelled as a directed graph
  Two node types:
    Plan Stage: description of an activity that will be enacted
    Plan Object: description of an artefact that will be realised
16. Enactment (of a Methodology)
  Each “run” (of a plan) is modelled as a directed graph
  Two node types:
    Stage: description of an activity that has been enacted
    Object: description of an artefact that has been realised
17. Provenance
  Prospective
    The plan describes a scientific experiment that will be enacted
  Retrospective
    The run describes a scientific experiment that has been enacted
  Every ‘run thing’ is linked to exactly one ‘plan thing’
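The plan and run graphs, and the link between them, can be sketched as a small data structure. This is an illustration under stated assumptions, not the oreChem ontology itself: the class names, the `follows_plan` check, and the example stages and objects are all hypothetical. The "exactly one plan thing" rule is expressed by making `plan` a required field of every run node. The check that run edges mirror plan edges is an extra assumption that the slides do not state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PlanNode:
    label: str
    kind: str  # "stage" (activity to be enacted) or "object" (artefact to be realised)


@dataclass(frozen=True)
class RunNode:
    label: str
    kind: str  # "stage" (activity enacted) or "object" (artefact realised)
    plan: PlanNode  # required: each run node is linked to exactly one plan node


@dataclass
class DirectedGraph:
    edges: set = field(default_factory=set)  # set of (source, target) pairs

    def connect(self, source, target):
        self.edges.add((source, target))

    def nodes(self):
        return {node for edge in self.edges for node in edge}


def follows_plan(run, plan):
    """Check that every run node and run edge maps onto the plan graph."""
    plan_nodes = plan.nodes()
    for node in run.nodes():
        if node.plan not in plan_nodes or node.kind != node.plan.kind:
            return False
    # Assumption: a run edge is valid only if the plan has the matching edge.
    return all((s.plan, t.plan) in plan.edges for s, t in run.edges)


# Hypothetical plan: sample -> data collection -> diffraction images.
sample_p = PlanNode("crystal sample", "object")
collect_p = PlanNode("data collection", "stage")
images_p = PlanNode("diffraction images", "object")
plan = DirectedGraph()
plan.connect(sample_p, collect_p)
plan.connect(collect_p, images_p)

# Hypothetical run of that plan.
sample_r = RunNode("sample 1", "object", sample_p)
collect_r = RunNode("collection 1", "stage", collect_p)
images_r = RunNode("images 1", "object", images_p)
run = DirectedGraph()
run.connect(sample_r, collect_r)
run.connect(collect_r, images_r)

print(follows_plan(run, plan))
```

Keeping the plan and the run as separate graphs means one plan can be shared by many runs, which is what lets prospective and retrospective provenance be compared.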
18. oreChem Plug-in for eCrystals
  Three components:
    orechem:Plan (the eCrystals methodology)
    “eCrystal → orechem:Run” mapping
    “orechem:Run → provenance graph” pipeline
24. Acknowledgments
  oreChem is funded by Microsoft External Research
  eCrystals is funded by both EPSRC and JISC
  The oreChem project team: Nico Adams, Mark Borkum, William Brouwer, Rameswara Sashi Kiran Challa, Simon Coles, Nick Day, Jim Downing, Jeremy Frey, C. Lee Giles, Carl Lagoze (PI), Na Li, Prasenjit Mitra, Karl Meuller, Peter Murray-Rust, Marlon Pierce, Joe Townsend, and Theresa Velden.