a centre of expertise in data curation and preservation




Create or Receive Scientific data

      Dr. Frank Gibson and Dr. Phillip Lord
                               Frank.Gibson@newcastle.ac.uk
                                Phillip.Lord@newcastle.ac.uk


                                                                                                         Funded by:
    This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK:
    Scotland License. To view a copy of this license, (a) visit
    http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or, (b) send a letter to Creative
    Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.


 Digital Curation 101, October 6th-10th, 2008, NeSC, Edinburgh




               “In the standard
               model, one collects
               data, publishes a
               paper or papers and
               then gradually loses
               the original dataset.”

               - Geoffrey Bowker



Create or Receive

[Four slides with no extractable text. Slides by Cameron Neylon http://www.slideshare.net/CameronNeylon]

If we have a paper, who cares about the data?

http://flickr.com/photos/nicmcphee/2756494307/




                                          A paper = a claim (or claims)

                      The full record that supports that
                   claim should be available for detailed
                         examination and critique




Slide by Cameron Neylon http://www.slideshare.net/CameronNeylon




                    1000+
                Databases




Biocuration: Databases




Biocuration: Wiki




Funders

http://flickr.com/photos/luismimunoznajar/2093185804/




Create or Receive

Curation aims

           Amenable
           Preservable
           Ownable
           Accessible
           Citable



Significant Properties of Data


                          Content

                          Syntax

                          Semantics





Content




[Diagram of metadata elements: Publisher, Type, Title, Creator, Source, Identifier, Date, Rights]




Simple Dublin Core




   Title
   Creator
   Subject
   Description
   Publisher
   Contributor
   Date
   Type
   Format
   Identifier
   Source
   Language
   Relation
   Coverage
   Rights
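
As a rough illustration of how the Simple Dublin Core elements can describe a resource, the sketch below stores a record as a plain dictionary and checks that only the 15 element names are used. The record describes this talk using values from the title slide; the mapping of values to elements and the helper name `validate_record` are assumptions made for the example, not part of any Dublin Core tooling.

```python
# The 15 elements of Simple Dublin Core.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def validate_record(record):
    """Return the keys in `record` that are not Simple Dublin Core elements."""
    return sorted(key for key in record if key not in DC_ELEMENTS)


# Values come from the title slide; which element holds which value
# is an assumption of this sketch.
record = {
    "title": "Create or Receive Scientific data",
    "creator": ["Frank Gibson", "Phillip Lord"],
    "date": "2008-10",
    "rights": "http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/",
}

print(validate_record(record))                   # no unknown keys
print(validate_record({"author": "F. Gibson"}))  # "author" is not a DC element
```

Every element is optional and repeatable in Simple Dublin Core, which is why the check only rejects unknown names rather than demanding a fixed set.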








Content:
Domain Specific



Syntax








        Choosing a Syntax
• Openness
   • Is there an open, publicly available specification for the
     format; are its specifications in the public domain; is it
     unencrypted?
• Portability
   • Is the format independent of hardware, operating system, or
     other software; is it independent of particular institutions,
     groups, or events; is it in widespread current use; does it
     contain little or no built-in functionality?
• Quality
   • Is it robust; simple; highly tested; loss-free?
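
The questions above can be treated as a checklist. The following is a minimal sketch, assuming each question is reduced to a yes/no answer; the class `FormatProfile`, its field names, and the example answers are all hypothetical and only cover a subset of the questions.

```python
from dataclasses import dataclass, fields


@dataclass
class FormatProfile:
    """Yes/no answers to some of the checklist questions for one format."""
    open_specification: bool
    unencrypted: bool
    platform_independent: bool
    widely_used: bool
    loss_free: bool


def unmet_criteria(profile):
    """Return the names of the criteria the format fails, in field order."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


# Hypothetical answers for an imaginary vendor-specific binary format.
vendor_binary = FormatProfile(
    open_specification=False,
    unencrypted=True,
    platform_independent=False,
    widely_used=True,
    loss_free=True,
)

print(unmet_criteria(vendor_binary))
```

A list of unmet criteria is more useful than a single score here, because a curator usually needs to know which property fails, not just how many.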




Semantics








   Semantics can be complex

One semantic = many words
Many words = one semantic








          • Excel data example – do I need it?




• Zeeberg et al. BMC Bioinformatics 2004 5:80 doi:10.1186/1471-2105-5-80
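
The cited paper reports that spreadsheet software can silently convert some gene identifiers into dates or floating-point numbers. A minimal sketch of a pre-import check follows; the two patterns are simplified assumptions for illustration and do not reproduce the spreadsheet's actual conversion rules, and the function name `at_risk` is hypothetical.

```python
import re

# Symbols made of a month-like prefix plus a short number may be read as dates.
# The prefix list is an assumption chosen for illustration.
_MONTH_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)

# Identifiers of the form digits, "E", digits may be read as numbers
# in scientific notation.
_FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def at_risk(identifier):
    """Return True if the identifier looks like a date or a float literal."""
    return bool(_MONTH_LIKE.match(identifier) or _FLOAT_LIKE.match(identifier))


for name in ["SEPT2", "DEC1", "2310009E13", "TP53"]:
    print(name, at_risk(name))
```

Flagged identifiers can then be imported as explicit text rather than left to automatic type detection.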
What is fly?




• Fly: http://en.wikipedia.org/wiki/Image:MuscuDomestica.jpg
• Fly: http://en.wikipedia.org/wiki/Image:Air_india_b747-400_vt-esn_arp.jpg
• Fly: http://en.wikipedia.org/wiki/Image:Green_Highlander_salmon_fly.jpg
• Fly: http://en.wikipedia.org/wiki/Image:Fly_poster.jpg








             Ontology
• A controlled vocabulary is an association
  between formal names (identifiers) and their
  definitions.
• An ontology is a controlled vocabulary
  augmented with logical constraints that
  describe their interrelationships.
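
The difference between the two definitions can be sketched in a few lines. Below, the controlled vocabulary is a mapping from identifiers to definitions, and the "ontology" adds a single `is_a` relation between terms. All identifiers, definitions, and function names are hypothetical, and one single-parent relation only stands in for the richer logical constraints a real ontology language allows.

```python
# Controlled vocabulary: formal name (identifier) -> definition.
vocabulary = {
    "EX:0001": "animal",
    "EX:0002": "insect",
    "EX:0003": "housefly",
    "EX:0004": "fishing lure",
    "EX:0005": "salmon fly",
}

# Ontology: the same terms plus a relation between them (child -> parent).
is_a = {
    "EX:0002": "EX:0001",  # insect is_a animal
    "EX:0003": "EX:0002",  # housefly is_a insect
    "EX:0005": "EX:0004",  # salmon fly is_a fishing lure
}


def ancestors(term):
    """Follow is_a links upward and return every ancestor, nearest first."""
    found = []
    while term in is_a:
        term = is_a[term]
        found.append(term)
    return found


def is_kind_of(term, candidate):
    """True if `candidate` is an ancestor of `term` under is_a."""
    return candidate in ancestors(term)


print([vocabulary[t] for t in ancestors("EX:0003")])
print(is_kind_of("EX:0005", "EX:0001"))
```

With the vocabulary alone, both kinds of "fly" are just labelled entries; with the relation, software can work out that one is an animal and the other is not.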








 Ontologies for Life science
• Emergence has occurred for two reasons:
   • Consistent annotation of data
   • To add meaning and understanding that can
     be interpreted computationally
• Bio-ontologies are registered on the OBO Foundry








Application of Significant Properties in Proteomics





 Minimum Information about a
Proteomics Experiment (MIAPE)
•       Sufficiency.
    •     The MIAPE guidelines should require sufficient information about
          a dataset and its experimental context to allow a reader to
          understand and critically evaluate the interpretation and
          conclusions, and to support their experimental corroboration.

•       Practicability.
    •     Achieving compliance with MIAPE should not be so burdensome
          as to prohibit its widespread use.








Minimum reporting guidelines
                       • Describe content
                       • Implementation
                         independent

                       • Impacts
                            • Publication
                            • Syntax
                            • Semantics








     Syntax for proteomics
• The content in MIAPE GE needs to be structured to
  facilitate
   • dissemination
   • transfer
   • storage
• A community development process to agree on a
  syntax
   • building upon the FuGE data model
   • A pre-existing community developed representation of
     scientific experiments
   • Interoperable







                        FuGE
•   Model of common components in science investigations, such
    as materials, data, protocols, equipment and software.
•   Provides a framework for capturing complete laboratory
    workflows, enabling the integration of pre-existing data
    formats.








         UML/XML/RDBMS
• UML gives structure (but not syntax)
   • Very abstract, very general
• XML provides a concrete syntax
   • Meta language is interoperable, checkable, viable and has
     basic metadata support (language, character coding and so
     on).
   • Tends toward the verbose. Not (very) searchable in itself.
   • Therefore, transfer and archive format.
• RDBMS
   •   SQL is (sort of) a standard
   •   Highly computationally amenable form; v. good for searching
   •   Conversion from XML is possible, but in a number of ways.
   •   Hard work – nice to have an off-the-shelf implementation.
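
To make the XML-versus-RDBMS trade-off concrete, here is a minimal sketch that loads a small XML document into an in-memory SQLite table and then searches it with SQL. The element and attribute names are invented for the example and are not GelML or FuGE; the table layout is only one of the several possible mappings the slide alludes to.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical transfer document; NOT a real GelML or FuGE instance.
XML_DOC = """
<experiment id="exp1">
  <gel id="gel1" type="2D"/>
  <gel id="gel2" type="1D"/>
</experiment>
"""


def load_gels(xml_text, conn):
    """Flatten every <gel> element into one row of a `gel` table."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gel "
        "(id TEXT PRIMARY KEY, experiment TEXT, type TEXT)"
    )
    rows = [(g.get("id"), root.get("id"), g.get("type")) for g in root.iter("gel")]
    conn.executemany("INSERT INTO gel VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_gels(XML_DOC, conn)

# Once in the database, the data can be searched with ordinary SQL.
print(conn.execute("SELECT id FROM gel WHERE type = '2D'").fetchall())
```

The XML stays the transfer and archive copy; the table is a derived, searchable view that can be rebuilt from it.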


GelML







Semantics for Gels




Semantics for science








Curation of Gel experiments

Laboratory / Data entry and transfer / Public repositories

   I) GelML data entry tools
   II) Direct database submission
   III) Automated export of GelInfoML

Diagram labels: MIAPE GE, MIAPE GI, GelML, sepCV




Discoverability and reuse




                           •Persistent Identifiers
                           •Rights management








      Persistent Identifiers
• a name for a resource which will remain the same
  regardless of where the resource is located
• In biology typically assigned to data upon publication
• Type of identifier dependent on publication method

• Description and Representation Information provides
  more information about persistent identifiers
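
The first point, a name that stays the same wherever the resource lives, can be illustrated with a toy resolver. This is a sketch under stated assumptions: the class `Resolver`, the identifier string, and the example.org locations are all hypothetical and do not correspond to any real identifier scheme or resolution service.

```python
class Resolver:
    """Maps persistent identifiers to the resource's current location."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, location):
        """Assign an identifier once; it is never reassigned to another resource."""
        if identifier in self._locations:
            raise ValueError(f"identifier already assigned: {identifier}")
        self._locations[identifier] = location

    def relocate(self, identifier, new_location):
        """Record that the resource moved; the identifier itself is unchanged."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier):
        """Return the current location for an identifier."""
        return self._locations[identifier]


resolver = Resolver()
resolver.register("example:dataset/42", "http://old.example.org/data/42")
resolver.relocate("example:dataset/42", "http://new.example.org/archive/42")

# A citation that used the identifier still resolves after the move.
print(resolver.resolve("example:dataset/42"))
```

Citations hold only the identifier, so moving the data requires one update in the resolver rather than a correction to every paper that cites it.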








   Rights management
                          • Difficult to determine
                          • Lots of legal issues
                          • In biology/bioinformatics
                            tends to be open
                            access




• Creative Commons

Receiving data for curation




                                                Content
                                                Syntax
                                                Semantics

Route map

Who will receive it?

What are their policies on:
 Content, Syntax, Semantics

Plan your experiment to conform to
 Content, Syntax, Semantics

Implement experiment to:
 Collect appropriate Content
 Structure in appropriate Syntax
 Ensure Semantics are preserved

Curate…




        Meta Route Map
• How to build the map if you don’t have one
  yet.








     Appraise and Select
• Investigates the evaluation and selection of
  data for long-term curation and preservation








       Acknowledgments
• The CARMEN project
  • www.carmen.org.uk
• The Proteomics Standards Initiative (PSI)
  • http://psidev.info
• Colleagues at Newcastle University
  • Phillip Lord, Anil Wipat, Allyson Lister




                    Create or Receive
a centre of expertise in data curation and preservation




Create or Receive

More Related Content

Similar to Create and recieve scientific data

Why 2015 is the Year of Copy Data - What are the requirements?
Why 2015 is the Year of Copy Data - What are the requirements?Why 2015 is the Year of Copy Data - What are the requirements?
Why 2015 is the Year of Copy Data - What are the requirements?Storage Switzerland
 
Ensuring data quality with lakeFS
Ensuring data quality with lakeFSEnsuring data quality with lakeFS
Ensuring data quality with lakeFSPaul Singman
 
Webinar: Which Storage Architecture is Best for Splunk Analytics?
Webinar: Which Storage Architecture is Best for Splunk Analytics?Webinar: Which Storage Architecture is Best for Splunk Analytics?
Webinar: Which Storage Architecture is Best for Splunk Analytics?Storage Switzerland
 
Overview of SharePoint Server 2016 and Office 365 Hybrid Scenarios
Overview of SharePoint Server 2016 and Office 365 Hybrid ScenariosOverview of SharePoint Server 2016 and Office 365 Hybrid Scenarios
Overview of SharePoint Server 2016 and Office 365 Hybrid ScenariosDux Raymond Sy
 
The need to redefine genomic data sharing - moving towards Open Science Oct ...
The need to redefine genomic data sharing - moving towards Open Science  Oct ...The need to redefine genomic data sharing - moving towards Open Science  Oct ...
The need to redefine genomic data sharing - moving towards Open Science Oct ...Fiona Nielsen
 
Knowledge Management Presentation
Knowledge Management PresentationKnowledge Management Presentation
Knowledge Management Presentationkreaume
 
World Wide Technology: Is backing up to the cloud right for you?
World Wide Technology: Is backing up to the cloud right for you?World Wide Technology: Is backing up to the cloud right for you?
World Wide Technology: Is backing up to the cloud right for you?Angie Clark
 
Optimize Your Vertica Data Management Infrastructure
Optimize Your Vertica Data Management InfrastructureOptimize Your Vertica Data Management Infrastructure
Optimize Your Vertica Data Management InfrastructureImanis Data
 
OwnBackup, la plateforme #1 de Data Gouvernance pour Salesforce
OwnBackup, la plateforme #1 de Data Gouvernance pour SalesforceOwnBackup, la plateforme #1 de Data Gouvernance pour Salesforce
OwnBackup, la plateforme #1 de Data Gouvernance pour SalesforceThierry TROUIN ☁
 
Three Steps to Modern Media Asset Management with Active Archive
Three Steps to Modern Media Asset Management with Active ArchiveThree Steps to Modern Media Asset Management with Active Archive
Three Steps to Modern Media Asset Management with Active ArchiveAvere Systems
 
Webinar: Making The Always-On Data Center A Reality
Webinar: Making The Always-On Data Center A RealityWebinar: Making The Always-On Data Center A Reality
Webinar: Making The Always-On Data Center A RealityStorage Switzerland
 
Business Continuity with Disaster Recovery
Business Continuity with Disaster RecoveryBusiness Continuity with Disaster Recovery
Business Continuity with Disaster RecoveryYoong Seng Lai
 
Oracle Open World Presentation - Oracle RMAN Best Practices for Cloud Backups
Oracle Open World Presentation - Oracle RMAN Best Practices for Cloud Backups Oracle Open World Presentation - Oracle RMAN Best Practices for Cloud Backups
Oracle Open World Presentation - Oracle RMAN Best Practices for Cloud Backups Niklas Iveslatt
 

Similar to Create and recieve scientific data (16)

Why 2015 is the Year of Copy Data - What are the requirements?
Why 2015 is the Year of Copy Data - What are the requirements?Why 2015 is the Year of Copy Data - What are the requirements?
Why 2015 is the Year of Copy Data - What are the requirements?
 
Ensuring data quality with lakeFS
Ensuring data quality with lakeFSEnsuring data quality with lakeFS
Ensuring data quality with lakeFS
 
Webinar: Which Storage Architecture is Best for Splunk Analytics?
Webinar: Which Storage Architecture is Best for Splunk Analytics?Webinar: Which Storage Architecture is Best for Splunk Analytics?
Webinar: Which Storage Architecture is Best for Splunk Analytics?
 
Overview of SharePoint Server 2016 and Office 365 Hybrid Scenarios
Overview of SharePoint Server 2016 and Office 365 Hybrid ScenariosOverview of SharePoint Server 2016 and Office 365 Hybrid Scenarios
Overview of SharePoint Server 2016 and Office 365 Hybrid Scenarios
 
The need to redefine genomic data sharing - moving towards Open Science Oct ...
The need to redefine genomic data sharing - moving towards Open Science  Oct ...The need to redefine genomic data sharing - moving towards Open Science  Oct ...
The need to redefine genomic data sharing - moving towards Open Science Oct ...
 
Knowledge Management Presentation
Knowledge Management PresentationKnowledge Management Presentation
Knowledge Management Presentation
 
World Wide Technology: Is backing up to the cloud right for you?
World Wide Technology: Is backing up to the cloud right for you?World Wide Technology: Is backing up to the cloud right for you?
World Wide Technology: Is backing up to the cloud right for you?
 
Optimize Your Vertica Data Management Infrastructure
Optimize Your Vertica Data Management InfrastructureOptimize Your Vertica Data Management Infrastructure
Optimize Your Vertica Data Management Infrastructure
 
Acpl Brief profile
Acpl Brief profileAcpl Brief profile
Acpl Brief profile
 
OwnBackup, la plateforme #1 de Data Gouvernance pour Salesforce
OwnBackup, la plateforme #1 de Data Gouvernance pour SalesforceOwnBackup, la plateforme #1 de Data Gouvernance pour Salesforce
OwnBackup, la plateforme #1 de Data Gouvernance pour Salesforce
 
Three Steps to Modern Media Asset Management with Active Archive
Three Steps to Modern Media Asset Management with Active ArchiveThree Steps to Modern Media Asset Management with Active Archive
Three Steps to Modern Media Asset Management with Active Archive
 
Resource space
Resource spaceResource space
Resource space
 
Webinar: Making The Always-On Data Center A Reality
Webinar: Making The Always-On Data Center A RealityWebinar: Making The Always-On Data Center A Reality
Webinar: Making The Always-On Data Center A Reality
 
Business Continuity with Disaster Recovery
Business Continuity with Disaster RecoveryBusiness Continuity with Disaster Recovery
Business Continuity with Disaster Recovery
 
Oracle Open World Presentation - Oracle RMAN Best Practices for Cloud Backups
Oracle Open World Presentation - Oracle RMAN Best Practices for Cloud Backups Oracle Open World Presentation - Oracle RMAN Best Practices for Cloud Backups
Oracle Open World Presentation - Oracle RMAN Best Practices for Cloud Backups
 
Digital Preservation
Digital PreservationDigital Preservation
Digital Preservation
 

Recently uploaded

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Create and recieve scientific data

  • 1. a centre of expertise in data curation and preservation Create or Receive Scientific data Dr. Frank Gibson and Dr. Phillip Lord Frank.Gibson@newcastle.ac.uk Phillip.Lord@newcastle.ac.uk Funded by: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc- sa/2.5/scotland/ ; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA. Digital Curation 101, October 6th-10th, 2008, NeSC, Edinburgh
  • 2. “In the standard model, one collects data, publishes a paper or papers and then gradually loses the original dataset.” - Geoffrey Bowker
  • 3. Slide by Cameron Neylon http://www.slideshare.net/CameronNeylon
  • 4. Slide by Cameron Neylon http://www.slideshare.net/CameronNeylon
  • 5. Slide by Cameron Neylon http://www.slideshare.net/CameronNeylon
  • 6. Slide by Cameron Neylon http://www.slideshare.net/CameronNeylon
  • 7. If we have a paper, who cares about the data? http://flickr.com/photos/nicmcphee/2756494307/
  • 8. A paper = a claim (or claims). The full record that supports that claim should be available for detailed examination and critique. Slide by Cameron Neylon http://www.slideshare.net/CameronNeylon
  • 9. Slide by Cameron Neylon http://www.slideshare.net/CameronNeylon
  • 10. 1000+ Databases
  • 11. Biocuration: Databases
  • 12. Biocuration: Wiki
  • 13. Slide by Cameron Neylon http://www.slideshare.net/CameronNeylon
  • 15. Funders http://flickr.com/photos/luismimunoznajar/2093185804/
  • 16. Create or Receive
  • 17. Curation aims: Amenable, Preservable, Ownable, Accessible, Citable
  • 18. Significant Properties of Data: Content, Syntax, Semantics
  • 19. Content
  • 20. Publisher, Type, Title, Creator, Source, Identifier, Date, Rights
  • 21. Simple Dublin Core: Type, Format, Title, Identifier, Creator, Source, Subject, Language, Description, Relation, Publisher, Coverage, Contributor, Rights, Date
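The fifteen Simple Dublin Core elements on slide 21 can be treated as a fixed set of field names for describing content. The following is a minimal sketch, assuming a hypothetical gel-image dataset. The record values and the helper name `check_record` are illustrative and do not come from the slides.

```python
# The fifteen Simple Dublin Core elements, as listed on slide 21.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def check_record(record):
    """Return the keys in `record` that are not Simple Dublin Core elements."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)


# Hypothetical record: every value below is invented for illustration.
record = {
    "title": "2D gel image, sample 42",
    "creator": "A. Researcher",
    "date": "2008-10-06",
    "type": "Dataset",
    "format": "image/tiff",
    "identifier": "example:gel-42",
    "rights": "CC BY-NC-SA 2.5",
}

unknown = check_record(record)  # empty list when every key is a known element
```

Simple Dublin Core is generic by design; domain-specific content (slide 22) would need fields beyond these fifteen.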
  • 22. Content: Domain Specific
  • 23. Syntax
  • 25. Choosing a Syntax • Openness: is there an open, publicly available specification for the format; are its specifications in the public domain; is it unencrypted? • Portability: is the format independent of hardware, operating system, and other software; is it independent of particular institutions, groups, or events; is it in widespread current use; does it contain little or no built-in functionality? • Quality: is it robust, simple, highly tested, and loss-free?
  • 26. Semantics
  • 27. Semantics can be complex: One semantic = many words. One word = many semantics.
  • 28. Excel data example – do I need it? • Zeeberg et al. BMC Bioinformatics 2004 5:80 doi:10.1186/1471-2105-5-80
  • 29. What is fly? • Fly: http://en.wikipedia.org/wiki/Image:Air_india_b747-400_vt-esn_arp.jpg • Fly: http://en.wikipedia.org/wiki/Image:MuscuDomestica.jpg • Fly: http://en.wikipedia.org/wiki/Image:Green_Highlander_salmon_fly.jpg • Fly: http://en.wikipedia.org/wiki/Image:Fly_poster.jpg
  • 30. Ontology • A controlled vocabulary is an association between formal names (identifiers) and their definitions. • An ontology is a controlled vocabulary augmented with logical constraints that describe how its terms relate to one another.
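The distinction on slide 30 can be sketched in a few lines: a controlled vocabulary is a lookup from identifier to definition, and an ontology adds relationships that a program can reason over. The identifiers, definitions, and the single `is_a` relation below are hypothetical, chosen to echo the "fly" example. Real ontologies express far richer constraints than this.

```python
# A controlled vocabulary: identifiers mapped to definitions.
# All identifiers and definitions here are invented for illustration.
vocabulary = {
    "EX:0001": "animal",
    "EX:0002": "insect",
    "EX:0003": "housefly (Musca domestica)",
    "EX:0004": "aircraft",
}

# An ontology adds relationships between terms; here a single "is_a" relation.
is_a = {
    "EX:0002": "EX:0001",  # insect is_a animal
    "EX:0003": "EX:0002",  # housefly is_a insect
}


def ancestors(term):
    """Follow is_a links upward and return every broader term, nearest first."""
    found = []
    while term in is_a:
        term = is_a[term]
        found.append(term)
    return found


def is_kind_of(term, broader):
    """True if `term` is, directly or transitively, a kind of `broader`."""
    return broader in ancestors(term)
```

With only the vocabulary, a program can look up what "EX:0003" means; with the `is_a` links it can also infer that data annotated with "EX:0003" is data about an animal.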
  • 31. Ontologies for life science • They have emerged for two reasons: • Consistent annotation of data • To add meaning and understanding that can be interpreted computationally • Bio-ontologies are registered on the OBO Foundry
  • 32. Application of Significant Properties in Proteomics
  • 33. Minimum Information about a Proteomics Experiment (MIAPE) • Sufficiency: the MIAPE guidelines should require sufficient information about a dataset and its experimental context to allow a reader to understand and critically evaluate the interpretation and conclusions, and to support their experimental corroboration. • Practicability: achieving compliance with MIAPE should not be so burdensome as to prohibit its widespread use.
  • 35. Minimum reporting guidelines • Describe content • Implementation independent • Impacts: Publication, Syntax, Semantics
  • 36. Syntax for proteomics • The content in MIAPE GE needs to be structured to facilitate dissemination, transfer and storage • A community development process to agree on a syntax • Building upon the FuGE data model: a pre-existing, community-developed representation of scientific experiments • Interoperable
  • 37. FuGE • Model of common components in science investigations, such as materials, data, protocols, equipment and software. • Provides a framework for capturing complete laboratory workflows, enabling the integration of pre-existing data formats.
  • 38. UML/XML/RDBMS • UML gives structure (but not syntax): very abstract, very general. • XML provides a concrete syntax: the meta-language is interoperable, checkable, viable and has basic metadata support (language, character encoding and so on). It tends toward the verbose and is not (very) searchable by itself; it is therefore a transfer and archive format. • RDBMS: SQL is (sort of) a standard. A highly computationally amenable form, very good for searching. Conversion from XML is possible, but in a number of ways. It is hard work, so an off-the-shelf implementation is nice to have.
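The division of labour on slide 38 (XML for transfer and archive, a relational database for searching) can be illustrated with the Python standard library. This is a sketch under stated assumptions: the XML element and attribute names are invented and are not GelML or FuGE, and the one-table mapping is only one of the "number of ways" the conversion could be done.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical transfer document; element and attribute names are invented
# for illustration and do not follow GelML or FuGE.
XML_DOC = """
<experiment id="exp-1">
  <sample id="s1" organism="Homo sapiens"/>
  <sample id="s2" organism="Mus musculus"/>
  <sample id="s3" organism="Homo sapiens"/>
</experiment>
"""


def load_samples(xml_text, conn):
    """Parse the XML and copy each <sample> element into a relational table."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "CREATE TABLE sample (id TEXT PRIMARY KEY, experiment TEXT, organism TEXT)"
    )
    rows = [
        (s.get("id"), root.get("id"), s.get("organism"))
        for s in root.findall("sample")
    ]
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_samples(XML_DOC, conn)

# Once in the database, the data can be searched with plain SQL.
human_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM sample WHERE organism = ? ORDER BY id", ("Homo sapiens",)
    )
]
```

The XML stays the archival copy; the table is a derived, searchable view that can be rebuilt from it.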
  • 39. GelML
  • 40. Semantics for Gels
  • 41. Semantics for science
  • 42. Curation of Gel experiments. Laboratory; Data entry and transfer; Public repositories. I) GelML data entry tools GelML MIAPE GE II) Direct database submission III) Automated export of GelInfoML MIAPE GI sepCV
  • 43. Discoverability and reuse • Persistent Identifiers • Rights management
  • 44. Persistent Identifiers • A name for a resource which will remain the same regardless of where the resource is located • In biology, typically assigned to data upon publication • The type of identifier depends on the publication method • Description and Representation Information provides more information about persistent identifiers
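The defining property on slide 44, a name that stays the same wherever the resource lives, amounts to one level of indirection between identifier and location. The toy resolver below is a sketch only: the class name `Resolver`, the identifier, and the URLs are hypothetical and do not model any particular identifier scheme.

```python
class Resolver:
    """Toy resolver: maps a persistent identifier to the resource's current location."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, url):
        """Assign a new identifier; an identifier is never reassigned."""
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = url

    def move(self, pid, new_url):
        """Record that the resource moved; the identifier itself does not change."""
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid):
        """Return the current location for an identifier."""
        return self._locations[pid]


resolver = Resolver()
resolver.register("example:dataset-1", "http://old.example.org/data/1")
resolver.move("example:dataset-1", "http://new.example.org/archive/1")
current = resolver.resolve("example:dataset-1")
```

A paper that cites "example:dataset-1" keeps working after the move, because only the resolver's table changes.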
  • 45. Rights management • Difficult to determine • Lots of legal issues • In biology/bioinformatics, tends to be open access • Creative Commons
  • 46. Receiving data for curation: Content, Syntax, Semantics
  • 47. Route map • Who will receive it? • What are their policies on Content, Syntax, Semantics? • Plan your experiment to conform to Content, Syntax, Semantics • Implement experiment to: collect appropriate Content; structure in appropriate Syntax; ensure Semantics are preserved • Curate…
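The route map on slide 47 can be read as a check of planned data against the recipient's policies on content, syntax and semantics. The sketch below assumes a hypothetical repository policy. The class names, required fields, and the idea that a policy names exactly one vocabulary are illustrative assumptions; only the names GelML and sepCV are borrowed from the slides.

```python
from dataclasses import dataclass


@dataclass
class RepositoryPolicy:
    """What a (hypothetical) receiving repository asks for."""
    required_content: frozenset   # metadata fields that must be present
    accepted_syntax: frozenset    # formats the repository accepts
    required_vocabulary: str      # vocabulary the annotation terms must come from


@dataclass
class Submission:
    content: dict
    syntax: str
    vocabulary: str


def check_submission(submission, policy):
    """Return a list of problems; an empty list means the submission conforms."""
    problems = []
    missing = sorted(policy.required_content - set(submission.content))
    if missing:
        problems.append("content: missing " + ", ".join(missing))
    if submission.syntax not in policy.accepted_syntax:
        problems.append("syntax: " + submission.syntax + " is not accepted")
    if submission.vocabulary != policy.required_vocabulary:
        problems.append("semantics: terms must come from " + policy.required_vocabulary)
    return problems


# Hypothetical policy and draft submission, invented for illustration.
policy = RepositoryPolicy(
    required_content=frozenset({"title", "creator", "protocol"}),
    accepted_syntax=frozenset({"GelML"}),
    required_vocabulary="sepCV",
)
draft = Submission(
    content={"title": "Gel run 7", "creator": "A. Researcher"},
    syntax="xls",
    vocabulary="free text",
)
problems = check_submission(draft, policy)
```

Running such a check at planning time, before the experiment, is the point of the route map: the gaps are cheaper to close before the data exist.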
  • 48. Meta Route Map • How to build the map if you don’t have one yet.
  • 49. Appraise and Select • Investigates the evaluation and selection of data for long-term curation and preservation
  • 50. Acknowledgments • The CARMEN project: www.carmen.org.uk • The Proteomics Standards Initiative (PSI): http://psidev.info • Colleagues at Newcastle University: Phillip Lord, Anil Wipat, Allyson Lister