The Open Source ISA Metadata Tracking Framework:
From Data Curation and Management at the Source, to the Linked Data Universe



Eamonn Maguire
Lead Software Engineer
Oxford University

eamonn.maguire@oerc.ox.ac.uk




ISCB-Asia, 17th December 2012
What is ISA all about?


                                We want to enable better reporting of
                                experiments...

                                We want to make it easier for
                                submitters...

                                We want to provide tooling which
                                biologists will want to use...




What’s the problem?

     Could be beans. Could be peas. Could be soup.
          可能是豌豆 ("could be peas"): a different representation, in a non-Latin script
          Might be petits pois: a different terminology

     Analogy time.
     Each can is an experiment.
     We have no labels, so no indication of what is in the can.

     (Tin can analogy borrowed from Norman Morrison and converted from ontologies to metadata
     transfer standards.)

     In biology, things aren’t quite as bad as this: we have some labels, but they aren’t all in the
     same language.

     1. There is fragmentation in formats: the formats used to describe experiments are different,
     e.g. MAGE-Tab, PRIDE-ML, SRA-XML.
     2. Different formats often capture different information, often not enough to actually repeat
     an experiment correctly.
     3. The terminologies used to describe an experiment are different, e.g. human vs Homo sapiens
     or rat vs Rattus norvegicus, making search more difficult.



1. There is fragmentation in formats



       Can you imagine having to translate everything you write into a different language in
       order to submit your data?


       你能想象有翻译成不同的语言编写的一切,以提交的数据吗?即使转换
       工具,像谷歌,翻译弄错了。

       An féidir leat a shamhlú go bhfuil gach rud a scríobh tú a aistriú isteach i
       dteanga eile d'fhonn a chur isteach do chuid sonraí? Fiú uirlisí chomhshó,
       cosúil le google translate a fháilsé mícheart.

       (The Chinese and Irish lines repeat the question above and add: "Even conversion tools,
       like Google Translate, get it wrong.")




1. There is fragmentation in formats: our solution

 Repositories are making it difficult for biologists to submit data, and for others to use it.
 This is particularly true for multi-omic experiments: to submit, say, proteomic and
 transcriptomic data, one must provide slightly different information in two very different
 formats...why?

 Our solution is one general-purpose, flexible format, herein referred to as ISA-Tab.



 A domain-agnostic format to capture experimental metadata in omic experiments
 (transcriptomic, genomic, proteomic, metabolomic) as well as traditional experiments such as
 clinical chemistry and histology.




 ...it already works in lots of domains...nutrigenomics, toxicogenomics, public health... etc.



1. There is fragmentation in formats: our solution

 investigation
   high level concept to link related studies

 study
   the central unit, containing information on the subject under study, its
   characteristics and any treatments applied. A study has associated assays.

 assay
   test performed either on material taken from the subject or on the whole
   initial subject, which produces qualitative or quantitative measurements (data)

 [Diagram: an investigation at the top, with assay(s) below it; each assay holds pointers to
 data file names/location; the data are external files in native or other formats.]

 Biologists like tab.
 They don’t like XML.
 Through basic inference...
 ISA-Tab is good :)
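
As a rough illustration of the investigation / study / assay hierarchy described above, here is a minimal Python sketch. It is not the ISA-Tab specification or any ISA tool API: the class names, field names and example values are all assumptions made for this example. The one point it tries to show is that assays carry only pointers to data files, while the data stay external.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assay:
    """A test performed on the subject or on material taken from it."""
    measurement: str   # what is measured (illustrative label)
    technology: str    # how it is measured (illustrative label)
    data_files: List[str] = field(default_factory=list)  # pointers only


@dataclass
class Study:
    """The central unit: the subject, its characteristics and treatments."""
    identifier: str
    subject_characteristics: Dict[str, str] = field(default_factory=dict)
    treatments: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High-level concept that links related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def data_files(self) -> List[str]:
        """Collect every data file referenced by any assay of any study."""
        return [f for s in self.studies for a in s.assays for f in a.data_files]


# Hypothetical multi-omic study: one subject description, two assays.
investigation = Investigation(
    title="Example investigation",
    studies=[
        Study(
            identifier="S1",
            subject_characteristics={"organism": "Rattus norvegicus"},
            treatments=["control diet", "high-fat diet"],
            assays=[
                Assay("protein expression profiling", "mass spectrometry",
                      ["run1.mzML"]),
                Assay("transcription profiling", "DNA microarray",
                      ["chip1.CEL", "chip2.CEL"]),
            ],
        )
    ],
)

print(investigation.data_files())
```

Because the subject is described once at study level, the proteomic and transcriptomic assays in this sketch share the same sample description instead of repeating it in two formats.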




2. Different formats often capture different information
                        ...But there are lots of similarities




 MIBBI: Minimum Information for Biological and Biomedical Investigations.

 The information captured by a format is generated via a ‘checklist’, ideally a list of fields that
 together provide the minimal amount of information required to be able to reproduce an
 experiment.

 MIBBI is trying to harmonise these checklists to reduce redundancy and make them
 interoperable.


We have 32 checklists at present because there are differences in what is deemed important
depending on the experiment being performed.


Now integrated in




Helping to demystify the
unwieldy world of
standards...


Find out what standards are out
there...MI Checklists, ontologies
and formats plus what domains
they are suited to...

Find out about data sharing
policies from NIH for example.

Databases, which standards they
use etc.

Now integrated in

In biology, things aren’t quite as bad as this: we have some labels, but they aren’t all in the same
language. What do I mean by this? Well...

1. There is fragmentation in formats.

2. Different formats often capture different information.

3. The terminologies used to describe an experiment are different: we promote the use of
ontologies to harmonize the recording of experiments.
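
To make the terminology point concrete, the sketch below shows how mapping free-text labels to a single identifier lets a search for "human" also find records labelled "Homo sapiens". The synonym table is hand-written for this example and the record layout is hypothetical; a real tool would look terms up in an ontology service. The identifiers are the NCBI Taxonomy IDs for human (9606) and Norway rat (10116).

```python
# Hand-written synonym table, for illustration only.
SYNONYMS = {
    "human": "NCBITaxon:9606",
    "humans": "NCBITaxon:9606",
    "homo sapiens": "NCBITaxon:9606",
    "rat": "NCBITaxon:10116",
    "rattus norvegicus": "NCBITaxon:10116",
}


def normalise(term):
    """Map a free-text organism label to one identifier, or None if unknown."""
    return SYNONYMS.get(term.strip().lower())


def search(records, query):
    """Return ids of records whose organism resolves to the query's term."""
    target = normalise(query)
    if target is None:
        return []
    return [r["id"] for r in records if normalise(r["organism"]) == target]


# Hypothetical experiment records with inconsistent organism labels.
records = [
    {"id": "exp1", "organism": "Humans"},
    {"id": "exp2", "organism": "Homo sapiens"},
    {"id": "exp3", "organism": "rat"},
]

print(search(records, "human"))
```

A plain string match on "human" would miss exp2; matching on the shared identifier does not.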




The ISA tools...



               Ontologies                                               MI Checklists




                                         Common representation


           The ISA tools bring together a common representation, MI checklists and ontologies.



The ISA tools

     Developed on top of the ISA-Tab format...modular, configurable, open source, Java based*




                                     See them all at isa-tools.org




The ISA tools... a tool for all your needs




Configurable...


                                We need to support lots of different checklists,
                                and it should be easy for people to change their
                                requirements should they need to....

                                So, our infrastructure is built upon XML files.
                                These are created by the ISAConfigurator.




                                A configuration XML file describes the fields (or
                                checklist) required to describe a particular
                                experiment and any ontologies to be used.
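
The idea of a configuration file that lists the required fields and their ontologies can be sketched as follows. The XML layout here (a `configuration` root with `field` children and `header`, `required` and `ontology` attributes) is invented for this example and should not be read as the schema the ISAConfigurator actually writes.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: element and attribute names are invented for
# this sketch and are not the real ISA configuration schema.
CONFIG_XML = """
<configuration assay="transcription profiling">
  <field header="Sample Name" required="true"/>
  <field header="Characteristics[organism]" required="true" ontology="NCBITaxon"/>
  <field header="Factor Value[dose]" required="false"/>
</configuration>
"""


def load_fields(xml_text):
    """Return one dict per field: header, required flag, optional ontology."""
    root = ET.fromstring(xml_text)
    return [
        {
            "header": node.get("header"),
            "required": node.get("required") == "true",
            "ontology": node.get("ontology"),  # None when no ontology is set
        }
        for node in root.findall("field")
    ]


fields = load_fields(CONFIG_XML)
required_headers = [f["header"] for f in fields if f["required"]]
print(required_headers)
```

Keeping the checklist in a file like this, rather than in code, is what lets a community change its requirements without touching the tools.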




Create configuration XML files


isacreator
                                Create & Edit ISA-Tab


The ISAcreator...

 Developed to be a user-friendly way to enter standards-compliant metadata: it has lots of
 features...

    • file chooser
    • publication searcher
    • visualization
    • ontology search
    • QR code generator
    • spreadsheet-like interface
    • automated ontology tagging (powered by the NCBO Annotator)

 But these are just some of them...we also have a data entry wizard and an import utility...




Ontology search and automated annotation in Google Docs
Make sure the ISA-Tab is correct


validate from the dedicated tool...
                                               or...
                                validate from the command line...
                                               or...
                                   within ISAcreator directly...
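
What "validate" means can be illustrated with a very small check on a tab-delimited table: are the required columns present, and is every required cell filled in? This is only a sketch of the idea, not the ISA validator or its command-line interface; the list of required headers is an assumption made for this example.

```python
import csv
import io

# Assumed checklist for this sketch; in practice the required headers would
# come from a configuration file.
REQUIRED_HEADERS = ("Source Name", "Characteristics[organism]", "Sample Name")


def validate_table(tab_text, required=REQUIRED_HEADERS):
    """Return a list of human-readable problems; an empty list means valid."""
    rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = [f"missing column: {h}" for h in required if h not in header]
    for line_no, row in enumerate(body, start=2):
        for h in required:
            if h not in header:
                continue  # already reported as a missing column
            idx = header.index(h)
            if idx >= len(row) or not row[idx].strip():
                problems.append(f"line {line_no}: empty value for {h}")
    return problems


good = ("Source Name\tCharacteristics[organism]\tSample Name\n"
        "rat1\tRattus norvegicus\tliver1\n")
bad = "Source Name\tSample Name\nrat1\t\n"

print(validate_table(good))
print(validate_table(bad))
```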




Convert to or from differing formats


The converters




        Fully Endorsed by ArrayExpress, PRIDE and the European Nucleotide Archive (ENA)...




                                    Converts MAGE-Tab to ISA-Tab.
   This is still in beta, but we are getting close to a fully working version. We have successfully
            created validated ISA-Tab for ~90% of the 21k experiments in ArrayExpress.
    Available as a web service and a web interface; the source is available for running conversions locally.
                                http://isatab.sourceforge.net/magetoisa/



The converters...semantic web

    [Figure: RDF graph of an example provenance chain from an ISA-Tab dataset]

      Saghantelian_1  --has specified input-->   sample collection   --has specified output-->  KO1
      KO1             --has specified input-->   extraction          --has specified output-->  KO1_extract
      KO1_extract     --has specified input-->   mass spectrometry   --has specified output-->  ./cdf/KO/ko15.CDF

      Each product is linked to the previous one by "derives from".
      Type labels shown in the figure: material entity, processed material, material processing,
      information content entity.




The converters...semantic web




    • Make the semantics of ISA-Tab explicit, including materials, data entities and processes
    • Exploit the semantic annotations available in ISA-Tab datasets
    • Augment ISA syntax with new elements (e.g. groups), facilitating the
      understanding and querying of experimental design
    • Facilitate querying, data integration and knowledge discovery/reasoning
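
One way to picture "explicit semantics" and "facilitating querying" is to write the sample-to-data chain from the graph slide as subject/predicate/object statements and then walk them. The sketch below uses plain Python tuples in place of a real RDF library, and the edge directions are my reading of the figure rather than the output of the actual converter.

```python
# Triples (subject, predicate, object) transcribed from the example chain;
# plain tuples stand in for a real RDF store.
TRIPLES = [
    ("sample collection", "has specified input", "Saghantelian_1"),
    ("sample collection", "has specified output", "KO1"),
    ("KO1", "derives from", "Saghantelian_1"),
    ("extraction", "has specified input", "KO1"),
    ("extraction", "has specified output", "KO1_extract"),
    ("KO1_extract", "derives from", "KO1"),
    ("mass spectrometry", "has specified input", "KO1_extract"),
    ("mass spectrometry", "has specified output", "./cdf/KO/ko15.CDF"),
    ("./cdf/KO/ko15.CDF", "derives from", "KO1_extract"),
]


def lineage(node, triples=TRIPLES):
    """Follow 'derives from' edges back to the original source material."""
    parents = {s: o for s, p, o in triples if p == "derives from"}
    chain = [node]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain


def processes_using(material, triples=TRIPLES):
    """List the processes that take the given material as input."""
    return [s for s, p, o in triples
            if p == "has specified input" and o == material]


print(lineage("./cdf/KO/ko15.CDF"))
print(processes_using("KO1"))
```

Once the statements are explicit, a question such as "which source material does this data file come from?" becomes a simple graph traversal instead of a manual read through spreadsheet columns.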



The converters...semantic web




    Notes in Lab books          Spreadsheets & Tables      Facts as RDF statements
 (information for humans)         (ISAtab metadata)      (information for machines)




Get ISA-Tab into a database
                                Share it (or don’t) with the world



Database & Web Application




Web application




Last but not least...




                                      Analysis




 Package to read ISA-Tab into R, and in particular into Bioconductor, so that you can run
 analysis scripts on your data...
 It can automatically call microarray, mass spec and flow cytometry analysis packages on
 appropriate datasets...
 Available from Bioconductor...

 There is also a script to create Galaxy libraries from ISA-Tab.
 Brad Chapman is working on this at HSPH.

 Dedicated ISAcreator mode. Allows for persistence and perusal of ISA experiments in
 GenomeSpace.




isacommons
A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking
framework to facilitate standards-compliant collection, curation, management and reuse of
investigations in an increasingly diverse set of life science domains, including:

    • Stem Cell Commons
    • Nanotechnology Informatics Working Group




ISA software suite: supporting standards-compliant experimental annotation and enabling
curation at the community level
Philippe Rocca-Serra; Marco Brandizi; Eamonn Maguire; Nataliya Sklyar; Chris Taylor; Kimberly
Begley; Dawn Field; Stephen Harris; Winston Hide; Oliver Hofmann; Steffen Neumann; Peter Sterk;
Weida Tong; Susanna-Assunta Sansone
Bioinformatics 2010 26: 2354-2356

Towards Interoperable Bioscience Data
Sansone SA, Rocca-Serra P, Field D, Maguire E et al
Nature Genetics 2012




Thanks for listening...
Questions?

You can email us: isatools@googlegroups.com
View our website: http://www.isa-tools.org
View our Git repo & contribute: http://github.com/ISA-tools
View our blog: http://isatools.wordpress.com
Follow us on Twitter: @isatools



 
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...GigaScience, BGI Hong Kong
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveGigaScience, BGI Hong Kong
 
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...GigaScience, BGI Hong Kong
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...GigaScience, BGI Hong Kong
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...GigaScience, BGI Hong Kong
 




Eamonn Maguire: The Open Source ISA Metadata Tracking Framework: From Data Curation and Management at the Source, to the Linked Data Universe

  • 1. The Open Source ISA Metadata Tracking Framework: From Data Curation and Management at the Source, to the Linked Data Universe Eamonn Maguire Lead Software Engineer Oxford University eamonn.maguire@oerc.ox.ac.uk ISCB-Asia, 17th December 2012
  • 2. What is ISA all about? We want to enable better reporting of experiments... We want to make it easier for submitters... We want to provide tooling which biologists will want to use...
  • 3. What’s the problem? Could be beans. Could be peas. Could be soup. Analogy time: each can is an experiment. We have no labels, so there is no indication of what is in the can. (Tin can analogy borrowed from Norman Morrison and converted from ontologies to metadata transfer standards.) In biology, things aren’t quite as bad as this: we have some labels, but they aren’t all in the same language. 1. There is fragmentation in formats: the formats used to describe experiments are different, e.g. MAGE-Tab, PRIDE-ML, SRA-XML. 2. Different formats often capture different information, often not enough to actually repeat an experiment correctly. 3. The terminologies used to describe an experiment are different, e.g. humans vs Homo sapiens or rat vs Rattus norvegicus, making search more difficult.
  • 4. What’s the problem? (continued) 可能是豌豆 (Chinese for "could be peas"): a different representation, in a non-Latin language.
  • 5. What’s the problem? (continued) Might be petit pois: a different terminology.
  • 6. 1. There is fragmentation in formats. Can you imagine having to translate everything you write into a different language in order to submit your data?
  • 7. 1. There is fragmentation in formats (continued). 你能想象有翻译成不同的语言编写的一切,以提交的数据吗?即使转换工具,像谷歌,翻译弄错了。 (The same question in Chinese, followed by "Even conversion tools, like Google Translate, get it wrong.")
  • 8. 1. There is fragmentation in formats (continued). An féidir leat a shamhlú go bhfuil gach rud a scríobh tú a aistriú isteach i dteanga eile d'fhonn a chur isteach do chuid sonraí? Fiú uirlisí chomhshó, cosúil le google translate a fháilsé mícheart. (The same question and remark in Irish.)
  • 9. 1. There is fragmentation in formats: our solution. Repositories are making it difficult for biologists to submit data, and for others to use it. This is particularly true for those performing multi-omic experiments: to submit, say, proteomic and transcriptomic data, one must provide slightly different information in two very different formats... why? Our solution is one general-purpose, flexible format, referred to here as ISA-Tab: a domain-agnostic format to capture experimental metadata in omic experiments (transcriptomic, genomic, proteomic, metabolomic) as well as traditional experiments such as clinical chemistry and histology. It already works in lots of domains: nutrigenomics, toxicogenomics, public health, etc.
  • 10. 1. There is fragmentation in formats: our solution. Investigation: high-level concept to link related studies. Study: the central unit, containing information on the subject under study, its characteristics and any treatments applied; a study has associated assays. Assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data). Assays hold pointers to data file names/locations; the data are external files in native or other formats. Biologists like tab. They don’t like XML. Through basic inference... ISA-Tab is good :)
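The investigation, study and assay hierarchy described on this slide can be sketched as a small data model. This is only an illustration of the nesting, assuming hypothetical class and field names (`Investigation`, `Study`, `Assay`, `data_files`) and made-up file paths; it is not the ISA-Tab file layout or the API of any ISA tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assay:
    """A test performed on the subject or on material taken from it."""
    measurement: str
    # Pointers to external data files (names/locations), not the data itself.
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """The central unit: subject, its characteristics, and associated assays."""
    subject: str
    characteristics: Dict[str, str] = field(default_factory=dict)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High-level concept that links related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def data_files(self) -> List[str]:
        """Collect every data file pointer reachable from this investigation."""
        return [f for s in self.studies for a in s.assays for f in a.data_files]


# Hypothetical multi-omic example: one study, two assays of different types.
investigation = Investigation(
    title="Example investigation",
    studies=[
        Study(
            subject="Rattus norvegicus",
            characteristics={"tissue": "liver"},
            assays=[
                Assay("transcription profiling", ["raw/array_01.txt"]),
                Assay("protein expression profiling", ["raw/ms_01.mzML"]),
            ],
        )
    ],
)
print(investigation.data_files())
```

The point of the nesting is that a transcriptomic and a proteomic assay can sit under the same study and share its subject description, rather than being described twice in two unrelated formats.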
  • 11. 2. Different formats often capture different information... but there are lots of similarities. MIBBI: Minimum Information for Biological and Biomedical Investigations. The information captured by a format is determined by a ‘checklist’, ideally a list of fields that together provide the minimal amount of information required to reproduce an experiment. MIBBI is trying to harmonise these checklists to reduce redundancy and make them interoperable. We have 32 checklists at present because what is deemed important differs depending on the experiment being performed.
  • 12. Now integrated in... Helping to demystify the unwieldy world of standards... Find out what standards are out there: MI checklists, ontologies and formats, plus what domains they are suited to. Find out about data sharing policies, from NIH for example. Find out about databases, which standards they use, etc.
  • 13. In biology, things aren’t quite as bad as this: we have some labels, but they aren’t all in the same language. What do I mean by this? Well... 1. There is fragmentation in formats. 2. Different formats often capture different information. 3. The terminologies used to describe an experiment are different: we promote the use of ontologies to harmonize the recording of experiments.
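To make the third point concrete, here is a minimal sketch of why free-text terminology hurts search and how mapping labels to one canonical term helps. The synonym table, the `harmonise` and `search` functions and the record layout are all hypothetical; a real system would resolve terms against an ontology rather than a hand-written dictionary.

```python
# Hypothetical synonym table (assumption): free-text label -> canonical term.
SYNONYMS = {
    "human": "Homo sapiens",
    "humans": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
    "rat": "Rattus norvegicus",
    "rattus norvegicus": "Rattus norvegicus",
}


def harmonise(label: str) -> str:
    """Map a free-text label to its canonical term; unknown labels pass through."""
    key = " ".join(label.lower().split())  # normalise case and whitespace
    return SYNONYMS.get(key, label)


def search(records, organism):
    """Return the records whose organism matches after harmonisation."""
    wanted = harmonise(organism)
    return [r for r in records if harmonise(r["organism"]) == wanted]


# Three experiments annotated by three different submitters.
records = [
    {"id": "exp1", "organism": "humans"},
    {"id": "exp2", "organism": "Homo sapiens"},
    {"id": "exp3", "organism": "rat"},
]

print([r["id"] for r in search(records, "human")])
```

Without the harmonisation step, a query for "human" would match neither of the first two records by plain string comparison.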
  • 14. The ISA tools... Ontologies, MI checklists, common representation. The ISA tools bring together a common representation, MI checklists and ontologies.
  • 15. The ISA tools. Developed on top of the ISA-Tab format... modular, configurable, open source, Java based*. See them all at isa-tools.org
  • 16. The ISA tools... a tool for all your needs
  • 17. Configurable... We need to support lots of different checklists, and it should be easy for people to change their requirements should they need to. So, our infrastructure is built upon XML files. These are created by the ISAConfigurator. A configuration XML file describes the fields (or checklist) required to describe a particular experiment and any ontologies to be used.
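As a rough illustration of the idea that a configuration file lists the required fields and the ontologies to use, the sketch below parses a small XML document with Python's standard library. The element and attribute names (`configuration`, `field`, `header`, `required`, `ontology`) are invented for this example and are an assumption; they are not the actual schema produced by the ISAConfigurator.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration (assumption): not the real ISA configuration schema.
CONFIG_XML = """
<configuration name="example-checklist">
  <field header="Sample Name" required="true"/>
  <field header="Organism" required="true" ontology="NCBITAXON"/>
  <field header="Comment" required="false"/>
</configuration>
"""


def load_fields(xml_text):
    """Return one dict per <field>: header, required flag, optional ontology."""
    root = ET.fromstring(xml_text)
    fields = []
    for node in root.findall("field"):
        fields.append({
            "header": node.get("header"),
            "required": node.get("required") == "true",
            "ontology": node.get("ontology"),  # None when no ontology is set
        })
    return fields


fields = load_fields(CONFIG_XML)
required_headers = [f["header"] for f in fields if f["required"]]
print(required_headers)
```

Keeping the checklist in a file like this, rather than in code, is what lets a community change its requirements without changing the tools.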
  • 18. Create configuration XML files
  • 19. ISAcreator: create & edit ISA-Tab
  • 20. The ISAcreator... Developed to be a user-friendly way to enter standards-compliant metadata: it has lots of features... file chooser, publication searcher, visualization, ontology search, QR code generator, automated ontology tagging (powered by NCBO Annotator), spreadsheet-like interface. But these are just some of them... we also have a data entry wizard and an import utility...
  • 21. Ontology search and automated annotation in Google Docs
  • 23. Make sure the ISA-Tab is correct
  • 24. Validate from the dedicated tool... or validate from the command line... or within ISAcreator directly...
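What "making sure the ISA-Tab is correct" can mean in its simplest form is sketched below: check that a tab-delimited table has the columns a checklist requires and that those columns are filled in. The `REQUIRED` list and the `validate` function are hypothetical and far simpler than the checks the ISA validator performs; the sketch only shows the kind of rule involved.

```python
import csv
import io

# Hypothetical checklist (assumption): columns every row must fill in.
REQUIRED = ["Sample Name", "Organism"]


def validate(tab_text, required=REQUIRED):
    """Return a list of human-readable problems found in a tab-delimited table."""
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    headers = reader.fieldnames or []
    errors = [f"missing column: {h}" for h in required if h not in headers]
    present = [h for h in required if h in headers]
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        for h in present:
            if not (row[h] or "").strip():
                errors.append(f"line {line_no}: empty value for {h}")
    return errors


good = "Sample Name\tOrganism\nS1\tHomo sapiens\n"
bad = "Sample Name\tOrganism\nS1\t\n"

print(validate(good))
print(validate(bad))
```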
  • 25. Convert to or from differing formats
  • 26. The converters. Fully endorsed by ArrayExpress, PRIDE and the European Nucleotide Archive (ENA)... Converts MAGE-Tab to ISA-Tab. This is still in beta; however, we are getting close to a fully working version. We’ve successfully created validated ISA-Tab for ~90% of the 21k experiments in ArrayExpress. Available as a web service and a web interface, and the source is available for running conversions locally: http://isatab.sourceforge.net/magetoisa/
  • 27. The converters...semantic web. [Figure: graph linking Saghantelian_1, sample collection, KO1, extraction, KO1_extract, mass spectrometry and ./cdf/KO/ko15.CDF, with edges labelled "has specified input", "has specified output", "derives from" and "type"; the type nodes shown are material entity, processed material, material processing and information content entity.]
  • 28. The converters...semantic web. Make the semantics of ISAtab explicit, including materials & data entities & processes; exploit the semantic annotations available in ISAtab datasets; augment ISA syntax with new elements (e.g. groups), facilitating the understanding & querying of experimental design; facilitate querying, data integration & knowledge discovery/reasoning.
  • 29. The converters...semantic web. Notes in lab books (information for humans); spreadsheets & tables (ISAtab metadata); facts as RDF statements (information for machines).
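The step from tabular metadata to "facts as RDF statements" can be sketched as turning each processing step into subject, predicate, object triples. The sample and process names come from the graph slide above, but the pairing of inputs and outputs is my reading of that figure, the `http://example.org/isa/` namespace is hypothetical, the data file path is shortened to `ko15.CDF`, and the output is a plain N-Triples-like printout rather than the ISA converter's actual RDF.

```python
# Hypothetical namespace (assumption); the real converter's URIs are not shown
# in the slides.
BASE = "http://example.org/isa/"


def uri(name):
    """Wrap a local name as a URI reference, replacing spaces with underscores."""
    return "<" + BASE + name.replace(" ", "_") + ">"


# Each step: (process, input, output). Pairings are an interpretation of the
# graph slide, not data taken from an ISA-Tab file.
steps = [
    ("sample collection", "Saghantelian_1", "KO1"),
    ("extraction", "KO1", "KO1_extract"),
    ("mass spectrometry", "KO1_extract", "ko15.CDF"),
]


def to_triples(steps):
    """Emit three statements per step: input, output, and derivation."""
    triples = []
    for process, source, product in steps:
        triples.append((uri(process), uri("has_specified_input"), uri(source)))
        triples.append((uri(process), uri("has_specified_output"), uri(product)))
        triples.append((uri(product), uri("derives_from"), uri(source)))
    return triples


triples = to_triples(steps)
for s, p, o in triples:
    print(s, p, o, ".")
```

Once the chain is expressed this way, a question such as "which data files derive from Saghantelian_1?" becomes a graph traversal over `derives_from` instead of a manual lookup across spreadsheet columns.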
  • 30. Get ISA-Tab into a database. Share it (or don’t) with the world
  • 31. Database & Web Application
  • 35. Last but not least... Analysis
  • 36. Package to read ISA-Tab into R, especially Bioconductor, to run analysis scripts on your data... It can automatically call microarray, mass spec and flow cytometry analysis packages on appropriate datasets... Available from Bioconductor. There is also a script to create Galaxy libraries from ISA-Tab; Brad Chapman is working on this at HSPH. Dedicated ISAcreator mode: allows for persistence and perusal of ISA experiments in GenomeSpace.
  • 37. isacommons: a growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including: Stem Cell Commons; Nanotechnology Informatics Working Group.
  • 39. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Philippe Rocca-Serra; Marco Brandizi; Eamonn Maguire; Nataliya Sklyar; Chris Taylor; Kimberly Begley; Dawn Field; Stephen Harris; Winston Hide; Oliver Hofmann; Steffen Neumann; Peter Sterk; Weida Tong; Susanna-Assunta Sansone. Bioinformatics 2010, 26: 2354-2356. Towards Interoperable Bioscience Data. Sansone SA, Rocca-Serra P, Field D, Maguire E et al. Nature Genetics 2012.
  • 40. Thanks for listening... Questions? You can email us: isatools@googlegroups.com; view our website: http://www.isa-tools.org; view our Git repo & contribute: http://github.com/ISA-tools; view our blog: http://isatools.wordpress.com; follow us on Twitter: @isatools