On the reproducibility
of science
Melissa Haendel

Beyond the PDF2
20 March 2013

@ontowonka
haendel@ohsu.edu

The science cycle

Slide from Gully Burns

Do we know if the infrastructure is actually broken?

The science cycle

Image: http://www.joinchangenation.org/blog/post/roadblocks-on-the-pathway-to-citizenship

This is a broken data story.

Reproducibility is dependent, at a minimum, on using the same resources. But…

   “All companies from which materials were obtained should be listed.”
                             - A well-known journal

Journal guidelines for methods are often poor and space is limited

Hypothesis: Antibodies in the published literature are not uniquely identifiable

Gather journal articles:
   28 journals
      5 domains: Immunology, Cell biology, Neuroscience, Developmental biology, General biology
      3 impact factors: High, Medium, Low
   119 papers
   454 antibodies
      408 commercial antibodies
      46 non-commercial antibodies

Identifying questions:
   Is the antibody identifiable in the vendor site?
   Is the catalog number reported?
   Is the source organism reported?
   Is the antibody target identifiable?

An experiment in reproducibility
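
The four identifying questions above can be read as a per-antibody checklist. The following is a minimal sketch of that reading, not the scoring method used in the study: the `AntibodyReport` fields, the question keys, and the rule that a target counts as identifiable whenever one is named are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AntibodyReport:
    """What a paper's methods section says about one antibody (hypothetical schema)."""
    vendor: Optional[str] = None
    catalog_number: Optional[str] = None
    source_organism: Optional[str] = None
    target: Optional[str] = None
    # Outcome of a manual lookup on the vendor site, recorded by whoever curates the paper.
    found_on_vendor_site: bool = False


def answer_questions(report: AntibodyReport) -> dict:
    """Answer the four identifying questions for a single reported antibody."""
    return {
        "identifiable_on_vendor_site": report.found_on_vendor_site,
        "catalog_number_reported": bool(report.catalog_number),
        "source_organism_reported": bool(report.source_organism),
        # Assumption: a named target is treated as identifiable.
        "target_identifiable": bool(report.target),
    }


def percent_yes(reports, question: str) -> float:
    """Percentage of reports for which the given question is answered yes."""
    if not reports:
        return 0.0
    yes = sum(answer_questions(r)[question] for r in reports)
    return 100.0 * yes / len(reports)
```

Calling `percent_yes(reports, "catalog_number_reported")` on a list of curated reports would give the kind of per-question percentage a bar chart of reporting quality could plot.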

Approximately half of antibodies are not uniquely identifiable in 119 publications

[Bar chart: percent identifiable (y-axis, 0% to 60%) for commercial antibodies (n=408) and non-commercial antibodies (n=46)]

The data shows…

Unique identification of commercial antibodies varies across discipline and impact factor

[Bar chart: percent identifiable (y-axis, 0% to 100%) by discipline (Immunology, Neuroscience, Dev Bio, Cell Bio, General Bio), grouped by journal impact factor (High, Medium, Low); sample sizes shown: n=124, n=94, n=95, n=87, n=56]

In some domains high impact journals have worse reporting, and in others it is the opposite

Maybe labs are just disorganized?

Meet the Urban Lab

Image: Gourami Watcher

The Urban lab antibodies

A+ organization!

[Bar chart: percent identifiable (y-axis, 0% to 90%) for five measures: commercial Ab identifiable, non-commercial Ab identifiable, catalog number reported, source organism reported, target uniquely identifiable]

Of 14 antibodies published in 45 articles, only 38% were identifiable
What does this tell us?
Scientists really do put their
data in cardboard boxes.
Ø Promote	
  beJer	
  reporAng	
  guidelines	
  in	
  journals	
  
 Ø Include	
  reviewing	
  guidelines	
  
 Ø Provide	
  tools	
  to	
  reference	
  research	
  resources	
  
    with	
  unique	
  and	
  persistent	
  IDs/URIs	
  	
  
 Ø Train	
  librarians	
  and	
  other	
  data	
  stewards	
  to	
  
    apply	
  data	
  standards	
  




What are we going to do about it?
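
As a rough illustration of the third recommendation, here is a minimal sketch of a reference that carries a persistent identifier alongside the human-readable label. The `ResourceReference` class, its fields, and the `https://example.org/resource/` base URI are hypothetical; they do not describe any existing registry or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceReference:
    """A citable pointer to a research resource (hypothetical schema)."""
    resource_id: str  # identifier assigned by some registry (assumed to exist)
    label: str        # human-readable name as it would appear in a methods section
    base_uri: str = "https://example.org/resource/"  # placeholder, not a real resolver

    def __post_init__(self):
        # A reference without an identifier is exactly the problem being avoided.
        if not self.resource_id.strip():
            raise ValueError("resource_id must not be empty")

    @property
    def uri(self) -> str:
        """Full URI built from the base and the identifier."""
        return self.base_uri + self.resource_id

    def cite(self) -> str:
        """Methods-section string that keeps the label and the URI together."""
        return f"{self.label} ({self.uri})"
```

The point of the design is that the identifier travels with the label, so a reader does not have to guess which reagent a name refers to.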
