SlideShare une entreprise Scribd logo
1  sur  17
The Content Mine
Peter Murray-Rust[*]
University of Cambridge, Open Knowledge,
& Shuttleworth Fellow
OKFest, Berlin, 2014-07-15, DE
[*] and Michelle Brook, Jenny Molloy, Ross Mounce,
Richard Smith-Unna, Mark MacGillivray, Emanuel
Toliv
Liberating facts for humanity*
• Public science 500,000,000,000 USD per year
• 85% of medical research is wasted (bad design,
lost data, non-communication)
• ContentMine will liberate 100,000,000 facts per
year from scientific literature
• Crawl, Scrape, Extract, Republish
• Open Data CC 0, Open Standards, Open Source
• COLLABORATIVE, any data-rich discipline
• [*] Closed data means people die
But we can now
turn PDFs into
Science
We can’t turn a hamburger into a cow
UNITS
TICKS
QUANTITY
SCALE
TITLES
DATA!!
2000+ points
Dumb PDF
CSV
Semantic
Spectrum
2nd Derivative
Smoothing
Gaussian Filter
Automatic
extraction
Chemical Computer Vision
1 sec to turn this into semantic science
PROPERTIES (Name-Value-Units-Error)
Name Value Units
NV U NV U N V
U
N
E
V E U
Note CML supports value ranges and errors
“nuggets” in a scientific paper
quantity
units
Value ranges
Humans aren’t designed to mine this … 
chemical
project places
Parsing chemical sentences
http://wwmm.ch.cam.ac.uk/chemicaltagger
• Typical
Typical chemical synthesis
Open Content Mining of FACTs
Machines can interpret chemical reactions
We have done 500,000 patents. There are >
3,000,000 reactions/year. Added value > 1B Eur.
Evolution of ultraviolet
vision in the largest avian
radiation - the passerines
Anders Ödeen 1* , Olle
Håstad 2,3 and Per Alström 4
PDF 
HTML 
Styles , superscripts
And diåcritics
preserved!
AMI
PDF 
Turdus iliacus
Taeniopygia guttata
Serinus canaria
Lanius excubitor
Melopsittacus undulatus
Pavo cristatus
Sturnus vulgaris
Dolichonyx oryzivorus
Ficedula hypoleuca
Vaccinium myrtillus
Falco tinnunculus
Turdus
Pomatostomus
Leothrix
Amytornis
Acanthisitta
Orthonyx x 2
Malurus
Cnemophilus x 4
Philesturnus x 2
Motacilla x 2
Toxorhampus x 2
Linked Open Data – the world’s knowledge
very little physical science 
http://upload.wikimedia.org/wikipedia/commons/3/34/LOD_Cloud_Diagram_as_of_September_2011.png
DBPedia
BIO
Comp
Lib
PDB
Ontologies
GOV
GOV.uk
Music,
Art
Literature
Social
Knowledge
bases
RDF
triples
Acanthisittidae
Acanthizidae
Acrocephalidae
Callaeidae
Campephagidae
Cnemophilidae
Corvidae
0.84
0.91
0.93
0.95
Acanthisitta
Acrocephalus
Ailuroedus
Ailuroedus
Amytornis
Camptostoma
AMI
23.12
34.54
37.21
38.55
Posterior
probability
AMI can MEASURE
Branch lengths!
NexML
Genus Family
HTML
We can do any data…
… pixel analysis …

Contenu connexe

Tendances

Content Mining for Machines and Humans
Content Mining for Machines and HumansContent Mining for Machines and Humans
Content Mining for Machines and Humanspetermurrayrust
 
Disruptive Communities and Technology
Disruptive Communities and TechnologyDisruptive Communities and Technology
Disruptive Communities and Technologypetermurrayrust
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)petermurrayrust
 
Embrace the Open Revolution
Embrace the Open RevolutionEmbrace the Open Revolution
Embrace the Open Revolutionpetermurrayrust
 
ContentMine and WikiData
ContentMine and WikiDataContentMine and WikiData
ContentMine and WikiDataTheContentMine
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trustpetermurrayrust
 
Copyright Reform and Open Data
Copyright Reform and Open DataCopyright Reform and Open Data
Copyright Reform and Open Datapetermurrayrust
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neurosciencepetermurrayrust
 
Principles and practice of Open Science
Principles and practice of Open SciencePrinciples and practice of Open Science
Principles and practice of Open Sciencepetermurrayrust
 
Open scholarship [a FOSTER open science talk]
Open scholarship [a FOSTER open science talk]Open scholarship [a FOSTER open science talk]
Open scholarship [a FOSTER open science talk]Ross Mounce
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome TrustTheContentMine
 
Sharing re-usable phylogenetic data: we're not there yet
Sharing re-usable phylogenetic data: we're not there yetSharing re-usable phylogenetic data: we're not there yet
Sharing re-usable phylogenetic data: we're not there yetRoss Mounce
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData TheContentMine
 
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome CampusBibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome CampusDuncan Hull
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europepetermurrayrust
 
Open Access for Early Career Researchers
Open Access for Early Career ResearchersOpen Access for Early Career Researchers
Open Access for Early Career ResearchersRoss Mounce
 
Ontologies in Physical Science
Ontologies in Physical ScienceOntologies in Physical Science
Ontologies in Physical Sciencepetermurrayrust
 

Tendances (20)

Content Mining for Machines and Humans
Content Mining for Machines and HumansContent Mining for Machines and Humans
Content Mining for Machines and Humans
 
Disruptive Communities and Technology
Disruptive Communities and TechnologyDisruptive Communities and Technology
Disruptive Communities and Technology
 
Ebi
EbiEbi
Ebi
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)
 
Petermrjisc20141201
Petermrjisc20141201Petermrjisc20141201
Petermrjisc20141201
 
Embrace the Open Revolution
Embrace the Open RevolutionEmbrace the Open Revolution
Embrace the Open Revolution
 
ContentMine and WikiData
ContentMine and WikiDataContentMine and WikiData
ContentMine and WikiData
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trust
 
Copyright Reform and Open Data
Copyright Reform and Open DataCopyright Reform and Open Data
Copyright Reform and Open Data
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
Principles and practice of Open Science
Principles and practice of Open SciencePrinciples and practice of Open Science
Principles and practice of Open Science
 
Open scholarship [a FOSTER open science talk]
Open scholarship [a FOSTER open science talk]Open scholarship [a FOSTER open science talk]
Open scholarship [a FOSTER open science talk]
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trust
 
Sharing re-usable phylogenetic data: we're not there yet
Sharing re-usable phylogenetic data: we're not there yetSharing re-usable phylogenetic data: we're not there yet
Sharing re-usable phylogenetic data: we're not there yet
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData
 
Making Theses USEFUL
Making Theses USEFULMaking Theses USEFUL
Making Theses USEFUL
 
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome CampusBibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europe
 
Open Access for Early Career Researchers
Open Access for Early Career ResearchersOpen Access for Early Career Researchers
Open Access for Early Career Researchers
 
Ontologies in Physical Science
Ontologies in Physical ScienceOntologies in Physical Science
Ontologies in Physical Science
 

Similaire à Liberating Scientific Facts for Humanity Through Open Content Mining

Rapid biomedical search
Rapid biomedical search Rapid biomedical search
Rapid biomedical search petermurrayrust
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesTheContentMine
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic BiologyTheContentMine
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biologypetermurrayrust
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and  Medicine from the scholarly literatureAutomatic Extraction of Science and  Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literaturepetermurrayrust
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literatureAutomatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literatureTheContentMine
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)TheContentMine
 
Open Data and Open Science
Open Data and Open ScienceOpen Data and Open Science
Open Data and Open ScienceTheContentMine
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in NeuroscienceTheContentMine
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in NeuroscienceTheContentMine
 
Weller pleasures+perils social media
Weller pleasures+perils social mediaWeller pleasures+perils social media
Weller pleasures+perils social mediaKatrin Weller
 
Open Knowledge and University of Cambridge European Bioinformatics Institute
Open Knowledge and University of Cambridge European Bioinformatics InstituteOpen Knowledge and University of Cambridge European Bioinformatics Institute
Open Knowledge and University of Cambridge European Bioinformatics InstituteTheContentMine
 
The culture of researchData
The culture of researchDataThe culture of researchData
The culture of researchDatapetermurrayrust
 
The End(s) of e-Research
The End(s) of e-ResearchThe End(s) of e-Research
The End(s) of e-ResearchEric Meyer
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?LEARN Project
 
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...Academia Sinica
 
Data, Science, Society - Claudio Gutierrez, University of Chile
Data, Science, Society - Claudio Gutierrez, University of ChileData, Science, Society - Claudio Gutierrez, University of Chile
Data, Science, Society - Claudio Gutierrez, University of ChileLEARN Project
 
State of the Art Informatics for Research Reproducibility, Reliability, and...
 State of the Art  Informatics for Research Reproducibility, Reliability, and... State of the Art  Informatics for Research Reproducibility, Reliability, and...
State of the Art Informatics for Research Reproducibility, Reliability, and...Micah Altman
 
Open data: Enhancing preservation, reproducibility, and innovation
Open data: Enhancing preservation, reproducibility, and innovationOpen data: Enhancing preservation, reproducibility, and innovation
Open data: Enhancing preservation, reproducibility, and innovationciakov
 
Semantic Web in Physical Science
Semantic Web in Physical ScienceSemantic Web in Physical Science
Semantic Web in Physical Sciencepetermurrayrust
 

Similaire à Liberating Scientific Facts for Humanity Through Open Content Mining (20)

Rapid biomedical search
Rapid biomedical search Rapid biomedical search
Rapid biomedical search
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machines
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biology
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biology
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and  Medicine from the scholarly literatureAutomatic Extraction of Science and  Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literature
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literatureAutomatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literature
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)
 
Open Data and Open Science
Open Data and Open ScienceOpen Data and Open Science
Open Data and Open Science
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
Weller pleasures+perils social media
Weller pleasures+perils social mediaWeller pleasures+perils social media
Weller pleasures+perils social media
 
Open Knowledge and University of Cambridge European Bioinformatics Institute
Open Knowledge and University of Cambridge European Bioinformatics InstituteOpen Knowledge and University of Cambridge European Bioinformatics Institute
Open Knowledge and University of Cambridge European Bioinformatics Institute
 
The culture of researchData
The culture of researchDataThe culture of researchData
The culture of researchData
 
The End(s) of e-Research
The End(s) of e-ResearchThe End(s) of e-Research
The End(s) of e-Research
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?
 
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
 
Data, Science, Society - Claudio Gutierrez, University of Chile
Data, Science, Society - Claudio Gutierrez, University of ChileData, Science, Society - Claudio Gutierrez, University of Chile
Data, Science, Society - Claudio Gutierrez, University of Chile
 
State of the Art Informatics for Research Reproducibility, Reliability, and...
 State of the Art  Informatics for Research Reproducibility, Reliability, and... State of the Art  Informatics for Research Reproducibility, Reliability, and...
State of the Art Informatics for Research Reproducibility, Reliability, and...
 
Open data: Enhancing preservation, reproducibility, and innovation
Open data: Enhancing preservation, reproducibility, and innovationOpen data: Enhancing preservation, reproducibility, and innovation
Open data: Enhancing preservation, reproducibility, and innovation
 
Semantic Web in Physical Science
Semantic Web in Physical ScienceSemantic Web in Physical Science
Semantic Web in Physical Science
 

Plus de petermurrayrust

Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital AgeOmdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Agepetermurrayrust
 
Open Science Principles and Practice
Open Science Principles and PracticeOpen Science Principles and Practice
Open Science Principles and Practicepetermurrayrust
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentationpetermurrayrust
 
Can machines understand the scientific literature?
Can machines understand the scientific literature?Can machines understand the scientific literature?
Can machines understand the scientific literature?petermurrayrust
 
OpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFestOpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFestpetermurrayrust
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentationpetermurrayrust
 
Automatic mining of data from materials science literature
Automatic mining of data from materials science literatureAutomatic mining of data from materials science literature
Automatic mining of data from materials science literaturepetermurrayrust
 
Climate Change and Human Migration
Climate Change and Human MigrationClimate Change and Human Migration
Climate Change and Human Migrationpetermurrayrust
 
openVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on virusesopenVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on virusespetermurrayrust
 
XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?petermurrayrust
 
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be BraveEarly Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be Bravepetermurrayrust
 
Early Career Reseachers and Open Healthcare
Early Career Reseachers and Open HealthcareEarly Career Reseachers and Open Healthcare
Early Career Reseachers and Open Healthcarepetermurrayrust
 
Scientific search for everyone
Scientific search for everyoneScientific search for everyone
Scientific search for everyonepetermurrayrust
 
Openplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searchingOpenplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searchingpetermurrayrust
 
Extracting science from the archive
Extracting science from the archiveExtracting science from the archive
Extracting science from the archivepetermurrayrust
 
WikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and EverythingWikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and Everythingpetermurrayrust
 
Disrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic ComplexDisrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic Complexpetermurrayrust
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Minepetermurrayrust
 
Young people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge NeocolonialismYoung people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge Neocolonialismpetermurrayrust
 
WikiFactMine: Science for Everyone
WikiFactMine: Science for EveryoneWikiFactMine: Science for Everyone
WikiFactMine: Science for Everyonepetermurrayrust
 

Plus de petermurrayrust (20)

Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital AgeOmdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Age
 
Open Science Principles and Practice
Open Science Principles and PracticeOpen Science Principles and Practice
Open Science Principles and Practice
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentation
 
Can machines understand the scientific literature?
Can machines understand the scientific literature?Can machines understand the scientific literature?
Can machines understand the scientific literature?
 
OpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFestOpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFest
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentation
 
Automatic mining of data from materials science literature
Automatic mining of data from materials science literatureAutomatic mining of data from materials science literature
Automatic mining of data from materials science literature
 
Climate Change and Human Migration
Climate Change and Human MigrationClimate Change and Human Migration
Climate Change and Human Migration
 
openVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on virusesopenVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on viruses
 
XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?
 
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be BraveEarly Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
 
Early Career Reseachers and Open Healthcare
Early Career Reseachers and Open HealthcareEarly Career Reseachers and Open Healthcare
Early Career Reseachers and Open Healthcare
 
Scientific search for everyone
Scientific search for everyoneScientific search for everyone
Scientific search for everyone
 
Openplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searchingOpenplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searching
 
Extracting science from the archive
Extracting science from the archiveExtracting science from the archive
Extracting science from the archive
 
WikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and EverythingWikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and Everything
 
Disrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic ComplexDisrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic Complex
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Mine
 
Young people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge NeocolonialismYoung people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge Neocolonialism
 
WikiFactMine: Science for Everyone
WikiFactMine: Science for EveryoneWikiFactMine: Science for Everyone
WikiFactMine: Science for Everyone
 

Liberating Scientific Facts for Humanity Through Open Content Mining