The nature.com ontologies portal
nature.com/ontologies
Tony Hammond, Michele Pasin
Who we are
We are both part of Macmillan Science and Education*
- Macmillan S&E is a global STM publisher
- Tony Hammond is Data Architect, Technology
@tonyhammond
- Michele Pasin is Information Architect, Product Office
@lambdaman
* We merged earlier this year (May 2015) with Springer Science+Business Media
to become Springer Nature. We are currently integrating our businesses.
Macmillan: science and education brands
We publish a lot of science! (1845-2015)
http://www.nature.com/developers/hacks/articles/by-year
1.2 million articles in total
Why we’re here today: to ask some questions
We have been making semantic data available in RDF models for a number of
years through our data.nature.com portal (2012–2015)
Big questions:
- Is this data of any use to the Linked Science community?
- Should Springer Nature continue to invest in LOD sharing?
More specifically:
- Does the data contain enough items of interest? [Content]
- Are the vocabularies understandable and useful? [Structure]
- Are the data easy to get and to reuse? [Accessibility]
- Is dereference / download / query the preferred option?
Our goals and rationale
- Semantic technologies are a promising way to do enterprise metadata
management at web scale
- Initially used primarily for data publishing / sharing (data.nature.com, 2011)
- Since 2013, a core component of our digital publishing workflow (see ISWC14 paper)
- Contributing to an emerging web of linked science data
- As a major publisher since 1845, ideally positioned to bootstrap a science ‘publications hub’
- Building on the fundamental ties that exist between the actual research works and the
publications that tell the story about them
The vision of a science graph
Zooming into the science graph
Implementing this vision
- Step 1: Linked Data Platform (2012–2014)
- datasets
- downloads + SPARQL endpoints (streaming, non-streaming)
- linked data dereference
- Step 2: Ontologies Portal (2015–)
- datasets + models (core, domain)
- downloads
- extensive documentation
The Ontologies Portal
www.nature.com/ontologies
Architecture
The core ontology
- Language: OWL 2, DL expressivity: ALCHI(D)
- Entities: ~50 classes, ~140 properties
- Principles: Incremental Formation / Enterprise Integration / Model Coherence
http://www.nature.com/ontologies/core/
The core ontology: mappings
[Diagram: core classes and their external mappings]
Core classes: :Asset, :Thing, :Publication, :Concept, :Event, :Subject, :Type, :Agent, :ArticleType, :PublishingEvent, :AggregationEvent, :Component, :Document, :Serial
Mapped external classes (= owl:equivalentClass): cidoc-crm:Information_Carrier, cidoc-crm:Conceptual_Object, dbpedia:Agent, dc:Agent, dcterms:Agent, cidoc-crm:Agent, vcard:Agent, foaf:Agent, event:Event, bibo:Event, schema:Event, cidoc-crm:TemporalEntity, cidoc-crm:Type, vcard:Type, fabio:SubjectTerm, bibo:Document, cidoc-crm:Document, foaf:Document, bibo:Periodical, fabio:Periodical, schema:Periodical, bibo:DocumentPart, fabio:Expression, cidoc-crm:InformationObject
http://www.nature.com/ontologies/linksets/core/
Domain models
Domain models: subjects
- Structure: SKOS, multi-hierarchical tree, 6 branches, 7 levels of depth
- Entities: ~2500 concepts
- Mappings: 100% of terms, using skos:broadMatch or skos:closeMatch
www.nature.com/ontologies/models/subjects/
http://www.nature.com/developers/hacks/#1
Subject ontology visualizations
Domain models: mappings
Article Types
Subjects
Journals
Relations
http://www.nature.com/ontologies/linksets
Datasets
- Articles: 25m records (for 1.2m articles) with metadata such as title and publication, but not authors
- Contributors: 11m records (for 2.7m contributors), i.e. the articles' authors, structured and ordered
but not disambiguated
- Citations: 218m records (for 9.3m citations), from an earlier release
Datasets: articles-wikipedia links
How: data extracted using the Wikipedia search API; 51,309 links over 145 years
Quality: only ~900 links point to nature.com without a DOI; all the rest use DOIs correctly
Encoding: cito:isCitedBy => Wikipedia URL, foaf:topic => DBpedia URI
http://www.nature.com/developers/hacks/wikilinks
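The encoding above can be sketched as follows. This is a minimal illustration, not the published dataset's exact shape: only the two predicates (cito:isCitedBy, foaf:topic) come from the slide. The helper name `wikilink_triples`, the article URI, the page title, and the choice of the article as subject of both statements are assumptions made here.

```python
# Namespaces for the two predicates named on the slide.
CITO = "http://purl.org/spar/cito/"
FOAF = "http://xmlns.com/foaf/0.1/"


def wikilink_triples(article_uri, wiki_title):
    """Return N-Triples lines linking one article to one Wikipedia page.

    Assumption: both statements use the article as their subject, and the
    DBpedia URI is derived from the Wikipedia page title.
    """
    wiki_url = "https://en.wikipedia.org/wiki/" + wiki_title
    dbpedia_uri = "http://dbpedia.org/resource/" + wiki_title
    return [
        f"<{article_uri}> <{CITO}isCitedBy> <{wiki_url}> .",
        f"<{article_uri}> <{FOAF}topic> <{dbpedia_uri}> .",
    ]


# Hypothetical article URI and page title, for illustration only.
lines = wikilink_triples("http://example.org/articles/example-doi", "Example_page")
for line in lines:
    print(line)
```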
Data publishing: sources
Sources:
Ontologies (small scale; RDF native)
- mastered as RDF data (Turtle)
- managed in GitHub
- in-memory RDF models built using Apache Jena
- models augmented at build time using SPIN rules
- deployed to MarkLogic as RDF/XML for query
- exported as RDF dataset (Turtle) and as CSV
Documents (large scale; XML native)
- mastered as XML data
- managed in MarkLogic XML database
- data mined from XML documents (1.2m articles) using Scala
- in-memory RDF models built using Apache Jena
- injected as RDF/XML sections into XML documents for query
- exported as RDF dataset (N-Quads)
Organization:
Named graphs – one graph per class
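As a rough sketch of the "one graph per class" organization in an N-Quads export: each statement carries a fourth term naming the graph, chosen here from the class of the subject. The graph namespace `GRAPH_BASE`, the helper names, and the sample IRIs are hypothetical; the real graph names are not given on this slide.

```python
from collections import defaultdict

GRAPH_BASE = "http://example.org/graphs/"  # hypothetical, not the real graph namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_nquads(statements):
    """Serialize (subject, predicate, object, class_name) tuples as N-Quads lines.

    All terms are assumed to be IRIs; the graph IRI is derived from the class name.
    """
    return [f"<{s}> <{p}> <{o}> <{GRAPH_BASE}{cls}> ." for s, p, o, cls in statements]


def group_by_graph(nquads):
    """Index N-Quads lines by their graph IRI (the last IRI on each line)."""
    graphs = defaultdict(list)
    for line in nquads:
        graph_iri = line.rstrip(" .").rsplit("<", 1)[1].rstrip(">")
        graphs[graph_iri].append(line)
    return dict(graphs)


statements = [
    ("http://example.org/journals/a", RDF_TYPE, "http://example.org/terms/Journal", "Journal"),
    ("http://example.org/articles/1", RDF_TYPE, "http://example.org/terms/Article", "Article"),
    ("http://example.org/articles/2", RDF_TYPE, "http://example.org/terms/Article", "Article"),
]
graphs = group_by_graph(to_nquads(statements))
```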
Data publishing: workflows
Data publishing: rules (basic inference)
construct {
  ?s npg:publicationStartYear ?xds1 .
  ?s npg:publicationStartYearMonth ?xds2 .
  ?s npg:publicationStartDate ?xds3 .
  ?s npg:publicationEndYear ?xde1 .
  ?s npg:publicationEndYearMonth ?xde2 .
  ?s npg:publicationEndDate ?xde3 .
}
where {
  ?s a npg:Journal .
  optional { ?s npg:dateStart ?dateStart } optional { ?s npg:dateEnd ?dateEnd }
  {
    # year precision: first four characters
    bind (if(regex(?dateStart, "^\\d{4}"), substr(?dateStart,1,4), "") as ?ds1)
    bind (xsd:gYear(?ds1) as ?xds1)
  } union {
    # year-month precision: first seven characters
    bind (if(regex(?dateStart, "^\\d{4}-\\d{2}"), substr(?dateStart,1,7), "") as ?ds2)
    bind (xsd:gYearMonth(?ds2) as ?xds2)
  } union {
    # full date: all ten characters
    bind (if(regex(?dateStart, "^\\d{4}-\\d{2}-\\d{2}$"), substr(?dateStart,1,10), "") as ?ds3)
    bind (xsd:date(?ds3) as ?xds3)
  } union {
    …
  }
  filter (?xds1 != "" || ?xds2 != "" || ?xds3 != "" || ?xde1 != "" || ?xde2 != "" || ?xde3 != "")
}
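The idea behind the rule can be illustrated outside SPARQL. The sketch below applies the same three regular-expression tests to a start-date string and returns only the values the string supports. It is a plain Python analogue under our own naming (`derive_dates` is hypothetical), not the SPIN rule itself, and it covers only the start date.

```python
import re


def derive_dates(date_start):
    """Derive year, year-month and full-date values from a date string.

    Mirrors the three regex tests in the rule: a value is produced only at
    the precisions that the input string actually provides.
    """
    derived = {}
    if re.match(r"^\d{4}", date_start):
        derived["publicationStartYear"] = date_start[:4]
    if re.match(r"^\d{4}-\d{2}", date_start):
        derived["publicationStartYearMonth"] = date_start[:7]
    if re.match(r"^\d{4}-\d{2}-\d{2}$", date_start):
        derived["publicationStartDate"] = date_start[:10]
    return derived


full = derive_dates("1869-11-04")   # all three precisions available
partial = derive_dates("1869-11")   # year and year-month only
```

Note that SPARQL's `substr` is 1-based, so `substr(?dateStart,1,4)` corresponds to the Python slice `date_start[:4]`.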
Data publishing: rules (validation)
construct {
  npgg:journals npg:hasConstraintViolation [
    a spin:ConstraintViolation ;
    npg:severityLevel "Warning" ;
    rdfs:label ?message ;
    spin:rule [ a sp:Construct ; sp:text ?query ; ] ;
  ] .
}
where {
  # count the journals that have no bibo:shortTitle
  { select (count(?s) as ?count)
    where {
      ?s a npg:Journal .
      filter ( not exists { ?s bibo:shortTitle ?h . } ) }
  }
  bind (concat("! Found ", str(?count), " journals with no short title") as ?message)
  # query text attached to the violation; it reports each journal as spin:violationRoot
  bind ("""
    construct {
      npgg:journals npg:hasConstraintViolation [
        a spin:ConstraintViolation ;
        spin:violationRoot ?s ; … ] .
    } where { … }
  """ as ?query)
}
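The validation pattern above (count the offending resources, then emit one warning with a readable message) might look like this in plain Python. It is a sketch under assumptions: the dictionary layout, the `shortTitle` key, and the function name `short_title_warning` are ours. It reports nothing when the count is zero, which is a choice made here rather than something the slide states.

```python
def short_title_warning(journals):
    """Build a warning for journals that lack a short title.

    journals: dict mapping a journal id to a dict of its properties.
    Returns None when every journal has a non-empty short title.
    """
    missing = [jid for jid, props in journals.items() if not props.get("shortTitle")]
    if not missing:
        return None
    return {
        "severityLevel": "Warning",
        "label": f"! Found {len(missing)} journals with no short title",
        "violationRoots": sorted(missing),
    }


# Hypothetical sample data.
journals = {
    "nature": {"shortTitle": "Nature"},
    "journal-2": {},
    "journal-3": {"shortTitle": ""},
}
warning = short_title_warning(journals)
```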
Data publishing: rules (contracts)
knowledge-bases:public
  ...
  npg:hasContract [
    rdfs:comment "Contract for ArticleTypes Ontology" ;
    npg:graph npgg:article-types ;
    npg:hasBinding [
      npg:onOntology article-types: ;
      npg:allowsPredicate
        dc:creator , dc:date , dc:publisher , dc:rights , dcterms:license ,
        npg:webpage , owl:imports , owl:versionInfo , rdf:type , rdfs:comment ,
        skos:definition , skos:prefLabel , skos:note ,
        vann:preferredNamespacePrefix , vann:preferredNamespaceUri
      ;
    ] , [
      npg:onInstanceOf npg:ArticleType ;
      npg:allowsPredicate
        npg:hasRoot , npg:isPrimaryArticleType ,
        npg:id , npg:isLeaf , npg:isRoot , npg:treeDepth ,
        rdf:type , rdfs:isDefinedBy , rdfs:seeAlso ,
        skos:broadMatch , skos:broader , skos:closeMatch ,
        skos:definition , skos:exactMatch , skos:inScheme , skos:narrower ,
        skos:prefLabel , skos:relatedMatch , skos:topConceptOf
      ;
    ] ;
  ] ;
  ...
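One way to read a contract binding is as a whitelist of predicates for a given kind of subject. The sketch below checks a few statements against a subset of the predicates allowed for instances of npg:ArticleType. How the contracts are actually enforced is not described on the slide, so the function name `contract_violations`, the sample statements, and the use of prefixed names as plain strings are all assumptions.

```python
# Subset of the predicates allowed on npg:ArticleType instances (from the slide).
ARTICLE_TYPE_ALLOWED = [
    "npg:id", "npg:isLeaf", "npg:isRoot", "npg:treeDepth",
    "rdf:type", "rdfs:isDefinedBy",
    "skos:broader", "skos:narrower", "skos:prefLabel", "skos:inScheme",
]


def contract_violations(triples, allowed_predicates):
    """Return the triples whose predicate is not allowed by the binding."""
    allowed = set(allowed_predicates)
    return [t for t in triples if t[1] not in allowed]


# Hypothetical statements about one article type.
triples = [
    ("article-types:example", "rdf:type", "npg:ArticleType"),
    ("article-types:example", "skos:prefLabel", "Example"),
    ("article-types:example", "dc:creator", "someone"),  # only allowed on the ontology itself
]
violations = contract_violations(triples, ARTICLE_TYPE_ALLOWED)
```

In this example, `dc:creator` appears in the binding for the ontology resource but not in the binding for npg:ArticleType instances, so it is the one statement flagged.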
Data publishing: contracts workflow
Next steps
More features:
- Linked data dereference
- Richer dataset descriptions (VoID, PROV, HCLS Profile, etc.)
- SPARQL endpoint?
- JSON-LD API?
More data:
- Adding extra data points (funding info, abstracts, …)
- Revamp citations dataset
- Longer term: extending archive to include Springer content
More feedback:
- User testing around data accessibility
- Surveying communities/users for this data
Looking ahead: how can a publisher make linked science happen?
From a business perspective:
- Finding adequate licensing solutions
- Justifying the effort to publishers
- Who uses this data? What’s the ROI?
From a communities perspective:
- Do we actually know who the users are?
- How do we get more feedback/uptake?
- Should we work more with non-linked-data communities?


Editor's notes

  1. main questions for the presentation > structure and mappings: accessible enough? > content: big enough? > accessibility: need more? > overall: is this useful? should NPG stop releasing these data and keep using them only for internal purposes?
  2. ideally link to online representation
  3. main questions for the presentation > structure and mappings: accessible enough? > content: big enough? > accessibility: need more? > overall: is this useful? should NPG stop releasing these data and keep using them only for internal purposes? > data torrents?
  4. slide about vision [1]
  5. slide about vision [2]
  6. The core model is a formal model that defines the key concepts we use for content publishing. It includes branches that describe the things we publish (publications), the things we use to categorise the things we publish (types) and more abstract concepts to document details of the publication workflow (events). In designing the Core Ontology, we adhered to three main principles:
     Incremental formalization: We started out with a relatively flat model and tested it against our use cases and system architecture, adding additional structure as more precise requirements were made available. The choice of names for classes and properties has also been tested and validated against our target audience and the enterprise use cases.
     Cohesiveness: Although we do make some use of public vocabularies such as BIBO and FOAF, in general we decided to follow a minimal commitment to external vocabularies, as that would let us retain more control over our model and also create a much more cohesive ontology. This is mainly because our main driver is currently to support internal applications. In order to facilitate web-scale data integration we have, whenever possible, added mappings to other commonly used vocabularies, e.g. BIBO, FABIO and FOAF, via owl:equivalentClass and owl:equivalentProperty relationships.
     Focus on integration: We have primarily focused on building a shared enterprise model, e.g. by getting the core classes and properties right and thus achieving some simple yet fundamental level of data integration. So even though we make use of SPIN rules and some basic inference in the data enrichment phase, we have not yet really taken advantage of the various inference mechanisms that can be built on top of OWL.
     Overall, the Core Ontology represents a measured balance between supporting legacy practices (some stretching back over many years) and enabling new requirements (which may only be revealed incrementally).
     It has been developed and grown within a cross-functional software delivery team. Some of the modelling clearly reflects immediate pragmatic concerns and the 'operational semantics' originating from our specific system architecture, but is included here to show how we are using this ontology to drive forward our content publishing and discovery processes.
  7. The Core Ontology is mapped to a number of external ontologies. We use owl:equivalentClass and owl:equivalentProperty properties to map our classes (>70 mappings) and properties (>30 mappings), respectively. This is a work in progress, as we are constantly trying to improve the precision and variety of our mappings. We would encourage any interested party to give us feedback and suggestions about other models we should link to.
  8. The Subjects Ontology is mapped to the DBpedia and Wikidata datasets and also to the Bio2RDF and MeSH datasets. We use a skos:broadMatch or skos:closeMatch property to map our subject instances.
  9. Most mature