SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
UNLEASH THE TRIPLE:
LEVERAGING A CORPORATE DISCOVERY INTERFACE
THE OECD CASE
Mary-Ann Grosset, Digital Practice Team Manager
Thierry Vebr, Coordinator Information Systems
Jan-Anno Schuur, Coordinator Information Specialist
12 September 2017
www.oecd.org
35 member countries
Mission: promote policies that will improve the economic and social well-being of people around the world
OECD
Organisation for Economic Co-operation and Development
OECD’s way of working
Reading Assistant
Less time to identify and collect information
More time available to analyse
Internal and external sources
No language or word constraints
Fully semantic environment
Content and queries tagged using semantic robots using taxonomies/ontologies and linguistic patterns
Tags stored in RDF to link content
Makes suggestions to:
• Narrow down search
• Discover new areas
Resources
11-Jul-17
Number of resources Semantic annotations
Central taxonomies 10,234 n/a
Domain-specific taxonomies 29,081 n/a
Indicators 8,245 944,279
Models and ontologies 1,463 n/a
Internal resources 295,637 4,614,464
Datasets 1,281 108,930
External resources 4,703,396 40,398,473
Surveys 44,405 87,140
Total 5,093,742 46,153,286
• What:
– Put structure into unstructured information
– Extract knowledge from content
– Enable repurposing of knowledge and content nuggets
• How:
– Fragmenting content (information)
– Storing content as structured XML
– using taxonomies, semantic tagging
– Storing taxonomies, tags, links and triples
– Using linked data capabilities
• Deliverables:
– PoC to search and discover content (October 2015)
– Complex semantic analysis tools identifying indicators, trends,…. (from April 2015 to current)
– Project specific reading and tagging assistants (from May 2016 to current)
– Reading assistant (April 2017)
How did we get to where we are
• Move from Librarians/Records Managers to Taxonomy , semantic and linked data experts
• Training/workshops in using applications and creating services
• Practical hands-on experience to create and maintain
• Stakeholder complex semantic use cases
• Agile methodology
Phase 1 : Capacity building
September 2014 to October 2015
Creativity
Innovation
Curiosity
Motivation
And a lot of
Dedication
Team work
Collaboration
Enthusiasm
• Corporate taxonomies
– Topics
– Geographical areas
• Semantic robots to
– Identify candidate terms to update the corporate taxonomies
– Identify topics/geographical areas used in information at document level
– Identify topics/geographical areas used at fragment level
• Modelise content
– Flexible , adaptable
– Easy to expand
– Data source Independent
• Store all content as RDF
• Develop PoC
Phase 2: Building the foundations
January to October 2015
• Analyse sections of reports using a domain specific ontology and develop a
service to allow manual tagging of sections, Display results in context
• Analyse responses to questionnaires at fragment level using central topics and
geographical area ontologies; identify trends and present results in context
• Analyse legislation from 4 different countries (step 1) in French, English and
Spanish to identify best topics and/or all topics, depending on the desired view
• Analyse press releases from Reuters and AFP to identify acts of civil unrest in
African countries; develop a service to allow manual validation of indicators
following their identification through complex semantic tagging
• Enrich articles from Elsevier, and RSS feeds to present targeted alerts to
stakeholders
• Develop a products taxonomy to semantically tag product Recalls
Phase 3: Stakeholder semantic applications
November 2015 to today
• Build on experience:
– Creating taxonomies/ontologies
– Creating diverse and complex semantic robots and web-services
– Creating complex SPARQL queries
– Developing easy to use solutions to meet complex stakeholder semantic analysis needs
• Develop a fully semantic and linked data environment
• One-stop shop for policy analysts to conduct research
• Semantically enrich content from internal and external sources
Phase 4: Digital strategy project
November 2016 to today (April 2017 for V1 release)
The Team
Knowledge Management Unit
Taxonomy
Semantic robots
Corpora
Validation
Disambiguation
External developersKey stakeholders
Development and maintenance
Model
SPARQL queries
Knowledge contextualisation
Maintenance of environments
(taxonomy, semantic),
triple and XML stores
Functionality
Domain specific taxonomies
Semantic requirements
UX design
Interface development
Architecture
Semantic Layer
Data hub
14
Breaking the barriers
Create concepts and link them so that machines can use them
as a cloud of knowledge (Semantic layer)
Use in:
Semantic analysis tools
Search and discover tools
Capture and formalise Subject-Matter-Experts’ knowledge
Identify synonyms, related terms in official languages
Use URIs not labels
Of LANGUAGE and WORDS
Taxonomies/Ontologies
Breaking the barriers
ingredientof
hates
http://kim.oecd.org/Topics#T4658
vaches
cows
http://kim.oecd.org/Topics#T127
Fondue au fromage
Cheese fondue
http://kim.oecd.org/Topics#T6657
lait
milk
gruyère
http://kim.oecd.org/Topics#T909
gruyère
http://kim.oecd.org/Topics#T222
fromage
cheese
http://kim.oecd.org/Topics#T7
520
chats
cats
matous
http://kim.oecd.org/Topics#T75
20
souris
mice
SEMANTICS
FUN AND GAMES
• Use lexicon (taxonomies)
• Use text patterns (part of speech)
• Identify relationships (ontologies)
• Test the results on a set of documents
• Debug - Disambiguate
• Test on the complete corpus
• Put in production using Web Services
How do we create the semantic robots?
36 different robots in production
• January 2017
Semantic enrichment
Search for protests in
Africa
Returns articles covering protests, demonstrations, manifestations, etc…
Indicator 1 – Strike in the Public sector
Sector: Public Event:
Strike
Must
contain
Identifying knowledge nuggets
Event:
Strike
Sector: Public
Indicator 1 – Strike in the Public sector
How do we achieve this:
• Extensive work on taxonomies
• Selection of Corpora
• Validation
• Disambiguation (limit fuzziness, provide context, specify acronyms…)
Recall, Precision and F-Measure
Average rate 95% - Less not acceptable
Golden Corpora and Validation
Corpus (group) of documents used to validate and check the quality of
the skill cartridges used to semantically enrich information
The OECD Golden Corpora contain
– OECD Publications
– Official Documents
– 17 themes
– both official languages
– items manually tagged against the central taxonomy
Golden Corpora :Size and Quality
• 1.223 publications
• Error margin of 0,9.
Theme Total XML En Fr % XML
Agriculture 45 18 9 9 40
Development Cooperation 89 36 18 18 40
Economic Policy 176 163 84 79 93
Education 68 22 16 6 32
Employment 42 22 11 11 52
Energy 76 4 2 2 5
Enterprise Entrepreneurship 76 44 30 14 58
Environment 88 56 30 26 64
Finance Investment 167 111 58 53 66
Public Governance 94 66 38 28 70
Science 56 32 16 16 57
Social issues 70 52 27 25 74
Taxation 37 23 15 8 62
Territorial Development 37 28 15 13 76
Trade 61 19 12 7 31
Transport 41 8 3 5 20
Total 1223 704 384 320 58
Provide a realistic overview of the
content and volume covered in the
different domains.
OECD index for cartridge reference created
with the XML publications.
The index provides the basis of OECD
language that is used by the cartridge to
rank extraction results.
Disambiguation
It’s raining cats and dogs
Disambiguation is key & takes a lot of time
Queen Elisabeth
We use several semantic robots,
based on several different taxonomies
(generic, innovation-oriented, etc…)
Multi-view annotation graphs
We tag a same resource in different ways
We can see a same resource with different « semantic » angles of view
Demonstration
• 5-year programme
• Looking for candidates from OECD member countries
for 6 month internships, or short term assignments
• To work with us on specific semantic projects
• Can be associated to a PhD thesis
• 6 to 18K€ funding envelope per project
Outreach
mary-ann.grosset@oecd.org
Knowledge Management Unit
Digital Knowledge and Information Services
2 rue André Pascal
75775 Paris cédex 16
France
Tel: +331 45 24 81 53
Contact
For additional information, internship possibilities to work on semantic projects :

Contenu connexe

Tendances

Language technology market and components taxonomy
Language technology market and components taxonomyLanguage technology market and components taxonomy
Language technology market and components taxonomyPretaLLOD
 
Patent scope ompi
Patent scope ompiPatent scope ompi
Patent scope ompiLATIPAT
 
EOSC FAIR Data Session - EOSC Stakeholders Forum 2018
EOSC FAIR Data Session - EOSC Stakeholders Forum 2018EOSC FAIR Data Session - EOSC Stakeholders Forum 2018
EOSC FAIR Data Session - EOSC Stakeholders Forum 2018EOSCpilot .eu
 
Automated interpretability of linked data ontologies: an evaluation within th...
Automated interpretability of linked data ontologies: an evaluation within th...Automated interpretability of linked data ontologies: an evaluation within th...
Automated interpretability of linked data ontologies: an evaluation within th...Nuno Freire
 
Integration and Exploration of Financial Data using Semantics and Ontologies
Integration and Exploration of Financial Data using Semantics and OntologiesIntegration and Exploration of Financial Data using Semantics and Ontologies
Integration and Exploration of Financial Data using Semantics and OntologiesRoberto García
 
The creation of an international core data model
The creation of an international core data modelThe creation of an international core data model
The creation of an international core data modelSemic.eu
 
ICIC 2017: New product presentation EXPERT SYSTEM
ICIC 2017: New product presentation EXPERT SYSTEMICIC 2017: New product presentation EXPERT SYSTEM
ICIC 2017: New product presentation EXPERT SYSTEMDr. Haxel Consult
 
IPv6 Observatory outomes
IPv6 Observatory outomesIPv6 Observatory outomes
IPv6 Observatory outomesFabrice Clari
 
Harvesting Repositories: DPLA, Europeana, & Other Case Studies
Harvesting Repositories:  DPLA, Europeana, & Other Case StudiesHarvesting Repositories:  DPLA, Europeana, & Other Case Studies
Harvesting Repositories: DPLA, Europeana, & Other Case Studieseohallor
 
Building NextGen Enterprise data platforms | Graham Cousins
Building NextGen Enterprise data platforms | Graham CousinsBuilding NextGen Enterprise data platforms | Graham Cousins
Building NextGen Enterprise data platforms | Graham CousinsConnected Data World
 
Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016Oscar Corcho
 
Semantic interoperability courses training module 2 - core vocabularies v0.11
Semantic interoperability courses   training module 2 - core vocabularies v0.11Semantic interoperability courses   training module 2 - core vocabularies v0.11
Semantic interoperability courses training module 2 - core vocabularies v0.11Semic.eu
 
Open Research in Ireland: Infrastructures for Open Research
Open Research in Ireland: Infrastructures for Open ResearchOpen Research in Ireland: Infrastructures for Open Research
Open Research in Ireland: Infrastructures for Open Researchdri_ireland
 
Europeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceDr. Haxel Consult
 

Tendances (20)

Language technology market and components taxonomy
Language technology market and components taxonomyLanguage technology market and components taxonomy
Language technology market and components taxonomy
 
Ready, Set, GO FAIR
Ready, Set, GO FAIRReady, Set, GO FAIR
Ready, Set, GO FAIR
 
Patent scope ompi
Patent scope ompiPatent scope ompi
Patent scope ompi
 
EOSC FAIR Data Session - EOSC Stakeholders Forum 2018
EOSC FAIR Data Session - EOSC Stakeholders Forum 2018EOSC FAIR Data Session - EOSC Stakeholders Forum 2018
EOSC FAIR Data Session - EOSC Stakeholders Forum 2018
 
Semantics and Machine Learning
Semantics and Machine LearningSemantics and Machine Learning
Semantics and Machine Learning
 
Automated interpretability of linked data ontologies: an evaluation within th...
Automated interpretability of linked data ontologies: an evaluation within th...Automated interpretability of linked data ontologies: an evaluation within th...
Automated interpretability of linked data ontologies: an evaluation within th...
 
Integration and Exploration of Financial Data using Semantics and Ontologies
Integration and Exploration of Financial Data using Semantics and OntologiesIntegration and Exploration of Financial Data using Semantics and Ontologies
Integration and Exploration of Financial Data using Semantics and Ontologies
 
The creation of an international core data model
The creation of an international core data modelThe creation of an international core data model
The creation of an international core data model
 
ICIC 2017: New product presentation EXPERT SYSTEM
ICIC 2017: New product presentation EXPERT SYSTEMICIC 2017: New product presentation EXPERT SYSTEM
ICIC 2017: New product presentation EXPERT SYSTEM
 
IPv6 Observatory outomes
IPv6 Observatory outomesIPv6 Observatory outomes
IPv6 Observatory outomes
 
Harvesting Repositories: DPLA, Europeana, & Other Case Studies
Harvesting Repositories:  DPLA, Europeana, & Other Case StudiesHarvesting Repositories:  DPLA, Europeana, & Other Case Studies
Harvesting Repositories: DPLA, Europeana, & Other Case Studies
 
ICIC 2017:
ICIC 2017: ICIC 2017:
ICIC 2017:
 
Building NextGen Enterprise data platforms | Graham Cousins
Building NextGen Enterprise data platforms | Graham CousinsBuilding NextGen Enterprise data platforms | Graham Cousins
Building NextGen Enterprise data platforms | Graham Cousins
 
Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016
 
Semantic interoperability courses training module 2 - core vocabularies v0.11
Semantic interoperability courses   training module 2 - core vocabularies v0.11Semantic interoperability courses   training module 2 - core vocabularies v0.11
Semantic interoperability courses training module 2 - core vocabularies v0.11
 
The public sector and integrated operations
The public sector and integrated operationsThe public sector and integrated operations
The public sector and integrated operations
 
Open Research in Ireland: Infrastructures for Open Research
Open Research in Ireland: Infrastructures for Open ResearchOpen Research in Ireland: Infrastructures for Open Research
Open Research in Ireland: Infrastructures for Open Research
 
Membership Intro Presentation
Membership Intro PresentationMembership Intro Presentation
Membership Intro Presentation
 
Europeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday Genereux
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
 

Similaire à Session 4.2 unleash the triple: leveraging a corporate discovery interface. the oecd case

OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017openminted_eu
 
Ontological Infrastructure for Interoperable Research Information Systems: HE...
Ontological Infrastructure for Interoperable Research Information Systems: HE...Ontological Infrastructure for Interoperable Research Information Systems: HE...
Ontological Infrastructure for Interoperable Research Information Systems: HE...Diego López-de-Ipiña González-de-Artaza
 
SKOS as the focal point of linked data strategies
SKOS as the focal point of linked data strategiesSKOS as the focal point of linked data strategies
SKOS as the focal point of linked data strategiesSemantic Web Company
 
Smart cities no ai without ia
Smart cities   no ai without iaSmart cities   no ai without ia
Smart cities no ai without iaFredric Landqvist
 
FAIRification is a Team Sport: FAIRsharing and the FAIR Cookbook
FAIRification is a Team Sport: FAIRsharing and the FAIR CookbookFAIRification is a Team Sport: FAIRsharing and the FAIR Cookbook
FAIRification is a Team Sport: FAIRsharing and the FAIR CookbookSusanna-Assunta Sansone
 
Tag Piccolo: Improving (CRS) codelist management
Tag Piccolo:  Improving (CRS) codelist management Tag Piccolo:  Improving (CRS) codelist management
Tag Piccolo: Improving (CRS) codelist management FAO
 
Digital Skills for FAIR and Open Science
Digital Skills for FAIR and Open Science Digital Skills for FAIR and Open Science
Digital Skills for FAIR and Open Science dri_ireland
 
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Enrico Motta
 
Research Data Alliance .. The Why, How, What ...
Research Data Alliance .. The Why, How, What ... Research Data Alliance .. The Why, How, What ...
Research Data Alliance .. The Why, How, What ... Research Data Alliance
 
OER World Map Prototypes
OER World Map PrototypesOER World Map Prototypes
OER World Map PrototypesISKME
 
Knowledge exchange consensus on monitoring oa, presentation open aire, oslo, ...
Knowledge exchange consensus on monitoring oa, presentation open aire, oslo, ...Knowledge exchange consensus on monitoring oa, presentation open aire, oslo, ...
Knowledge exchange consensus on monitoring oa, presentation open aire, oslo, ...Michael Svendsen
 
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...Oscar Corcho
 
The OER World Map: Adolescence of a Community Platform for the OER Movement
The OER World Map: Adolescence of a Community Platform for the OER MovementThe OER World Map: Adolescence of a Community Platform for the OER Movement
The OER World Map: Adolescence of a Community Platform for the OER MovementOpen Education Consortium
 
Linked Open Data and Applications
Linked Open Data and Applications Linked Open Data and Applications
Linked Open Data and Applications Victor de Boer
 
Toward FAIR Semantic Resources
Toward FAIR Semantic ResourcesToward FAIR Semantic Resources
Toward FAIR Semantic ResourcesEUDAT
 
SESAM4 - A guide to semantics in the Linked Open Data cloud, Robert HP Engels...
SESAM4 - A guide to semantics in the Linked Open Data cloud, Robert HP Engels...SESAM4 - A guide to semantics in the Linked Open Data cloud, Robert HP Engels...
SESAM4 - A guide to semantics in the Linked Open Data cloud, Robert HP Engels...The Research Council of Norway, IKTPLUSS
 
Apo presentation research librarians day feb 2017
Apo presentation research librarians day feb 2017Apo presentation research librarians day feb 2017
Apo presentation research librarians day feb 2017SusanMRob
 

Similaire à Session 4.2 unleash the triple: leveraging a corporate discovery interface. the oecd case (20)

OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017
 
Promoting (meta)-data standards - The European Commission ISA Programme per...
 Promoting (meta)-data standards- The European Commission ISA Programme per... Promoting (meta)-data standards- The European Commission ISA Programme per...
Promoting (meta)-data standards - The European Commission ISA Programme per...
 
Ontological Infrastructure for Interoperable Research Information Systems: HE...
Ontological Infrastructure for Interoperable Research Information Systems: HE...Ontological Infrastructure for Interoperable Research Information Systems: HE...
Ontological Infrastructure for Interoperable Research Information Systems: HE...
 
SKOS as the focal point of linked data strategies
SKOS as the focal point of linked data strategiesSKOS as the focal point of linked data strategies
SKOS as the focal point of linked data strategies
 
Smart cities no ai without ia
Smart cities   no ai without iaSmart cities   no ai without ia
Smart cities no ai without ia
 
FAIRification is a Team Sport: FAIRsharing and the FAIR Cookbook
FAIRification is a Team Sport: FAIRsharing and the FAIR CookbookFAIRification is a Team Sport: FAIRsharing and the FAIR Cookbook
FAIRification is a Team Sport: FAIRsharing and the FAIR Cookbook
 
Tag Piccolo: Improving (CRS) codelist management
Tag Piccolo:  Improving (CRS) codelist management Tag Piccolo:  Improving (CRS) codelist management
Tag Piccolo: Improving (CRS) codelist management
 
Digital Skills for FAIR and Open Science
Digital Skills for FAIR and Open Science Digital Skills for FAIR and Open Science
Digital Skills for FAIR and Open Science
 
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
 
Lemon at-mlw3
Lemon at-mlw3Lemon at-mlw3
Lemon at-mlw3
 
Research Data Alliance .. The Why, How, What ...
Research Data Alliance .. The Why, How, What ... Research Data Alliance .. The Why, How, What ...
Research Data Alliance .. The Why, How, What ...
 
OER World Map Prototypes
OER World Map PrototypesOER World Map Prototypes
OER World Map Prototypes
 
Knowledge exchange consensus on monitoring oa, presentation open aire, oslo, ...
Knowledge exchange consensus on monitoring oa, presentation open aire, oslo, ...Knowledge exchange consensus on monitoring oa, presentation open aire, oslo, ...
Knowledge exchange consensus on monitoring oa, presentation open aire, oslo, ...
 
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
 
Application of a theory based approach for evaluating knowledge in fao
Application of a theory based approach for evaluating knowledge in faoApplication of a theory based approach for evaluating knowledge in fao
Application of a theory based approach for evaluating knowledge in fao
 
The OER World Map: Adolescence of a Community Platform for the OER Movement
The OER World Map: Adolescence of a Community Platform for the OER MovementThe OER World Map: Adolescence of a Community Platform for the OER Movement
The OER World Map: Adolescence of a Community Platform for the OER Movement
 
Linked Open Data and Applications
Linked Open Data and Applications Linked Open Data and Applications
Linked Open Data and Applications
 
Toward FAIR Semantic Resources
Toward FAIR Semantic ResourcesToward FAIR Semantic Resources
Toward FAIR Semantic Resources
 
SESAM4 - A guide to semantics in the Linked Open Data cloud, Robert HP Engels...
SESAM4 - A guide to semantics in the Linked Open Data cloud, Robert HP Engels...SESAM4 - A guide to semantics in the Linked Open Data cloud, Robert HP Engels...
SESAM4 - A guide to semantics in the Linked Open Data cloud, Robert HP Engels...
 
Apo presentation research librarians day feb 2017
Apo presentation research librarians day feb 2017Apo presentation research librarians day feb 2017
Apo presentation research librarians day feb 2017
 

Plus de semanticsconference

Linear books to open world adventure
Linear books to open world adventureLinear books to open world adventure
Linear books to open world adventuresemanticsconference
 
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
Session 1.2   high-precision, context-free entity linking exploiting unambigu...Session 1.2   high-precision, context-free entity linking exploiting unambigu...
Session 1.2 high-precision, context-free entity linking exploiting unambigu...semanticsconference
 
Session 4.3 semantic annotation for enhancing collaborative ideation
Session 4.3   semantic annotation for enhancing collaborative ideationSession 4.3   semantic annotation for enhancing collaborative ideation
Session 4.3 semantic annotation for enhancing collaborative ideationsemanticsconference
 
Session 1.1 dalicc - data licenses clearance center
Session 1.1   dalicc - data licenses clearance centerSession 1.1   dalicc - data licenses clearance center
Session 1.1 dalicc - data licenses clearance centersemanticsconference
 
Session 1.3 context information management across smart city knowledge domains
Session 1.3   context information management across smart city knowledge domainsSession 1.3   context information management across smart city knowledge domains
Session 1.3 context information management across smart city knowledge domainssemanticsconference
 
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
Session 0.0   aussenac semanticsnl-pwebsem2017-v4Session 0.0   aussenac semanticsnl-pwebsem2017-v4
Session 0.0 aussenac semanticsnl-pwebsem2017-v4semanticsconference
 
Session 0.0 keynote sandeep sacheti - final hi res
Session 0.0   keynote sandeep sacheti - final hi resSession 0.0   keynote sandeep sacheti - final hi res
Session 0.0 keynote sandeep sacheti - final hi ressemanticsconference
 
Session 1.1 linked data applied: a field report from the netherlands
Session 1.1   linked data applied: a field report from the netherlandsSession 1.1   linked data applied: a field report from the netherlands
Session 1.1 linked data applied: a field report from the netherlandssemanticsconference
 
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
Session 1.2   enrich your knowledge graphs: linked data integration with pool...Session 1.2   enrich your knowledge graphs: linked data integration with pool...
Session 1.2 enrich your knowledge graphs: linked data integration with pool...semanticsconference
 
Session 1.4 connecting information from legislation and datasets using a ca...
Session 1.4   connecting information from legislation and datasets using a ca...Session 1.4   connecting information from legislation and datasets using a ca...
Session 1.4 connecting information from legislation and datasets using a ca...semanticsconference
 
Session 1.4 a distributed network of heritage information
Session 1.4   a distributed network of heritage informationSession 1.4   a distributed network of heritage information
Session 1.4 a distributed network of heritage informationsemanticsconference
 
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
Session 0.0   media panel - matthias priem - gtuo - semantics 2017Session 0.0   media panel - matthias priem - gtuo - semantics 2017
Session 0.0 media panel - matthias priem - gtuo - semantics 2017semanticsconference
 
Session 1.3 semantic asset management in the dutch rail engineering and con...
Session 1.3   semantic asset management in the dutch rail engineering and con...Session 1.3   semantic asset management in the dutch rail engineering and con...
Session 1.3 semantic asset management in the dutch rail engineering and con...semanticsconference
 
Session 1.3 energy, smart homes & smart grids: towards interoperability...
Session 1.3   energy, smart homes & smart grids: towards interoperability...Session 1.3   energy, smart homes & smart grids: towards interoperability...
Session 1.3 energy, smart homes & smart grids: towards interoperability...semanticsconference
 
Session 1.2 improving access to digital content by semantic enrichment
Session 1.2   improving access to digital content by semantic enrichmentSession 1.2   improving access to digital content by semantic enrichment
Session 1.2 improving access to digital content by semantic enrichmentsemanticsconference
 
Session 2.3 semantics for safeguarding & security – a police story
Session 2.3   semantics for safeguarding & security – a police storySession 2.3   semantics for safeguarding & security – a police story
Session 2.3 semantics for safeguarding & security – a police storysemanticsconference
 
Session 2.5 semantic similarity based clustering of license excerpts for im...
Session 2.5   semantic similarity based clustering of license excerpts for im...Session 2.5   semantic similarity based clustering of license excerpts for im...
Session 2.5 semantic similarity based clustering of license excerpts for im...semanticsconference
 
Session 1.6 slovak public metadata governance and management based on linke...
Session 1.6   slovak public metadata governance and management based on linke...Session 1.6   slovak public metadata governance and management based on linke...
Session 1.6 slovak public metadata governance and management based on linke...semanticsconference
 
Session 5.6 towards a semantic outlier detection framework in wireless sens...
Session 5.6   towards a semantic outlier detection framework in wireless sens...Session 5.6   towards a semantic outlier detection framework in wireless sens...
Session 5.6 towards a semantic outlier detection framework in wireless sens...semanticsconference
 
Session 2.2 ontology-guided job market demand analysis: a cross-sectional s...
Session 2.2   ontology-guided job market demand analysis: a cross-sectional s...Session 2.2   ontology-guided job market demand analysis: a cross-sectional s...
Session 2.2 ontology-guided job market demand analysis: a cross-sectional s...semanticsconference
 

Plus de semanticsconference (20)

Linear books to open world adventure
Linear books to open world adventureLinear books to open world adventure
Linear books to open world adventure
 
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
Session 1.2   high-precision, context-free entity linking exploiting unambigu...Session 1.2   high-precision, context-free entity linking exploiting unambigu...
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
 
Session 4.3 semantic annotation for enhancing collaborative ideation
Session 4.3   semantic annotation for enhancing collaborative ideationSession 4.3   semantic annotation for enhancing collaborative ideation
Session 4.3 semantic annotation for enhancing collaborative ideation
 
Session 1.1 dalicc - data licenses clearance center
Session 1.1   dalicc - data licenses clearance centerSession 1.1   dalicc - data licenses clearance center
Session 1.1 dalicc - data licenses clearance center
 
Session 1.3 context information management across smart city knowledge domains
Session 1.3   context information management across smart city knowledge domainsSession 1.3   context information management across smart city knowledge domains
Session 1.3 context information management across smart city knowledge domains
 
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
Session 0.0   aussenac semanticsnl-pwebsem2017-v4Session 0.0   aussenac semanticsnl-pwebsem2017-v4
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
 
Session 0.0 keynote sandeep sacheti - final hi res
Session 0.0   keynote sandeep sacheti - final hi resSession 0.0   keynote sandeep sacheti - final hi res
Session 0.0 keynote sandeep sacheti - final hi res
 
Session 1.1 linked data applied: a field report from the netherlands
Session 1.1   linked data applied: a field report from the netherlandsSession 1.1   linked data applied: a field report from the netherlands
Session 1.1 linked data applied: a field report from the netherlands
 
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
Session 1.2   enrich your knowledge graphs: linked data integration with pool...Session 1.2   enrich your knowledge graphs: linked data integration with pool...
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
 
Session 1.4 connecting information from legislation and datasets using a ca...
Session 1.4   connecting information from legislation and datasets using a ca...Session 1.4   connecting information from legislation and datasets using a ca...
Session 1.4 connecting information from legislation and datasets using a ca...
 
Session 1.4 a distributed network of heritage information
Session 1.4   a distributed network of heritage informationSession 1.4   a distributed network of heritage information
Session 1.4 a distributed network of heritage information
 
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
Session 0.0   media panel - matthias priem - gtuo - semantics 2017Session 0.0   media panel - matthias priem - gtuo - semantics 2017
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
 
Session 1.3 semantic asset management in the dutch rail engineering and con...
Session 1.3   semantic asset management in the dutch rail engineering and con...Session 1.3   semantic asset management in the dutch rail engineering and con...
Session 1.3 semantic asset management in the dutch rail engineering and con...
 
Session 1.3 energy, smart homes & smart grids: towards interoperability...
Session 1.3   energy, smart homes & smart grids: towards interoperability...Session 1.3   energy, smart homes & smart grids: towards interoperability...
Session 1.3 energy, smart homes & smart grids: towards interoperability...
 
Session 1.2 improving access to digital content by semantic enrichment
Session 1.2   improving access to digital content by semantic enrichmentSession 1.2   improving access to digital content by semantic enrichment
Session 1.2 improving access to digital content by semantic enrichment
 
Session 2.3 semantics for safeguarding & security – a police story
Session 2.3   semantics for safeguarding & security – a police storySession 2.3   semantics for safeguarding & security – a police story
Session 2.3 semantics for safeguarding & security – a police story
 
Session 2.5 semantic similarity based clustering of license excerpts for im...
Session 2.5   semantic similarity based clustering of license excerpts for im...Session 2.5   semantic similarity based clustering of license excerpts for im...
Session 2.5 semantic similarity based clustering of license excerpts for im...
 
Session 1.6 slovak public metadata governance and management based on linke...
Session 1.6   slovak public metadata governance and management based on linke...Session 1.6   slovak public metadata governance and management based on linke...
Session 1.6 slovak public metadata governance and management based on linke...
 
Session 5.6 towards a semantic outlier detection framework in wireless sens...
Session 5.6   towards a semantic outlier detection framework in wireless sens...Session 5.6   towards a semantic outlier detection framework in wireless sens...
Session 5.6 towards a semantic outlier detection framework in wireless sens...
 
Session 2.2 ontology-guided job market demand analysis: a cross-sectional s...
Session 2.2   ontology-guided job market demand analysis: a cross-sectional s...Session 2.2   ontology-guided job market demand analysis: a cross-sectional s...
Session 2.2 ontology-guided job market demand analysis: a cross-sectional s...
 

Dernier

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 

Dernier (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

Session 4.2 unleash the triple: leveraging a corporate discovery interface. the oecd case

  • 1. UNLEASH THE TRIPLE: LEVERAGING A CORPORATE DISCOVERY INTERFACE THE OECD CASE Mary-Ann Grosset, Digital Practice Team Manager Thierry Vebr, Coordinator Information Systems Jan-Anno Schuur, Coordinator Information Specialist 12 September 2017
  • 2. www.oecd.org 35 member countries Mission: promote policies that will improve the economic and social well-being of people around the world
  • 3. OECD Organisation for Economic Co-operation and Development
  • 4. OECD’s way of working
  • 5. Reading Assistant Less time to identify and collect information More time available to analyse Internal and external sources No language or word constraints Fully semantic environment Content and queries tagged using semantic robots using taxonomies/ontologies and linguistic patterns Tags stored in RDF to link content Makes suggestions to: • Narrow down search • Discover new areas
  • 6. Resources 11-Jul-17 Number of resources Semantic annotations Central taxonomies 10,234 n/a Domain-specific taxonomies 29,081 n/a Indicators 8,245 944,279 Models and ontologies 1,463 n/a Internal resources 295,637 4,614,464 Datasets 1,281 108,930 External resources 4,703,396 40,398,473 Surveys 44,405 87,140 Total 5,093,742 46,153,286
  • 7. • What: – Put structure into unstructured information – Extract knowledge from content – Enable repurposing of knowledge and content nuggets • How: – Fragmenting content (information) – Storing content as structured XML – using taxonomies, semantic tagging – Storing taxonomies, tags, links and triples – Using linked data capabilities • Deliverables: – PoC to search and discover content (October 2015) – Complex semantic analysis tools identifying indicators, trends,…. (from April 2015 to current) – Project specific reading and tagging assistants (from May 2016 to current) – Reading assistant (April 2017) How did we get to where we are
  • 8. • Move from Librarians/Records Managers to Taxonomy , semantic and linked data experts • Training/workshops in using applications and creating services • Practical hands-on experience to create and maintain • Stakeholder complex semantic use cases • Agile methodology Phase 1 : Capacity building September 2014 to October 2015 Creativity Innovation Curiosity Motivation And a lot of Dedication Team work Collaboration Enthusiasm
  • 9. • Corporate taxonomies – Topics – Geographical areas • Semantic robots to – Identify candidate terms to update the corporate taxonomies – Identify topics/geographical areas used in information at document level – Identify topics/geographical areas used at fragment level • Modelise content – Flexible , adaptable – Easy to expand – Data source Independent • Store all content as RDF • Develop PoC Phase 2: Building the foundations January to October 2015
  • 10. • Analyse sections of reports using a domain specific ontology and develop a service to allow manual tagging of sections, Display results in context • Analyse responses to questionnaires at fragment level using central topics and geographical area ontologies; identify trends and present results in context • Analyse legislation from 4 different countries (step 1) in French, English and Spanish to identify best topics and/or all topics, depending on the desired view • Analyse press releases from Reuters and AFP to identify acts of civil unrest in African countries; develop a service to allow manual validation of indicators following their identification through complex semantic tagging • Enrich articles from Elsevier, and RSS feeds to present targeted alerts to stakeholders • Develop a products taxonomy to semantically tag product Recalls Phase 3: Stakeholder semantic applications November 2015 to today
  • 11. • Build on experience: – Creating taxonomies/ontologies – Creating diverse and complex semantic robots and web-services – Creating complex SPARQL queries – Developing easy to use solutions to meet complex stakeholder semantic analysis needs • Develop a fully semantic and linked data environment • One-stop shop for policy analysts to conduct research • Semantically enrich content from internal and external sources Phase 4: Digital strategy project November 2016 to today (April 2017 for V1 release)
  • 12. The Team Knowledge Management Unit Taxonomy Semantic robots Corpora Validation Disambiguation External developersKey stakeholders Development and maintenance Model SPARQL queries Knowledge contextualisation Maintenance of environments (taxonomy, semantic), triple and XML stores Functionality Domain specific taxonomies Semantic requirements UX design Interface development
  • 14. 14 Breaking the barriers Create concepts and link them so that machines can use them as a cloud of knowledge (Semantic layer) Use in: Semantic analysis tools Search and discover tools Capture and formalise Subject-Matter-Experts’ knowledge Identify synonyms, related terms in official languages Use URIs not labels Of LANGUAGE and WORDS Taxonomies/Ontologies
  • 15. Breaking the barriers ingredientof hates http://kim.oecd.org/Topics#T4658 vaches cows http://kim.oecd.org/Topics#T127 Fondue au fromage Cheese fondue http://kim.oecd.org/Topics#T6657 lait milk gruyère http://kim.oecd.org/Topics#T909 gruyère http://kim.oecd.org/Topics#T222 fromage cheese http://kim.oecd.org/Topics#T7 520 chats cats matous http://kim.oecd.org/Topics#T75 20 souris mice
  • 17. • Use lexicon (taxonomies) • Use text patterns (part of speech) • Identify relationships (ontologies) • Test the results on a set of documents • Debug - Disambiguate • Test on the complete corpus • Put in production using Web Services How do we create the semantic robots? 36 different robots in production
  • 18. • January 2017 Semantic enrichment Search for protests in Africa Returns articles covering protests, demonstrations, manifestations, etc…
  • 19. Indicator 1 – Strike in the Public sector Sector: Public Event: Strike Must contain Identifying knowledge nuggets Event: Strike Sector: Public Indicator 1 – Strike in the Public sector
  • 20. How do we achieve this: • Extensive work on taxonomies • Selection of Corpora • Validation • Disambiguation (limit fuzziness, provide context, specify acronyms…) Recall, Precision and F-Measure Average rate 95% - Less not acceptable
  • 21. Golden Corpora and Validation Corpus (group) of documents used to validate and check the quality of the skill cartridges used to semantically enrich information The OECD Golden Corpora contain – OECD Publications – Official Documents – 17 themes – both official languages – items manually tagged against the central taxonomy
  • 22. Golden Corpora :Size and Quality • 1.223 publications • Error margin of 0,9. Theme Total XML En Fr % XML Agriculture 45 18 9 9 40 Development Cooperation 89 36 18 18 40 Economic Policy 176 163 84 79 93 Education 68 22 16 6 32 Employment 42 22 11 11 52 Energy 76 4 2 2 5 Enterprise Entrepreneurship 76 44 30 14 58 Environment 88 56 30 26 64 Finance Investment 167 111 58 53 66 Public Governance 94 66 38 28 70 Science 56 32 16 16 57 Social issues 70 52 27 25 74 Taxation 37 23 15 8 62 Territorial Development 37 28 15 13 76 Trade 61 19 12 7 31 Transport 41 8 3 5 20 Total 1223 704 384 320 58 Provide a realistic overview of the content and volume covered in the different domains. OECD index for cartridge reference created with the XML publications. The index provides the basis of OECD language that is used by the cartridge to rank extraction results.
  • 24. It’s raining cats and dogs Disambiguation is key & takes a lot of time Queen Elisabeth
  • 25. We use several semantic robots, based on several different taxonomies (generic, innovation-oriented, etc…) Multi-view annotation graphs We tag a same resource in different ways We can see a same resource with different « semantic » angles of view
  • 27. • 5-year programme • Looking for candidates from OECD member countries for 6 month internships, or short term assignments • To work with us on specific semantic projects • Can be associated to a PhD thesis • 6 to 18K€ funding envelope per project Outreach
  • 28. mary-ann.grosset@oecd.org Knowledge Management Unit Digital Knowledge and Information Services 2 rue André Pascal 75775 Paris cédex 16 France Tel: +331 45 24 81 53 Contact For additional information, internship possibilities to work on semantic projects :