Europeana datainaction nov2012
1. Europeana Semantic Data in Action
(a Pilot Service based on OWLIM)
http://europeana.ontotext.com
Mariana Damova (PhD)
(with contribution to the work by Antoine Isaac, Valentine Charles,
Zdravko Tashev, Svetoslav Petrov)
Europeana AGM
November 2012
3. Europeana Data Standards
• Unified metadata
• ESE – Europeana Semantic Elements
• Dublin Core & Europeana fields
• 36 fields: flat, with limited ability to express semantic links
– Dublin Core: dc:title, dc:creator, dc:subject, dc:description, dc:publisher, …
– Europeana: europeana:provider, europeana:dataProvider, europeana:rights, europeana:type, europeana:isShownBy and/or europeana:isShownAt, …
• EDM - Europeana Data Model
Basic data model Two contextual classes
4. Europeana Data in EDM
• 268GB of data in RDF
• Data for 20M+ cultural objects, with links to other
datasets, mainly DBpedia
• EDM model
• SKOS
5. Semantic Technologies – Main Features
• Semantic technologies (RDF, LOD) allow for an unprecedented ease of
integration of heterogeneous data sources
– Already adopted in the pharmaceutical and publishing industries
– BBC: when MySQL was replaced with OWLIM in their “Dynamic Semantic
Publishing” architecture, the BBC team observed a considerable reduction in
the complexity of database design, query specification and application
development, and in query evaluation time.
Source: “BBC World Cup 2010 dynamic semantic publishing”, Jem Rayfield,
Senior Technical Architect, BBC News and Knowledge.
http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup_2010_dynamic_sem.html
6. Linking Open Data
• Linking Open Data (LOD) W3C SWEO Community project
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
• Initiative for publishing “linked data” – a set of principles,
which allows browsing of RDF data, spread across different
servers, in the way HTML is browsed
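As an illustration of browsing RDF "in the way HTML is browsed", the sketch below prepares an HTTP request that asks a Linked Data server for an RDF representation of a resource through content negotiation. The DBpedia URI and the helper name `linked_data_request` are assumptions chosen for illustration; they are not taken from the slides.

```python
from urllib.request import Request


def linked_data_request(resource_uri, media_type="application/rdf+xml"):
    """Build a GET request that asks for RDF instead of HTML.

    The helper name and the default media type are illustrative choices.
    """
    return Request(resource_uri, headers={"Accept": media_type})


# Hypothetical example resource; any dereferenceable Linked Data URI would do.
req = linked_data_request("http://dbpedia.org/resource/Amedeo_Modigliani")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; the RDF that comes back contains further URIs, which a client can follow in the same way to move across servers.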
7. Semantic Technologies and Cultural Heritage
• Combining facts and knowledge from different datasets
• Need for convincing real-life use cases demonstrating the benefits of these
technologies
• The cultural heritage domain can become a useful use case for the
application of semantic technologies.
• MacManus, the founder and editor-in-chief of ReadWriteWeb,
defined an exemplary test for the Semantic Web:
find the cities around the world which have Modigliani art works
8. FactForge of Ontotext solves the Modigliani query
by combining knowledge from 6 datasets from the Linked Open Data Cloud
http://factforge.net
9. OWLIM - a scalable, robust and efficient triple store
– Serving the two most important web-sites for the London Olympic Games
• Official Olympics website
• BBC Olympics website
– Performance highlights
• OWLIM loads the 100M and the 200M datasets almost twice as fast as the next best product (17 min. for 100M)
• Best query performance among those repositories that can handle update and multi-client query tasks (5,285 Query-mixes-per-hour, where a query mix contains 25 queries; e.g. about 100 queries/sec)
• OWLIM v5 is 43% faster than v.4.3 on the BSBM Explore and Update scenario
• OWLIM v5 requires between 25% and 70% less storage space
• OWL 2 RL-type languages have proven to be the only feasible approach for
reasoning over billions of statements
10. Reason-able View with Europeana data in EDM
• 268GB of data
• cultural objects data and linkages to other datasets
Loaded into OWLIM with inference under the optimized OWL-Horst rule set
Dataset size:
NumberOfStatements=3,899,531,218
NumberOfExplicitStatements= 993,332,911
NumberOfEntities=264,523,842
EDM model
SKOS
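The dataset figures above can be put in proportion with a little arithmetic. The sketch below assumes that NumberOfStatements counts explicit plus inferred statements, so that the difference from NumberOfExplicitStatements is the number of statements added by inference; that reading is an assumption, not something the slide states.

```python
# Figures copied from the slide.
total_statements = 3_899_531_218
explicit_statements = 993_332_911

# Assumption: total = explicit + inferred.
inferred_statements = total_statements - explicit_statements
expansion_ratio = total_statements / explicit_statements

print(f"inferred statements: {inferred_statements:,}")  # 2,906,198,307
print(f"expansion ratio: {expansion_ratio:.2f}")        # 3.93
```

Under that assumption, inference roughly quadruples the number of statements available to queries.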
13. Semantic Queries over Structured Data
• Available objects with their aggregators
• Data providers that have contributed content to Europeana
• Datasets from Italy
• Objects from the 18th century provided to Europeana
• The original URL, the copyright and the creative commons right of objects provided by The
European Library
• Copyrights and Creative Commons rights of Europeana objects per provider
• Enrichment statements produced by Europeana for objects provided by institutions from
the United Kingdom
• List of Europeana enriched objects from Sweden, their equivalents and related entities
• Time enrichment statements produced by Europeana for provided objects
• The complete ordered list of Europeana aggregators and the specific data providers they
gather
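One of the queries listed above, aggregators together with the data providers they gather, could be phrased along the following lines. This is a minimal sketch, not the query used in the pilot: it assumes the data follows the published EDM pattern of `ore:Aggregation` resources carrying `edm:provider` and `edm:dataProvider`, and the endpoint address is a hypothetical placeholder. The request URL follows the standard SPARQL protocol convention of passing the query in a `query` parameter.

```python
from urllib.parse import urlencode

# Namespaces of the EDM and OAI-ORE vocabularies.
EDM = "http://www.europeana.eu/schemas/edm/"
ORE = "http://www.openarchives.org/ore/terms/"


def providers_query(limit=10):
    """Return a SPARQL query listing aggregators and their data providers."""
    return f"""PREFIX edm: <{EDM}>
PREFIX ore: <{ORE}>
SELECT DISTINCT ?provider ?dataProvider
WHERE {{
  ?aggregation a ore:Aggregation ;
               edm:provider ?provider ;
               edm:dataProvider ?dataProvider .
}}
ORDER BY ?provider ?dataProvider
LIMIT {int(limit)}"""


def request_url(endpoint, query):
    """Build a SPARQL-protocol GET URL for the given endpoint and query."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint address, used only to show the URL shape.
print(request_url("http://example.org/sparql", providers_query(5)))
```

Keeping query construction separate from the HTTP call makes it easy to inspect or log the query text before sending it to a real endpoint.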
15. Other cultural heritage sources available for interlinking
Gothenburg City Museum objects
• Oil paintings from the GIM collection
• Paintings valued at less than 5000 Swedish kronor
• Paintings with a Gothenburg motif
• Portraits and their painters
• Museum Objects from Swedish Museums
• Museum objects more than 30 centimeters high
• Paintings given as a present to the Gothenburg City Museum
http://museum.ontotext.com
17. Outlook …
Europeana Creative - PSP project
led by the Austrian National Library
26 partners
Objective: experimenting with re-use of cultural content for creativity
Project: Europeana re-use framework and 6 pilots in different domains such as education, tourism, etc.
Ontotext: participates in the infrastructure for re-use with the semantic repository OWLIM, and data integration
Sofia, 13 March 2012
18. Ontotext
– Top-5 provider of core Semantic Technology
– Established in year 2000; offices in Bulgaria, UK, USA
– Active both in research and commercial projects (FP7 funding for 10 years)
• 360° semantic technology – unique portfolio:
– Semantic Databases: high-performance RDF DBMS, scalable reasoning
– Semantic Search: text-mining (IE), metadata generation, Information Retrieval (IR)
– Web Mining: focused crawling, screen scraping, data fusion
– Linked Data Management and Data Integration
Good recognition in the SemTech community
– Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” at
GYM, #3 for “linked data management” at Google
Several joint ventures and subsidiaries
– Innovantage: leading online recruitment intelligence provider in UK
19. Ontotext Clients (selected)
British Broadcasting Corporation (BBC)
– Ran its World Cup 2010 sites on top of OWLIM
– Since Mar’12, the BBC Sports and 2012 Olympics sections are driven
by OWLIM and a Concept Extraction service developed by Ontotext
Press Association (UK)
– Analysis of Sports news
– Concept extraction
– Linked data generation
Top-3 USA media (not allowed to name)
The National Archives (UK): contracted Ontotext to implement a
semantic KB and semantic search for the Government Web Archive
British Museum (UK): Ontotext leads the development of Phase 3 of the
ResearchSpace project on collaborative research in cultural heritage;
the British Museum’s public SPARQL end-point is powered by OWLIM
20. Ontotext in the Cultural Heritage Domain
Selected commercial projects
ResearchSpace project funded by the Andrew W. Mellon Foundation
Support for collaborative web-based research, information sharing and web publishing for
the cultural heritage scholarly community. An Ontotext-led international consortium.
The Polish Digital National Museum aggregates artifacts from over 70 contributing
cultural institutions in the Digital Libraries Federation PIONIER Network using
Ontotext's OWLIM repository
LODAC (Linked Open Data in Academia), Japan's National Institute of Informatics
aggregates various information across multiple Japanese resources as LOD. The system
uses 8 OWLIM nodes and aggregates 19 collections with 700 000 entities and 15M triples.
SemTech for Cultural Heritage project funded by ITCC
Semantic publishing of Bulgarian cultural heritage to Europeana
Establishing a Bulgarian technical aggregator for Europeana
Selected research projects
MOLTO FP7 project: a use case in cultural heritage for a semantic knowledge
representation infrastructure for querying RDF and presenting query results; includes close
to 9K museum objects from two collections of The Gothenburg City
Charisma (Cultural Heritage Advanced Research Infrastructures): an EU-funded
integrating activity project with a consortium of 21 partners and metadata from 6 major European
cultural institutions; it has selected Ontotext's OWLIM repository
21. Thank you for your attention!
mariana.damova@ontotext.com