Semantic Technologies for Big Data

         Marin Dimitrov (Ontotext)



            XML Amsterdam 2012




Semantic Technologies for Big Data, Sep 2012
About Ontotext

• Provides products and services for creating,
  managing and exploiting semantic data
   – Founded in 2000
   – Offices in Bulgaria, USA and UK
• Major clients and industries
   – Media & Publishing (BBC, Press Association)
   – HCLS (AstraZeneca, UCB)
   – Cultural Heritage (The British Museum, The National
     Archives, Polish National Museum, Dutch Public Library)
   – Defense and Homeland Security


Outline

• Semantic Technologies for the Enterprise
• Semantic Technologies for Big Data
• Success stories




SEMANTIC TECHNOLOGIES FOR THE ENTERPRISE



The need for a smarter Web

• “The Semantic Web is an extension of the current web in
  which information is given well-defined meaning, better
  enabling computers and people to work in cooperation.” (Tim
  Berners-Lee, 2001)
• “PricewaterhouseCoopers believes a Web of data will develop
  that fully augments the document Web of today. You’ll be
  able to find pieces of data sets from different places,
  aggregate them without warehousing, and analyze them in a
  more straightforward, powerful way than you can now.”
  (PWC, May 2009)




Linked Data

• Linked Data is a set of principles that allows
  publishing, querying and consumption of RDF data,
  distributed across different servers
• Design principles
   – Use unambiguous identifiers for resources (URIs)
   – Use HTTP URIs (dereferenceable)
   – Provide useful information for URI lookups
   – Interlink resources
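
The four principles can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `example.org` URIs, the `worksFor` predicate, and the two dictionaries standing in for separate servers. A real deployment would resolve the URIs over HTTP.

```python
# Toy illustration of the four Linked Data principles.
# All example.org URIs are hypothetical; no network access is performed.

# Principles 1-2: every resource is named by an HTTP URI.
ALICE = "http://example.org/people/alice"
ACME = "http://example.org/orgs/acme"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
WORKS_FOR = "http://example.org/vocab/worksFor"  # hypothetical predicate

# Each "server" is modelled as a dict from URI to the triples describing it.
SERVER_A = {ALICE: [(ALICE, FOAF_NAME, "Alice"), (ALICE, WORKS_FOR, ACME)]}
SERVER_B = {ACME: [(ACME, FOAF_NAME, "Acme Ltd")]}


def dereference(uri, servers):
    """Principle 3: looking up a URI returns useful data about it."""
    for server in servers:
        if uri in server:
            return server[uri]
    return []


def follow_links(uri, servers):
    """Principle 4: follow links to resources described on other servers."""
    found = list(dereference(uri, servers))
    for _, _, obj in list(found):
        # Objects that look like HTTP URIs are treated as links to follow.
        if isinstance(obj, str) and obj.startswith("http://"):
            found.extend(dereference(obj, servers))
    return found


triples = follow_links(ALICE, [SERVER_A, SERVER_B])
for t in triples:
    print(t)
```

Starting from one URI, the lookup collects statements from both servers, which is the distributed consumption the slide describes.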




The Semantic Web timeline
[Figure: timeline from 1999 to 2012 showing RDF, DAML+OIL, OWL, SPARQL, RIF, RDFa, SAWSDL, LOD, SKOS, HCLS, SSN, RDB2RDF, PIL, GLD, LDP, OWL 2, SPARQL 1.1 and RDF 2]


Enterprise Information Management Challenges

• Many disparate data sources and data silos
• Many point-to-point interfaces
• Data sources with similar/inconsistent information
• Complex data integration processes inadequate for
  changing business requirements
• Most of the knowledge is hidden in texts
• Difficult to integrate & analyse structured data and
  text


Semantic Web and Linked Data Opportunities for the Enterprise

• Simplify the information integration processes
   – Flexible, easy to evolve data model
   – Bottom-up / incremental integration
   – Efficiently integrate structured and unstructured data
• Provide an enterprise metadata layer
   – Unified metadata vocabulary for the enterprise
   – Align the legacy data silos
   – Improve the information sharing and reuse




Semantic Web and Linked Data Opportunities for the Enterprise (2)

• Discovery and enrichment of information
   – Interlink people, organisations, events, etc.
   – Enrich enterprise content with structured annotations
   – Discover implicit links and relationships
• Unified access to information within the enterprise
   – Simplified infrastructure based on open web standards
• Information interchange across a value chain
   – Easy publishing and consumption of Linked Data
• Augments existing IT assets and technologies
   – No need for disruptive replacement

XML and RDF: friends or foes?

• Complement each other
   – XML best for content, structure and interchange format
   – RDF for metadata layer and semantics
• Typical use case
   – Many XML content data sources
      • Content stored in an XML store (XQuery and XSLT)
   – Structured data sources & external Linked Data
      • RDF-ized and stored in an RDF store (SPARQL)
   – Metadata extracted from content
      • stored in an RDF store (SPARQL)
      • semantic search and metadata driven content delivery
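
A minimal sketch of the "metadata extracted from content" step, using only the Python standard library. The XML layout, the `mentions` element, and the `example.org` URIs are hypothetical; a real pipeline would load the triples into an RDF store and query them with SPARQL, while the full content stays in the XML store.

```python
import xml.etree.ElementTree as ET

# Hypothetical content document; the element names are assumptions.
ARTICLE_XML = """<article id="a1">
  <title>Final results</title>
  <mentions entity="http://example.org/athlete/1"/>
  <mentions entity="http://example.org/event/100m"/>
  <body>Full story text stays in the XML store.</body>
</article>"""

DCT_TITLE = "http://purl.org/dc/terms/title"
MENTIONS = "http://example.org/vocab/mentions"  # hypothetical predicate


def extract_metadata(xml_text, base="http://example.org/article/"):
    """Turn selected parts of an XML document into (s, p, o) triples."""
    root = ET.fromstring(xml_text)
    subject = base + root.get("id")
    triples = [(subject, DCT_TITLE, root.findtext("title"))]
    for mention in root.findall("mentions"):
        triples.append((subject, MENTIONS, mention.get("entity")))
    return triples


metadata = extract_metadata(ARTICLE_XML)
for t in metadata:
    print(t)
```

Only the metadata becomes RDF; the body text is left untouched, which matches the split between content store and metadata layer described above.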


BBC Sports




                                     (c) BBC

Added value of RDF

• Explicit semantics
   – Intended meaning of entities and relations
• Global identifiers (URIs)
• Simple and flexible graph-based data model
• Easier data mapping & integration
   – Bottom-up / incremental data integration with owl:sameAs
• Inference of implicit information
• Working with distributed information
   – Linked Data, federated SPARQL
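
To make the owl:sameAs point concrete, here is a small sketch of bottom-up integration: two silos keep their own identifiers, and a single owl:sameAs link is enough to merge what is known about one customer. The silo URIs and the `ex:` predicates are hypothetical, and the union-find merge is only one simple way to process such links, not how any particular RDF store does it.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def canonicalise(triples):
    """Merge resources linked by owl:sameAs; rewrite triples to one URI per group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # First pass: union every pair connected by owl:sameAs.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Keep the lexicographically smaller URI as representative.
                keep, drop = sorted((root_s, root_o))
                parent[drop] = keep

    # Second pass: rewrite the remaining triples to use the representatives.
    return {(find(s), p, find(o)) for s, p, o in triples if p != OWL_SAME_AS}


# Hypothetical data from two silos plus one link between them.
crm = [("http://crm.example/c/42", "ex:name", "Jane Doe")]
billing = [("http://billing.example/acct/7", "ex:balance", "120")]
links = [("http://crm.example/c/42", OWL_SAME_AS, "http://billing.example/acct/7")]

merged = canonicalise(crm + billing + links)
for t in sorted(merged):
    print(t)
```

Neither silo had to change its schema or identifiers; the link was added afterwards, which is what makes the integration incremental.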

Added value of RDF (2)

• Descriptive / agile schema
   – Open World Assumption, don’t restrict predicates
   – Generated dynamically from data
• Queries based on meaning
   – Not depending on structure / order of statements
• Data and queries may use different vocabularies
• Exploratory queries
• Choice of OWL2 profiles
   – Trade off features against performance
   – New profiles may emerge in the future
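
The idea of exploratory queries that do not depend on statement order can be sketched with a tiny triple-pattern matcher. The `ex:` names are hypothetical, and `match` is an illustrative helper rather than a SPARQL engine; `None` plays the role of a query variable.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Hypothetical statements; no schema has to be declared up front.
data = [
    ("ex:bolt", "ex:type", "ex:Athlete"),
    ("ex:bolt", "ex:competesIn", "ex:100m"),
    ("ex:100m", "ex:type", "ex:Event"),
]

# Exploratory query: everything known about ex:bolt, whatever the predicates.
about_bolt = match(data, s="ex:bolt")

# The answer does not depend on the order in which statements were stored.
same = match(list(reversed(data)), s="ex:bolt")

print(about_bolt)
print(set(about_bolt) == set(same))
```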
SEMANTIC TECHNOLOGIES FOR BIG DATA



The three V’s of Big Data

• Velocity
  – Streaming, sensor, real-time data
  – Solution: distributed processing & storage
  – Semantic challenge: stream reasoning
• Volume
  – Petabytes of data
  – Solution: distributed processing & storage
  – Semantic challenge: distributed reasoning & querying
• Variety
  – Structured, semi-structured and unstructured data
  – Semantic Technologies (RDF) are a good fit
Types of Big Data (NIST)

• Type 1
  – Velocity (-), Volume (-), Variety (+)
  – Perfect fit for Semantic Technologies
• Type 2
  – Velocity and/or Volume, Variety (-)
  – Only horizontal scalability required, traditional approaches
    are a good enough fit
• Type 3
  – All V’s
  – Semantic Technologies not a good fit yet, but moving in
    that direction
Semantic Technologies for Volume and Velocity

• Promising ongoing research
• Distributed inference with Hadoop/Storm
• Stream reasoning
   – Continuous queries
   – Continuous (dynamic) semantics
• SPARQL to Pig translation
• Distributed RDF stores on top of NoSQL
• C-SPARQL, EP-SPARQL, CQELS
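
A continuous query differs from a one-off query in that it is re-evaluated as data arrives, usually over a sliding window. The sketch below shows only that windowing idea in plain Python; it is not C-SPARQL, EP-SPARQL or CQELS syntax, and the class name, sensor name and threshold are assumptions made for illustration.

```python
from collections import deque


class WindowedCounter:
    """Continuous query: count readings above a threshold in the last `width` seconds."""

    def __init__(self, width, threshold):
        self.width = width
        self.threshold = threshold
        self.window = deque()  # (timestamp, sensor, value) in arrival order

    def push(self, timestamp, sensor, value):
        """Add one statement to the stream and return the refreshed answer."""
        self.window.append((timestamp, sensor, value))
        # Evict statements that have fallen out of the time window.
        while self.window and self.window[0][0] <= timestamp - self.width:
            self.window.popleft()
        # Re-evaluate the query over the current window contents.
        return sum(1 for _, _, v in self.window if v > self.threshold)


query = WindowedCounter(width=10, threshold=30.0)
stream = [(0, 35.0), (4, 20.0), (8, 40.0), (12, 31.0)]
results = [query.push(t, "ex:sensor1", v) for t, v in stream]
print(results)  # [1, 1, 2, 2]
```

The reading at time 0 drops out of the window once the reading at time 12 arrives, so the final answer counts only the values 40.0 and 31.0.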


Linked Open Data Cloud (Sep 2011)




                                             (c) Cyganiak & Jentzsch
From Big Linked Data to Linked Big Data

• Big Linked Data
   – Big Data approach adopted by the Linked Data community
      • In particular handling Volume and Velocity
   – Exponential growth of Linked Data in the last 5 years
• Linked Big Data
   – Linked Data approach adopted by the Big Data community
   – RDF data model for Variety
   – Enrich Big Data with metadata and semantics – more
     powerful analytics on top of it
   – Interlink Big Data sets
   – Simplify data access and data integration

SUCCESS STORIES




Typical Use Cases for Linked Data and Semantic Technologies

• Publish / consume Linked Data across enterprises
   – Linked Data is not necessarily free data
   – Facilitate data interchange within the value chain
• Information integration within the enterprise
   – Integrated asset management / align data silos
   – Master Data Management
• Knowledge discovery and semantic search
   – Integrate structured and unstructured data
   – Enrich and interlink information
   – Semantic search and exploration of information

Semantic Information Integration (Ontotext)




The National Archives (Ontotext)

• Challenge
  – Large archive of various UK Government websites since
    1997
  – Lots of duplicated information & documents
  – Inefficient search & navigation
• Semantic Knowledge Base project goals
  – Integrate multiple data sources
  – Extract information & metadata from archived documents
  – Interlink the web archive with data.gov.uk and LOD data
  – Advanced search & navigation of the archive


The National Archives (Ontotext)


[Figure: Semantic Knowledge Base architecture. Labels shown: front ends (semantic search, SPARQL, graph exploration); O1, O2, O3 and third-party ontology editors; A, B, C, D with data transformation and integration; semantic repository containing SKB ontologies, factual knowledge (TNA data, LOD, data.gov.uk) and semantic annotations; identity resolution; semantic annotation with an annotation process (GATE Teamware); semantic index]




The National Archives (Ontotext)

• The numbers
  – 2.5 billion input files
  – 40TB compressed archive data
  – 10 billion RDF triples stored in OWLIM
  – 33,000 EC2 hours used on AWS
  – Dynamic EC2 cluster (180 instances average, 500 max)
• Major challenges
  – Complex pre-processing of documents
  – De-duplication of information & documents
  – EC2/RRS performance & reliability


Dutch Public Library (Ontotext + Dayon)

• Challenge
  – Many disparate data sources, inefficient search
• Goals
  – Data integration
  – Automated metadata generation
  – Open search platform
• Numbers
  – 500 heterogeneous data sources
  – 40 million cultural heritage artifacts to be described
  – 6-8 billion triples to be stored into the knowledge base

Linked Life Data (Ontotext)

• Challenge
  – Disparate, heterogeneous and unaligned data silos lock
    valuable biomedical information
• Goals
  – Semantic warehouse integrating and interlinking public
    biomedical data sources
  – Interactive discovery and exploration
• Numbers
  – 25+ heterogeneous biomedical data sources integrated
  – 1 billion entities described
  – 5.5 billion RDF triples
Linked Life Data (Ontotext)




Linked Life Data-as-a-Service (Ontotext)

• More data sources
• Large scale text mining over the LOD cloud
• Adapted for specific use cases
• UCB use case
   – 2 billion entities described
   – 11 billion RDF triples




Dynamic Semantic Publishing (Ontotext)

• Challenge
  – Difficult & slow to aggregate content from various sources
• Goals
  – Metadata generation for news (semantic annotation)
  – Interlink & categorize content
  – Metadata driven web pages
• Numbers
  – Near-real-time processing & annotation required
  – Tens of millions of SPARQL queries to the knowledge base
    per day

Trillion RDF triples (Franz Inc.)

• Use case
  – Use RDF for the customer management database of a
    telecom
• Challenge
  – 4,000 triples per customer, more than a trillion for the
    whole customer base
• Numbers
  – 1 trillion triples stored in AllegroGraph by Franz Inc.
     • Hardware requirements undisclosed
     • The 310 billion triple result used an 8-CPU system with 2TB RAM
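
As a back-of-the-envelope check on the figures above, the two numbers on the slide imply a customer base of roughly a quarter of a billion. The customer count is derived here, not stated in the source, and since the slide says "more than a trillion" it should be read as a lower bound.

```python
TRIPLES_PER_CUSTOMER = 4_000        # from the slide
TOTAL_TRIPLES = 1_000_000_000_000   # one trillion, from the slide

# Implied number of customers (derived, not stated in the source).
customers = TOTAL_TRIPLES // TRIPLES_PER_CUSTOMER
print(f"{customers:,} customers")   # 250,000,000 customers
```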



uRiKA (Cray/YarcData)

• Big Data appliance for graph analytics
   – Based on the Threadstorm™ architecture
   – Up to 8K processors, 512TB RAM, 350TB/hr IO throughput
• In-memory RDF database
• SPARQL 1.0 engine




                                                                   (c) YarcData
TAKEAWAYS




Semantic Technologies for Big Data

• Rich ecosystem of Semantic Technologies since 1999
• Strong Enterprise focus in the last 5 years
• Semantic Technologies provide opportunity for
  reducing the cost and complexity of data integration
• Common metadata layer for the enterprise
• More powerful ways to find and explore information
• RDF complements XML within the enterprise
• Semantic Technologies are a good fit for Big Data’s
  Variety

Semantic Technologies for Big Data

• Velocity and Volume still challenging for Semantic
  Technologies, but lots of progress in that direction
• Linked Data will grow into Big Linked Data, but Big
  Data will also benefit from evolving into Linked Big
  Data
• Interesting success stories for Semantic Technologies
  in Big Data scenarios




THANK YOU!





Contenu connexe

Tendances

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Big Data Ppt PowerPoint Presentation Slides
Big Data Ppt PowerPoint Presentation Slides Big Data Ppt PowerPoint Presentation Slides
Big Data Ppt PowerPoint Presentation Slides SlideTeam
 
Domain Driven Data: Apache Kafka® and the Data Mesh
Domain Driven Data: Apache Kafka® and the Data MeshDomain Driven Data: Apache Kafka® and the Data Mesh
Domain Driven Data: Apache Kafka® and the Data Meshconfluent
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Edureka!
 
Developing Data Products
Developing Data ProductsDeveloping Data Products
Developing Data ProductsPeter Skomoroch
 
Business Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemBusiness Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemKiran kumar
 
The Importance of Data Visualization
The Importance of Data VisualizationThe Importance of Data Visualization
The Importance of Data VisualizationCenterline Digital
 
Big Data: The 6 Key Skills Every Business Needs
Big Data: The 6 Key Skills Every Business NeedsBig Data: The 6 Key Skills Every Business Needs
Big Data: The 6 Key Skills Every Business NeedsBernard Marr
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
The 8 Best Examples Of Real-Time Data Analytics
The 8 Best Examples Of Real-Time Data AnalyticsThe 8 Best Examples Of Real-Time Data Analytics
The 8 Best Examples Of Real-Time Data AnalyticsBernard Marr
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionGuido Schmutz
 
Big data Presentation
Big data PresentationBig data Presentation
Big data PresentationAswadmehar
 
Webinar: How Banks Manage Reference Data with MongoDB
 Webinar: How Banks Manage Reference Data with MongoDB Webinar: How Banks Manage Reference Data with MongoDB
Webinar: How Banks Manage Reference Data with MongoDBMongoDB
 

Tendances (20)

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Big Data Ppt PowerPoint Presentation Slides
Big Data Ppt PowerPoint Presentation Slides Big Data Ppt PowerPoint Presentation Slides
Big Data Ppt PowerPoint Presentation Slides
 
Data mining
Data miningData mining
Data mining
 
Big data
Big dataBig data
Big data
 
Domain Driven Data: Apache Kafka® and the Data Mesh
Domain Driven Data: Apache Kafka® and the Data MeshDomain Driven Data: Apache Kafka® and the Data Mesh
Domain Driven Data: Apache Kafka® and the Data Mesh
 
Big data, Big decision
Big data, Big decisionBig data, Big decision
Big data, Big decision
 
Big Data
Big DataBig Data
Big Data
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
 
Developing Data Products
Developing Data ProductsDeveloping Data Products
Developing Data Products
 
Business Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemBusiness Intelligence Data Warehouse System
Business Intelligence Data Warehouse System
 
The Importance of Data Visualization
The Importance of Data VisualizationThe Importance of Data Visualization
The Importance of Data Visualization
 
Big Data: The 6 Key Skills Every Business Needs
Big Data: The 6 Key Skills Every Business NeedsBig Data: The 6 Key Skills Every Business Needs
Big Data: The 6 Key Skills Every Business Needs
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
The 8 Best Examples Of Real-Time Data Analytics
The 8 Best Examples Of Real-Time Data AnalyticsThe 8 Best Examples Of Real-Time Data Analytics
The 8 Best Examples Of Real-Time Data Analytics
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in Action
 
Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
 
Introduction to ETL and Data Integration
Introduction to ETL and Data IntegrationIntroduction to ETL and Data Integration
Introduction to ETL and Data Integration
 
Webinar: How Banks Manage Reference Data with MongoDB
 Webinar: How Banks Manage Reference Data with MongoDB Webinar: How Banks Manage Reference Data with MongoDB
Webinar: How Banks Manage Reference Data with MongoDB
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 

En vedette

From Big Data to Smart Data
From Big Data to Smart DataFrom Big Data to Smart Data
From Big Data to Smart DataMarin Dimitrov
 
Using the Semantic Web Stack to Make Big Data Smarter
Using the Semantic Web Stack to Make  Big Data SmarterUsing the Semantic Web Stack to Make  Big Data Smarter
Using the Semantic Web Stack to Make Big Data SmarterMatheus Mota
 
How Semantics Solves Big Data Challenges
How Semantics Solves Big Data ChallengesHow Semantics Solves Big Data Challenges
How Semantics Solves Big Data ChallengesDATAVERSITY
 
Big Data and the Semantic Web: Challenges and Opportunities
Big Data and the Semantic Web: Challenges and OpportunitiesBig Data and the Semantic Web: Challenges and Opportunities
Big Data and the Semantic Web: Challenges and OpportunitiesSrinath Srinivasa
 
Inference using owl 2.0 semantics
Inference using owl 2.0 semanticsInference using owl 2.0 semantics
Inference using owl 2.0 semanticsCraig Trim
 
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big DataPigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big DataAlexander Schätzle
 
PigSPARQL - Mapping SPARQL to Pig Latin
PigSPARQL - Mapping SPARQL to Pig LatinPigSPARQL - Mapping SPARQL to Pig Latin
PigSPARQL - Mapping SPARQL to Pig LatinAlexander Schätzle
 
Sempala - Interactive SPARQL Query Processing on Hadoop
Sempala - Interactive SPARQL Query Processing on HadoopSempala - Interactive SPARQL Query Processing on Hadoop
Sempala - Interactive SPARQL Query Processing on HadoopAlexander Schätzle
 
Knoesis-Semantic filtering-Tutorials
Knoesis-Semantic filtering-TutorialsKnoesis-Semantic filtering-Tutorials
Knoesis-Semantic filtering-TutorialsPavan Kapanipathi
 
OIDC16: Open Data in Belgium
OIDC16: Open Data in BelgiumOIDC16: Open Data in Belgium
OIDC16: Open Data in BelgiumBart Hanssens
 
Twarql Architecture - Streaming Annotated Tweets
Twarql Architecture - Streaming Annotated TweetsTwarql Architecture - Streaming Annotated Tweets
Twarql Architecture - Streaming Annotated TweetsPablo Mendes
 
시스템 엔지니어가 바라보는 시맨틱웹과 빅데이터 기술
시스템 엔지니어가 바라보는 시맨틱웹과 빅데이터 기술시스템 엔지니어가 바라보는 시맨틱웹과 빅데이터 기술
시스템 엔지니어가 바라보는 시맨틱웹과 빅데이터 기술Haklae Kim
 
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016DataStax
 
Linking Open, Big Data Using Semantic Web Technologies - An Introduction
Linking Open, Big Data Using Semantic Web Technologies - An IntroductionLinking Open, Big Data Using Semantic Web Technologies - An Introduction
Linking Open, Big Data Using Semantic Web Technologies - An IntroductionRonald Ashri
 
Freebase and the semantic web
Freebase and the semantic webFreebase and the semantic web
Freebase and the semantic webspencermountain
 
OWLIM@AWS - On-demand RDF Data Management in the Cloud
OWLIM@AWS - On-demand RDF Data Management in the CloudOWLIM@AWS - On-demand RDF Data Management in the Cloud
OWLIM@AWS - On-demand RDF Data Management in the CloudMarin Dimitrov
 
Ontotext in EC Funded Projects 2002-2012
Ontotext in EC Funded Projects 2002-2012Ontotext in EC Funded Projects 2002-2012
Ontotext in EC Funded Projects 2002-2012Marin Dimitrov
 
Enabling Low-cost Open Data Publishing and Reuse
Enabling Low-cost Open Data Publishing and ReuseEnabling Low-cost Open Data Publishing and Reuse
Enabling Low-cost Open Data Publishing and ReuseMarin Dimitrov
 
DataGraft Platform: RDF Database-as-a-Service
DataGraft Platform: RDF Database-as-a-ServiceDataGraft Platform: RDF Database-as-a-Service
DataGraft Platform: RDF Database-as-a-ServiceMarin Dimitrov
 
S4: The Self-Service Semantic Suite
S4: The Self-Service Semantic SuiteS4: The Self-Service Semantic Suite
S4: The Self-Service Semantic SuiteMarin Dimitrov
 

En vedette (20)

From Big Data to Smart Data
From Big Data to Smart DataFrom Big Data to Smart Data
From Big Data to Smart Data
 
Using the Semantic Web Stack to Make Big Data Smarter
Using the Semantic Web Stack to Make  Big Data SmarterUsing the Semantic Web Stack to Make  Big Data Smarter
Using the Semantic Web Stack to Make Big Data Smarter
 
How Semantics Solves Big Data Challenges
How Semantics Solves Big Data ChallengesHow Semantics Solves Big Data Challenges
How Semantics Solves Big Data Challenges
 
Big Data and the Semantic Web: Challenges and Opportunities
Big Data and the Semantic Web: Challenges and OpportunitiesBig Data and the Semantic Web: Challenges and Opportunities
Big Data and the Semantic Web: Challenges and Opportunities
 
Inference using owl 2.0 semantics
Inference using owl 2.0 semanticsInference using owl 2.0 semantics
Inference using owl 2.0 semantics
 
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big DataPigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
 
PigSPARQL - Mapping SPARQL to Pig Latin
PigSPARQL - Mapping SPARQL to Pig LatinPigSPARQL - Mapping SPARQL to Pig Latin
PigSPARQL - Mapping SPARQL to Pig Latin
 
Sempala - Interactive SPARQL Query Processing on Hadoop
Sempala - Interactive SPARQL Query Processing on HadoopSempala - Interactive SPARQL Query Processing on Hadoop
Sempala - Interactive SPARQL Query Processing on Hadoop
 
Knoesis-Semantic filtering-Tutorials
Knoesis-Semantic filtering-TutorialsKnoesis-Semantic filtering-Tutorials
Knoesis-Semantic filtering-Tutorials
 
OIDC16: Open Data in Belgium
OIDC16: Open Data in BelgiumOIDC16: Open Data in Belgium
OIDC16: Open Data in Belgium
 
Twarql Architecture - Streaming Annotated Tweets
Twarql Architecture - Streaming Annotated TweetsTwarql Architecture - Streaming Annotated Tweets
Twarql Architecture - Streaming Annotated Tweets
 
시스템 엔지니어가 바라보는 시맨틱웹과 빅데이터 기술
시스템 엔지니어가 바라보는 시맨틱웹과 빅데이터 기술시스템 엔지니어가 바라보는 시맨틱웹과 빅데이터 기술
시스템 엔지니어가 바라보는 시맨틱웹과 빅데이터 기술
 
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
 
Linking Open, Big Data Using Semantic Web Technologies - An Introduction
Linking Open, Big Data Using Semantic Web Technologies - An IntroductionLinking Open, Big Data Using Semantic Web Technologies - An Introduction
Linking Open, Big Data Using Semantic Web Technologies - An Introduction
 
Freebase and the semantic web
Freebase and the semantic webFreebase and the semantic web
Freebase and the semantic web
 
OWLIM@AWS - On-demand RDF Data Management in the Cloud
OWLIM@AWS - On-demand RDF Data Management in the CloudOWLIM@AWS - On-demand RDF Data Management in the Cloud
OWLIM@AWS - On-demand RDF Data Management in the Cloud
 
Ontotext in EC Funded Projects 2002-2012
Ontotext in EC Funded Projects 2002-2012Ontotext in EC Funded Projects 2002-2012
Ontotext in EC Funded Projects 2002-2012
 
Enabling Low-cost Open Data Publishing and Reuse
Enabling Low-cost Open Data Publishing and ReuseEnabling Low-cost Open Data Publishing and Reuse
Enabling Low-cost Open Data Publishing and Reuse
 
DataGraft Platform: RDF Database-as-a-Service
DataGraft Platform: RDF Database-as-a-ServiceDataGraft Platform: RDF Database-as-a-Service
DataGraft Platform: RDF Database-as-a-Service
 
S4: The Self-Service Semantic Suite
S4: The Self-Service Semantic SuiteS4: The Self-Service Semantic Suite
S4: The Self-Service Semantic Suite
 

Similaire à Semantic Technologies for Big Data

Linked Data for the Enterprise: Opportunities and Challenges
Semantic Technologies for Big Data

  • 1. Semantic Technologies for Big Data Marin Dimitrov (Ontotext) XML Amsterdam 2012
  • 2. XML Amsterdam 2012 Semantic Technologies for Big Data Sep 2012 #2
  • 3. About Ontotext
      • Provides products and services for creating, managing and exploiting semantic data
          – Founded in 2000
          – Offices in Bulgaria, USA and UK
      • Major clients and industries
          – Media & Publishing (BBC, Press Association)
          – HCLS (AstraZeneca, UCB)
          – Cultural Heritage (The British Museum, The National Archives, Polish National Museum, Dutch Public Library)
          – Defense and Homeland Security
  • 4. Outline
      • Semantic Technologies for the Enterprise
      • Semantic Technologies for Big Data
      • Success stories
  • 5. SEMANTIC TECHNOLOGIES FOR THE ENTERPRISE
  • 6. The need for a smarter Web
      • “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” (Tim Berners-Lee, 2001)
      • “PricewaterhouseCoopers believes a Web of data will develop that fully augments the document Web of today. You’ll be able to find pieces of data sets from different places, aggregate them without warehousing, and analyze them in a more straightforward, powerful way than you can now.” (PWC, May 2009)
  • 7. Linked Data
      • Linked Data is a set of principles that allows publishing, querying and consumption of RDF data, distributed across different servers
      • Design principles
          – Use unambiguous identifiers for resources (URIs)
          – Use HTTP URIs (dereference-able)
          – Provide useful information for URI lookups
          – Interlink resources
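The "HTTP URIs" and "URI lookups" principles can be illustrated with a small Python sketch. It only builds the lookup request and does not send it. The example URI, the function names and the choice of `text/turtle` as the requested format are assumptions made for illustration and are not taken from the slides.

```python
from urllib.parse import urlparse
from urllib.request import Request


def is_http_uri(uri: str) -> bool:
    """Return True when the identifier can be dereferenced over HTTP(S)."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_lookup_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) a lookup that asks for an RDF serialization."""
    if not is_http_uri(uri):
        raise ValueError(f"not an HTTP URI: {uri}")
    # Content negotiation: the Accept header tells the server which format
    # the client would like to receive for this resource.
    return Request(uri, headers={"Accept": media_type})


# Hypothetical resource URI, used only for illustration.
request = build_lookup_request("http://example.org/resource/Amsterdam")
print(request.full_url, request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup. Whether a server answers with RDF depends on how it was published.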
  • 8. The Semantic Web timeline (figure: timeline spanning 1999 to 2012, with labels RDF, RDF 2, DAML+OIL, OWL, OWL 2, SPARQL, SPARQL 1.1, RIF, RDFa, SAWSDL, LOD, SKOS, HCLS, SSN, RDB2RDF, PIL, GLD, LDP)
  • 9. Enterprise Information Management Challenges
      • Many disparate data sources and data silos
      • Many point-to-point interfaces
      • Data sources with similar/inconsistent information
      • Complex data integration processes inadequate for changing business requirements
      • Most of the knowledge is hidden in texts
      • Difficult to integrate & analyse structured data and text
  • 10. Semantic Web and Linked Data Opportunities for the Enterprise
      • Simplify the information integration processes
          – Flexible, easy to evolve data model
          – Bottom-up / incremental integration
          – Efficiently integrate structured and unstructured data
      • Provide an enterprise metadata layer
          – Unified metadata vocabulary for the enterprise
          – Align the legacy data silos
          – Improve the information sharing and reuse
  • 11. Semantic Web and Linked Data Opportunities for the Enterprise (2)
      • Discovery and enrichment of information
          – Interlink people, organisations, events, etc.
          – Enrich enterprise content with structured annotations
          – Discover implicit links and relationships
      • Unified access to information within the enterprise
          – Simplified infrastructure based on open web standards
      • Information interchange across a value chain
          – Easy publishing and consumption of Linked Data
      • Augments existing IT assets and technologies
          – No need for disruptive replacement
  • 12. XML and RDF: friends or foes
      • Complement each other
          – XML best for content, structure and interchange format
          – RDF for metadata layer and semantics
      • Typical use case
          – Many XML content data sources
              • Content stored in an XML store (XQuery and XSLT)
          – Structured data sources & external Linked Data
              • RDF-ized and stored in an RDF store (SPARQL)
          – Metadata extracted from content
              • Stored in an RDF store (SPARQL)
              • Semantic search and metadata driven content delivery
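The "metadata extracted from content" step can be sketched in Python as below: the XML document stays the content format, and a few fields are lifted out as subject/predicate/object triples. The XML layout, the `http://example.org/` base URI and the vocabulary terms are hypothetical, and plain tuples stand in for a real RDF store.

```python
import xml.etree.ElementTree as ET

# Hypothetical content document; element names are assumptions.
ARTICLE_XML = """<article id="a1">
  <title>Semantic Technologies for Big Data</title>
  <author>Marin Dimitrov</author>
  <body>Slide content goes here.</body>
</article>"""

BASE = "http://example.org/"  # hypothetical namespace for illustration


def extract_metadata(xml_text):
    """Turn selected XML fields into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    subject = BASE + "article/" + root.get("id")
    triples = []
    for field in ("title", "author"):
        element = root.find(field)
        if element is not None and element.text:
            predicate = BASE + "vocab/" + field
            triples.append((subject, predicate, element.text.strip()))
    return triples


for triple in extract_metadata(ARTICLE_XML):
    print(triple)
```

The body text is deliberately left in the XML: only the metadata moves to the RDF side, which matches the division of labour described on the slide.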
  • 13. BBC Sports (c) BBC
  • 14. Added value of RDF
      • Explicit semantics
          – Intended meaning of entities and relations
      • Global identifiers (URIs)
      • Simple and flexible graph-based data model
      • Easier data mapping & integration
          – Bottom-up / incremental data integration with owl:sameAs
      • Inference of implicit information
      • Working with distributed information
          – Linked Data, federated SPARQL
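A rough Python sketch of incremental integration with owl:sameAs follows. Two sources describe the same customer under different identifiers, and a single sameAs statement is enough to merge the descriptions. The identifiers and predicates are invented for illustration, and the union-find merge is a simplification rather than full OWL reasoning.

```python
SAME_AS = "owl:sameAs"


def merge_by_same_as(triples):
    """Rewrite triples so that identifiers linked by owl:sameAs share one name."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    # First pass: group identifiers connected by owl:sameAs statements.
    for subject, predicate, obj in triples:
        if predicate == SAME_AS:
            root_a, root_b = find(subject), find(obj)
            if root_a != root_b:
                # Keep the lexicographically smaller name as representative,
                # purely to make the output deterministic.
                keep, drop = sorted((root_a, root_b))
                parent[drop] = keep

    aliased = set(parent)

    def canonical(term):
        return find(term) if term in aliased else term

    # Second pass: rewrite every other statement to the representative name.
    return {
        (canonical(subject), predicate, canonical(obj))
        for subject, predicate, obj in triples
        if predicate != SAME_AS
    }


# Hypothetical data from two silos plus one linking statement.
triples = [
    ("crm:cust42", "ex:name", "ACME Ltd"),
    ("billing:7", "ex:balance", "120.50"),
    ("crm:cust42", SAME_AS, "billing:7"),
]
print(sorted(merge_by_same_as(triples)))
```

New sources can be attached later by adding further sameAs statements, without changing the data that is already loaded. That is the bottom-up aspect the slide refers to.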
  • 15. Added value of RDF
      • Descriptive / agile schema
          – Open World Assumption, don’t restrict predicates
          – Generated dynamically from data
      • Queries based on meaning
          – Not depending on structure / order of statements
      • Data and queries may use different vocabularies
      • Exploratory queries
      • Choice of OWL2 profiles
          – Tradeoff features vs performance
          – New profiles may emerge in the future
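The points about order-independent and exploratory queries can be illustrated with a toy triple-pattern matcher in Python. It is a sketch over an in-memory list, not a SPARQL engine. The `ex:` identifiers are made up, and `None` plays the role of a query variable.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    }


def predicates_of(triples, subject):
    """Exploratory query: which properties are known for this resource?"""
    return {p for _, p, _ in match(triples, subject=subject)}


# Hypothetical statements for illustration.
data = [
    ("ex:marin", "ex:worksFor", "ex:ontotext"),
    ("ex:ontotext", "ex:locatedIn", "ex:bulgaria"),
    ("ex:marin", "ex:speaksAt", "ex:xmlAmsterdam2012"),
]

# The answer does not depend on the order in which statements were stored.
same_answer = match(data, predicate="ex:worksFor") == match(
    list(reversed(data)), predicate="ex:worksFor"
)
print(same_answer, predicates_of(data, "ex:marin"))
```

Because the schema is not fixed in advance, `predicates_of` can be used to discover which properties exist before writing a more specific query.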
  • 16. SEMANTIC TECHNOLOGIES FOR BIG DATA
  • 17. The three V’s of Big Data
      • Velocity
          – Streaming, sensor, real-time data
          – Solution: distributed processing & storage
          – Semantic challenge: stream reasoning
      • Volume
          – Petabytes of data
          – Solution: distributed processing & storage
          – Semantic challenge: distributed reasoning & querying
      • Variety
          – Structured, semi-structured and unstructured data
          – Semantic Technologies (RDF) are a good fit
  • 18. Types of Big Data (NIST)
      • Type 1
          – Velocity (-), Volume (-), Variety (+)
          – Perfect fit for Semantic Technologies
      • Type 2
          – Velocity and/or Volume, Variety (-)
          – Only horizontal scalability required, traditional approaches are a good enough fit
      • Type 3
          – All V’s
          – Semantic Technologies not a good fit yet, but moving in that direction
  • 19. Semantic Technologies for Volume and Velocity
      • Promising ongoing research
      • Distributed inference with Hadoop/Storm
      • Stream reasoning
          – Continuous queries
          – Continuous (dynamic) semantics
      • SPARQL to Pig translation
      • Distributed RDF stores on top of NoSQL
      • C-SPARQL, EP-SPARQL, CQELS
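The idea of a continuous query can be sketched as a query that is re-evaluated over a sliding time window each time a new timestamped triple arrives. The Python class below is a toy illustration under that assumption. It does not reproduce the syntax or semantics of C-SPARQL, EP-SPARQL or CQELS, and the class name and sensor data are hypothetical.

```python
from collections import deque


class SlidingWindowQuery:
    """Re-evaluate a fixed predicate filter over the last N seconds of triples."""

    def __init__(self, window_seconds, predicate):
        self.window_seconds = window_seconds
        self.predicate = predicate
        self.window = deque()  # holds (timestamp, triple) pairs, oldest first

    def push(self, timestamp, triple):
        """Add a triple and return the current answer of the continuous query.

        Assumes timestamps arrive in non-decreasing order.
        """
        self.window.append((timestamp, triple))
        # Evict statements that have fallen out of the time window.
        while self.window and self.window[0][0] <= timestamp - self.window_seconds:
            self.window.popleft()
        return [t for _, t in self.window if t[1] == self.predicate]


# Hypothetical sensor readings, timestamps in seconds.
query = SlidingWindowQuery(window_seconds=10, predicate="ex:temperature")
first = query.push(0, ("ex:sensor1", "ex:temperature", "20"))
second = query.push(5, ("ex:sensor1", "ex:humidity", "40"))
third = query.push(12, ("ex:sensor1", "ex:temperature", "22"))
print(first, second, third)
```

Unlike a one-off query over a static store, the answer changes as old statements expire. This is what makes reasoning over streams harder than reasoning over a fixed dataset.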
  • 20. Linked Open Data Cloud (Sep 2011) (c) Cyganiak & Jentzsch
  • 21. From Big Linked Data to Linked Big Data
      • Big Linked Data
          – Big Data approach adopted by the Linked Data community
              • In particular handling Volume and Velocity
          – Exponential growth of Linked Data in the last 5 years
      • Linked Big Data
          – Linked Data approach adopted by the Big Data community
          – RDF data model for Variety
          – Enrich Big Data with metadata and semantics, for more powerful analytics on top of it
          – Interlink Big Data sets
          – Simplify data access and data integration
  • 22. SUCCESS STORIES
  • 23. Typical Use Cases for Linked Data and Semantic Technologies
      • Publish / consume Linked Data across enterprises
          – Linked Data is not necessarily free data
          – Facilitate data interchange within the value chain
      • Information integration within the enterprise
          – Integrated asset management / align data silos
          – Master Data Management
      • Knowledge discovery and semantic search
          – Integrate structured and unstructured data
          – Enrich and interlink information
          – Semantic search and exploration of information
  • 24. Semantic Information Integration (Ontotext)
  • 25. The National Archives (Ontotext)
      • Challenge
          – Large archive of various UK Government websites since 1997
          – Lots of duplicated information & documents
          – Inefficient search & navigation
      • Semantic Knowledge Base project goals
          – Integrate multiple data sources
          – Extract information & metadata from archived documents
          – Interlink the web archive with data.gov.uk and LOD data
          – Advanced search & navigation of the archive
  • 26. The National Archives (Ontotext) (architecture diagram: front ends for semantic search, SPARQL, 3rd party, ontology editors and graph exploration; data transformation and integration; semantic annotation process (GATE Teamware); identity resolution; semantic repository holding SKB ontologies, factual knowledge (TNA data, LOD, data.gov.uk) and semantic annotations; semantic index)
  • 27. The National Archives (Ontotext)
      • The numbers
          – 2.5 billion input files
          – 40TB compressed archive data
          – 10 billion RDF triples stored in OWLIM
          – 33,000 EC2 hours used on AWS
          – Dynamic EC2 cluster (180 instances average, 500 max)
      • Major challenges
          – Complex pre-processing of documents
          – De-duplication of information & documents
          – EC2/RRS performance & reliability
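The figures above allow a rough back-of-the-envelope check, sketched in Python below. The derived values are not stated on the slide. They assume the 180-instance average applies to the whole run and that triples are spread evenly over input files, which is unlikely to hold exactly.

```python
# Figures quoted on the slide.
EC2_HOURS = 33_000
AVERAGE_INSTANCES = 180
RDF_TRIPLES = 10_000_000_000
INPUT_FILES = 2_500_000_000

# Derived estimates (assumptions: constant average cluster size,
# triples evenly distributed over input files).
wall_clock_hours = EC2_HOURS / AVERAGE_INSTANCES
wall_clock_days = wall_clock_hours / 24
triples_per_file = RDF_TRIPLES / INPUT_FILES

print(f"about {wall_clock_hours:.0f} hours ({wall_clock_days:.1f} days) of wall-clock time")
print(f"about {triples_per_file:.0f} triples per input file on average")
```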
  • 28. Dutch Public Library (Ontotext + Dayon)
      • Challenge
          – Many disparate data sources, inefficient search
      • Goals
          – Data integration
          – Automated metadata generation
          – Open search platform
      • Numbers
          – 500 heterogeneous data sources
          – 40 million cultural heritage artifacts to be described
          – 6-8 billion triples to be stored into the knowledge base
  • 29. Linked Life Data (Ontotext)
      • Challenge
          – Disparate, heterogeneous and unaligned data silos lock valuable biomedical information
      • Goals
          – Semantic warehouse integrating and interlinking public biomedical data sources
          – Interactive discovery and exploration
      • Numbers
          – 25+ heterogeneous biomedical data sources integrated
          – 1 billion entities described
          – 5.5 billion RDF triples
  • 30. Linked Life Data (Ontotext)
  • 31. Linked Life Data-as-a-Service (Ontotext) • More data sources • Large scale text mining over the LOD cloud • Adapted for specific use cases • UCB use case – 2 billion entities described – 11 billion RDF triples
  • 32. Dynamic Semantic Publishing (Ontotext) • Challenge – Difficult & slow to aggregate content from various sources • Goals – Metadata generation for news (semantic annotation) – Interlink & categorize content – Metadata-driven web pages • Numbers – Near-real-time processing & annotation required – Tens of millions of SPARQL queries to the knowledge base per day
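To put "tens of millions of queries per day" in terms of sustained load, the sketch below converts a daily query count into an average per-second rate. The slide gives no exact figure, so the two daily volumes used here (10 million and 50 million) are hypothetical sample points, and peak rates would be higher than these averages.

```python
# Convert a daily query volume into an average queries-per-second rate.
# The daily volumes below are hypothetical sample points for
# "tens of millions"; the slide does not give an exact number.

SECONDS_PER_DAY = 24 * 60 * 60


def average_qps(queries_per_day: int) -> float:
    """Average queries per second, assuming load spread evenly over a day."""
    return queries_per_day / SECONDS_PER_DAY


for daily in (10_000_000, 50_000_000):
    print(f"{daily:,} queries/day is about {average_qps(daily):.0f} queries/second")
```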
  • 33. Trillion RDF triples (Franz Inc.) • Use case – Use RDF for the customer management database of a telecom • Challenge – 4,000 triples per customer, more than a trillion for the whole customer base • Numbers – 1 trillion triples stored in AllegroGraph by Franz Inc. • Hardware requirements undisclosed • The 310 billion triple result used an 8-CPU system with 2TB RAM
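The two figures in the challenge imply a customer-base size, which the following sketch works out. The slide says "more than a trillion", so taking exactly one trillion triples, as done here, gives a lower bound. The variable names are illustrative.

```python
# Customer-base size implied by the figures on the slide.
# Assumption: exactly 1 trillion triples is used as a lower bound,
# since the slide says "more than a trillion".

TRIPLES_PER_CUSTOMER = 4_000
TOTAL_TRIPLES = 1_000_000_000_000  # 1 trillion

implied_customers = TOTAL_TRIPLES // TRIPLES_PER_CUSTOMER

print(f"at least {implied_customers:,} customer records")
```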
  • 34. uRiKA (Cray/YarcData) • Big Data appliance for graph analytics – Based on the Threadstorm™ architecture – Up to 8K processors, 512TB RAM, 350TB/hr IO throughput • In-memory RDF database • SPARQL 1.0 engine (c) YarcData
  • 35. TAKEAWAYS
  • 36. Semantic Technologies for Big Data • Rich ecosystem of Semantic Technologies since 1999 • Strong Enterprise focus in the last 5 years • Semantic Technologies provide an opportunity to reduce the cost and complexity of data integration • Common metadata layer for the enterprise • More powerful ways to find and explore information • RDF complements XML within the enterprise • Semantic Technologies are a good fit for Big Data’s Variety
  • 37. Semantic Technologies for Big Data • Velocity and Volume are still challenging for Semantic Technologies, but there is a lot of progress in that direction • Linked Data will grow into Big Linked Data, but Big Data will also benefit from evolving into Linked Big Data • Interesting success stories for Semantic Technologies in Big Data scenarios
  • 38. THANK YOU!