The document discusses enabling on-demand access to big data through semantic technologies. It describes how semantic technologies like Linked Data and ontologies can be used to virtually integrate and provide access to large, heterogeneous datasets across different data silos. The key points are that semantic technologies allow for big data to be accessed and analyzed on-demand in a self-service manner through a "Linked Data as a Service" approach, providing scalable end user access to big data.
On demand access to Big Data through Semantic Technologies
1. On Demand Access to Big Data through Semantic Technologies
Peter Haase, fluid Operations AG
2. fluid Operations (fluidOps)
Linked Data & Semantic Technologies
Enterprise Cloud Computing
Software company founded Q1/2008 by a team of serial entrepreneurs, privately held, VC funded
Headquarters in Walldorf / Germany, SAP Partner Port
Currently 45 employees
Named "Cool Vendor" by Gartner (Mar 2010)
Global reseller agreement with EMC, focus: large enterprise customers (Apr 2010)
NetApp Advantage Alliance Partner (Oct 2010)
3. Outline
• Big Data Challenges: Beyond Volume
• Semantic Technologies for Big Data Challenges
• On Demand Data Access in a Self-service Process
• Outlook: Optique - Scalable End User Access to Big Data
4. Big Data
"Big data consists of data sets that grow so large that they become awkward to work with using on-hand database management tools." (Wikipedia)
• 12 terabytes of Tweets created daily
• 30 terabytes of telescope data each night
• 350 billion meter readings
• ...
5. Optique Case Study: Statoil Exploration
Experts in geology and geophysics develop stratigraphic models of unexplored areas.
• Based on production and exploration data from nearby locations
• Analytics on:
• 1,000 TB of relational data
• using diverse schemata
• spread over 2,000 tables
• spread over multiple individual databases
• 900 experts in Statoil Exploration
• up to 4 days for new data access queries
• assistance from IT experts required
8. Semantic Technologies for Horizontal Big Data
Linked Data
• Set of standards and principles for publishing, sharing and interrelating structured data: RDF as data model, SPARQL for querying
• Graph-based data model for achieving a higher degree of variety
• Semantically interlink data scattered among different information spaces: from data silos to a Web of Data
Ontologies
• For describing the semantics of the data
• As conceptual models for end-user oriented access
• For the integration of heterogeneous sources
• For (light-weight) reasoning
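The graph-based data model can be illustrated with a minimal sketch in plain Python. It is not an RDF library: triples are simple tuples, and the `ex:` identifiers, the two "silos", and their contents are hypothetical. The point it tries to show is that once both silos use the same triple shape and shared identifiers, a question spanning both becomes a walk over one merged graph.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical contents of two separate data silos (all names are made up).
WELLS_DB: List[Triple] = [
    ("ex:well/42", "rdf:type", "ex:Wellbore"),
    ("ex:well/42", "ex:locatedIn", "ex:field/7"),
]
PRODUCTION_DB: List[Triple] = [
    ("ex:field/7", "rdf:type", "ex:Field"),
    ("ex:field/7", "ex:dailyOutput", "1200"),
]


def match(graph: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in graph:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def output_of_well(graph: List[Triple], well: str) -> List[str]:
    """Follow well -> field -> dailyOutput, a path that crosses both silos."""
    results = []
    for _, _, field in match(graph, s=well, p="ex:locatedIn"):
        for _, _, value in match(graph, s=field, p="ex:dailyOutput"):
            results.append(value)
    return results


# Merging graphs is just a union of triples; shared identifiers do the linking.
merged = WELLS_DB + PRODUCTION_DB
print(output_of_well(merged, "ex:well/42"))
```

Queried against `WELLS_DB` alone, the same function finds nothing, because the link target lives in the other silo.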
9. On Demand Access to Big Data
Enabling on demand data access
1. discovery of relevant data sources
2. automated integration and interlinking of sources, and
3. interactive exploration and ad hoc analysis of data
=> Linked Data as a Service
10. Everything as a Service
• Abstract from physical implementation details and location of resources
• Regardless of geographic or organizational separation of provider and consumer
• "In the cloud"
• Web based
• Virtualized
• On-demand
• Self-service
• Scalable
• Pay as you go
Service stack: Data as a Service, Software as a Service, Platform as a Service, Infrastructure as a Service
11. Linked Data as a Service
"Like all members of the "as a Service" family, DaaS is based on the concept that the product, data in this case, can be provided on demand to the user regardless of geographic or organizational separation of provider and consumer." (Source: Wikipedia)
• Data virtualization supported by Linked Data principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards: RDF, SPARQL
4. Include links to other URIs, to discover more things.
• Linked Data as abstraction layer for virtualized data access across data spaces
• Enables data portability across current data silos
• Platform independent data access
• Basis for enabling automation of discovery, composition, and use of datasets
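Principles 2 and 3 amount to an ordinary HTTP lookup in which the client asks for an RDF representation. The sketch below only builds such a request with the Python standard library and does not send it. The URI is a placeholder under `example.org`, and the choice of `text/turtle` as the requested media type is an assumption for illustration; which RDF serializations a server actually offers depends on the server.

```python
from urllib.parse import urlparse
from urllib.request import Request


def build_lookup_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP GET that asks for an RDF representation of `uri`.

    Only HTTP(S) URIs are accepted, since the principles rely on names
    that can be looked up over HTTP.
    """
    scheme = urlparse(uri).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP URI: {uri!r}")
    # The Accept header is how the client signals the format it wants.
    return Request(uri, headers={"Accept": media_type})


# Placeholder URI; nothing is sent over the network here.
req = build_lookup_request("http://example.org/resource/well-42")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; the links found in the returned RDF are what principle 4 refers to.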
12. Information Workbench - Linked Data Platform
Information Workbench:
• Semantics- & Linked Data-based integration of private and public data sources
• Intelligent Data Access and Analytics
• Visual Exploration
• Semantic Search
• Dashboarding and Reporting
• Collaboration and knowledge management platform
• Wiki-based curation & authoring of data
• Collaborative workflows
[Diagram label: Semantic Web Data]
13. Enabling Data Discovery: Metadata about Data Sets
• Metadata about data sources essential for dynamic discovery
• Based on metadata vocabularies (VoID, DCAT)
• Access to data registered at global registries, e.g. ckan.org, data.gov, …
• Sort/filter data sets by topic, license, size and many more facets to identify relevant data
• Visually explore data sets
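Faceted discovery over data set metadata can be sketched as a filter and sort over descriptive records. The record fields below are loosely inspired by what vocabularies such as VoID and DCAT describe (title, topic, license, size), but the class, the field names, and the catalog entries are hypothetical and not taken from any registry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetDescription:
    """Hypothetical metadata record for one registered data set."""
    title: str
    topic: str
    license: str
    triples: int  # size of the data set, in number of triples


# Made-up catalog entries, standing in for metadata harvested from a registry.
CATALOG = [
    DatasetDescription("Drug interactions", "life-sciences", "CC-BY", 750_000),
    DatasetDescription("Country indicators", "economy", "CC0", 2_000_000),
    DatasetDescription("Gene annotations", "life-sciences", "CC0", 5_000_000),
]


def find_datasets(catalog: List[DatasetDescription],
                  topic: Optional[str] = None,
                  license: Optional[str] = None,
                  min_triples: int = 0) -> List[DatasetDescription]:
    """Filter by the given facets, then sort by size, largest first."""
    hits = [
        d for d in catalog
        if (topic is None or d.topic == topic)
        and (license is None or d.license == license)
        and d.triples >= min_triples
    ]
    return sorted(hits, key=lambda d: d.triples, reverse=True)


for d in find_datasets(CATALOG, topic="life-sciences"):
    print(d.title, d.license, d.triples)
```

Each additional facet is one more optional parameter, which is why descriptive metadata has to be available before the data itself is touched.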
14. FedX Federated Query Processing
1) Involve only relevant sources in the evaluation
Problem: Subqueries are sent to all sources, although potentially irrelevant
2) Compute joins close to the data
Problem: All joins are executed locally in a nested loop fashion
3) Reduce remote communication
Problem: Nested loop join causes many remote requests
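The effect of goals 1) and 3) can be made concrete with a small simulation. This is not FedX code: the `Endpoint` class is an in-memory stand-in that merely counts requests, and the data and all names are hypothetical. It contrasts a nested loop join (one remote request per intermediate result) with a batched join that ships a group of bindings in a single request, after first asking each source whether it can contribute at all.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
Row = Tuple[str, str]  # (subject, value used for the join)


class Endpoint:
    """In-memory stand-in for a remote endpoint; every call counts as a request."""

    def __init__(self, name: str, triples: List[Triple]) -> None:
        self.name = name
        self.triples = triples
        self.requests = 0

    def ask(self, predicate: str) -> bool:
        """Cheap probe: does this source hold any triple with the predicate?"""
        self.requests += 1
        return any(p == predicate for _, p, _ in self.triples)

    def pairs(self, predicate: str) -> List[Row]:
        self.requests += 1
        return [(s, o) for s, p, o in self.triples if p == predicate]

    def lookup(self, subjects: List[str], predicate: str) -> List[Row]:
        """One request that resolves a whole batch of subjects."""
        self.requests += 1
        wanted = set(subjects)
        return [(s, o) for s, p, o in self.triples if p == predicate and s in wanted]


def relevant_sources(endpoints: List[Endpoint], predicate: str) -> List[Endpoint]:
    """Source selection: keep only endpoints that can answer the pattern."""
    return [e for e in endpoints if e.ask(predicate)]


def nested_loop_join(left: List[Row], sources: List[Endpoint], predicate: str):
    out = []
    for s, x in left:  # one remote request per left-hand row and source
        for e in sources:
            out.extend((s, x, o) for _, o in e.lookup([x], predicate))
    return out


def batched_join(left: List[Row], sources: List[Endpoint], predicate: str,
                 batch_size: int = 10):
    out = []
    for i in range(0, len(left), batch_size):
        batch = left[i:i + batch_size]
        for e in sources:  # one remote request per batch and source
            found = e.lookup([x for _, x in batch], predicate)
            out.extend((s, x, o) for s, x in batch for k, o in found if k == x)
    return out


drugs = Endpoint("drugs", [(f"ex:drug{i}", "ex:targets", f"ex:protein{i}") for i in range(4)])
proteins = Endpoint("proteins", [(f"ex:protein{i}", "ex:label", f"Protein {i}") for i in range(4)])
other = Endpoint("other", [("ex:a", "ex:unrelated", "ex:b")])
endpoints = [drugs, proteins, other]

left = [row for e in relevant_sources(endpoints, "ex:targets") for row in e.pairs("ex:targets")]
label_sources = relevant_sources(endpoints, "ex:label")

before = proteins.requests
nested = nested_loop_join(left, label_sources, "ex:label")
nested_requests = proteins.requests - before

before = proteins.requests
batched = batched_join(left, label_sources, "ex:label")
batched_requests = proteins.requests - before

print(nested_requests, batched_requests)  # prints "4 1"
```

Both joins return the same rows; only the number of round trips differs, and the unrelated endpoint is never asked to evaluate a join at all.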
15. Enabling Data Composition: Federation of Virtualized Data Sources
[Diagram: Application Layer, Virtualization Layer, Data Layer; four SPARQL Endpoints; Metadata Registry; four Data Sources]
See also: FedX: Optimization Techniques for Federated Query Processing on Linked Data (ISWC 2011)
16. Enabling On Demand Use: Self-service Linked Data Frontend
• Semantic Wiki as user frontend
• Declarative specification of the UI based on available pool of widgets and declarative wiki-based syntax
• Widgets have direct access to the DB
• Type-based template mechanism
• Ad hoc data exploration, visualization, analytics, dashboards, ...
[Screenshots: Wiki Page in Edit Mode … and Displayed Result Page]
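A type-based template mechanism can be sketched as a lookup from a resource's type to a page template. The sketch below is a generic illustration and does not reproduce the Information Workbench's wiki or widget syntax: the `== ... ==` markup, the type names, and the resource fields are all hypothetical.

```python
from string import Template

# Hypothetical page templates, keyed by the type of the resource to display.
TEMPLATES = {
    "ex:Dataset": Template("== $label ==\nLicense: $license"),
    "ex:Person": Template("== $label ==\nAffiliation: $affiliation"),
}
# Fallback for resources whose type has no dedicated template.
DEFAULT_TEMPLATE = Template("== $label ==")


def render(resource: dict) -> str:
    """Pick the template registered for the resource's type and fill it in."""
    template = TEMPLATES.get(resource.get("type"), DEFAULT_TEMPLATE)
    # safe_substitute leaves unknown placeholders untouched instead of failing.
    return template.safe_substitute(resource)


print(render({"type": "ex:Person", "label": "Peter Haase",
              "affiliation": "fluid Operations AG"}))
```

Because the page layout is attached to the type and not to the individual resource, every newly integrated resource of a known type gets a usable page without manual authoring.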
17. Information Workbench - Linked Data as a Service
Application Areas
• Knowledge Management in the Life Sciences
• Digital Libraries, Media and Content Management
• Intelligent Data Center Management
18. Example: Linked Data in Pharma
Main Use Cases
• Integrate data from company-internal data silos
• Augment company-internal data with Linked Open Data
• Collaborative knowledge management
• Support of internal processes (drug development)
[Diagram: Search, Interrogate and Reason; Visualize, Analyze and Explore; Capture and Augment Knowledge; Integrated data graph over all data sources; Private Data Sources; Public Data Sources]
19. Example: Data Center Management
• Support collaborative operations management in the data center
• Link business data to technical data
• Technical Documentation
• Analytics and Reporting
• Performance and Capacity Monitoring
• Responsibility Management
• Resource Management
• Change Management
• Technical Ticketing System
20. Example: A Cloud Portal for Access to Open Data with the Information Workbench
Goal
• Collect metadata from global data markets (LOD Cloud, WorldBank, CKAN, …)
• Allow integrated search and ad hoc integration of data sources from different repositories
• Link data with private/internal data sources, if desired
• Support semi-automated linking between data sets
• Provide visualization, exploration, and analytics functionality on top of integrated data sources
Realization
• Currently running project with the Hasso Plattner Institute (Potsdam, Germany)
• Create local repository containing data market metadata
• Use self-service technology to make services publicly available + Information Workbench for analytics
[Diagram label: ... using the fluid Operations Technology Stack]
21. Information Workbench: Linked Data as a Service in a Cloud Platform Architecture
[Architecture diagram]
• Application Layer (SaaS)
• Provisioning, Monitoring and Management
• Virtualization Layer
• Infrastructure Layer (IaaS): Netw.-Att. Storage, Network, Computing Resources
• Data Layer (DaaS): Enterprise Data Sources, Open Data Sources
22. [Architecture diagram: Application Layer (SaaS); Provisioning, Monitoring and Management; Virtualization Layer; Infrastructure Layer (IaaS): Netw.-Att. Storage, Network, Computing Resources; Data Layer (DaaS): Enterprise Data Sources, Open Data Sources]
Self-service Deployment
• Self-service deployment of the Information Workbench in the cloud
• Pay-per-use
• Scalability on demand
Data Discovery
• On demand access to private and public data sources
• Dynamic Discovery
Data Integration & Federation
• Virtualized data access
• Dynamic integration & federation of data sources
Self-service UI & Analytics
• Living UI, composed from semantics-aware widgets
• Ad hoc data exploration, visualization, analytics
26. Summary
• Big Data means more than volume and vertical scale
• Semantic Technologies for Big Data management
• Linked Data as adequate data model
• Ontologies as conceptual models to access big data
• Integration of diverse, heterogeneous data sources
• Linked Data as a Service for enabling on demand data access
1. discovery of relevant data sources
2. automated integration and interlinking of sources, and
3. interactive exploration and ad hoc analysis of data
• Outlook: Optique - Scalable end-user access to Big Data