Marie-Aude Aufaure keynote ieee cist 2014

Challenges and opportunities induced by Big Data and Open Data for Business Intelligence
Keynote @ IEEE CIST'2014
Marie-Aude AUFAURE
20/10/2014, IEEE CIST conference 2014

Agenda
• Evolution of business intelligence
– Semantic Business Intelligence
– Real-Time Business Intelligence
• Challenges and opportunities:
– Taking into account unstructured data

Business Intelligence
• Business Intelligence (BI) refers to a set of tools and methods dedicated to collecting, representing and analyzing data to support decision-making in enterprises.
• BI is defined as the ability for an organization to take all input data and convert them into knowledge, ultimately providing the right information to the right people at the right time via the right channel.

Evolution of Business Intelligence
Layers: Output, User Interaction, Store, Gathering Information, Data sources
• Classical Business Intelligence: static report; ad-hoc queries; analytics; data warehouse; ETL/batch processing; databases
• Semantic Business Intelligence: visual analytics; flexible queries / SPARQL; triple store; semantic ETL/batch processing; structured/unstructured data
• Real-time Business Intelligence: real-time analytics; databases/triplestores; real-time visual analytics; knowledge enrichment; continuous queries / business rules; semantic ETL stream processing; load shedding; sensors; data streams; retro-action; static data

Change factors
• Data heterogeneity

Change factors
• The way we interact together and with data/information

BI needs to focus on:
• Being simple to use
• Turning any data into information/actionable knowledge
• Empowering collaboration
• Being integrated with the business processes


And now?
• Big Data
• Open Data / Linked Data
• Connected objects

Aspect, characteristics, challenges and technological answers
• Volume
– Characteristics: the most visible aspect of big data, but the least challenging
– Challenges and technological answers: storage (virtualisation in data centers, generalization of cloud-based solutions); NoSQL (solutions for storing and querying highly distributed data)
• Velocity
– Characteristics: data produced and collected in a shorter time window
– Challenges and technological answers: real-time platforms; connected objects will increase volume but also real-time needs
• Variety
– Characteristics: multiplication of data sources, from structured data to free text
– Challenges and technological answers: new data stores integrating flexible data models; collecting and analyzing unstructured data
• Value
– Characteristics: a more subjective aspect, dealing with the non-exploitation of these massive datasets
– Challenges and technological answers: transforming raw data into valuable information; new business models

Open data
• Open data is digital data, public or private, published in a way that allows users to freely access and reuse it, without any technical, legal or financial restriction.
• Examples: data on public transportation, cartography, statistics, geography, sociology, environment, etc.
• Governmental wave in the 2000s:
– data.gov project in 2009, USA
– European Directive in 2003 on reuse of public data
– In France, Etalab (2011) is in charge of data.gouv.fr, an open data portal for public data.
• Benefits for the public sector:
– Transparency, cost reduction, better services
• Economic benefits:
– Access to data, mainly for SMEs

Connected objects: smart applications
• Connected health
• Quantified-self
• Connected car
• Smart cities
• Smart grids

More and more connected objects

Connected Cars
• 200 million vehicles equipped with Android Auto or Apple CarPlay in 2020
• Emergency call
• Eco-driving
• Autonomous vehicle
• Assistance
• Towards automatic driving
• 54 million vehicles totally or partially automated in 2035 (source: IHS Automotive / Polk)

Big Data: Challenges?
• Vector of innovation
– Disruptive technologies: cloud, internet of things, analytics
– Open innovation
• Enhancement of productivity, services and competitiveness
– Public services, « software-intensive » companies
• Economic impact
– Benefits for the analysis of internal and external data
– New jobs
• Big Data centres of excellence (Hack/Reduce in Boston)

BIG DATA: SOCIETAL CHALLENGES
• Big Data for Society: can we expect a positive impact on society?
• Generate actionable information that can be used to identify needs, provide services, and predict and prevent crises for the benefit of populations.
• Health and well-being, environment, energy, climate change, etc.

BIG DATA: ENERGY CHALLENGE
• Supercomputers

BIG DATA: TECHNOLOGICAL CHALLENGES
• Data storage: data centers, cloud infrastructures, NoSQL databases, in-memory databases
• Data processing: supercomputers, distributed or massively parallel computing

Some scientific challenges
• Big data analytics
• Context management
• Visualization and Human-Computer Interfaces
• Distribution of algorithms
• Correlations and causality
• Real-time analysis of data streams
• Validation, trust

Big Data value chain
Source: International Working Group on Data Protection in Telecommunications

Potential of Big Data Analysis
• Adapt and enhance services and processes
– Transportation and logistics
– Online education
– Job seeking
– Sentiment analysis and customers'/citizens' needs
– Enhancement of public services
– E-marketing
• Optimize performance
– Assist decision-making
– Lower resource consumption
– Fraud detection
• Predict and prevent
– Health
– Anticipation of needs
– Security

BIG DATA: USE CASES

Big Data opportunities
Source: Big Data opportunities survey, Unisphere / SAP, May 2013.

Predictive analytics: flu trends
[Chart: United States flu activity, comparing United States data with the Google Flu Trends estimate]

360-degree view of the customer
• Questions: Why? What? Who? When/How? Where?
• Data types: operational data, behavioral data, descriptive data, interaction data, contextual data

Types of data used in Big Data initiatives
• Internal data
• Traditional sources
• « New data »
Source: Big Data opportunities survey, Unisphere / SAP, May 2013.


Coping with unstructured data
• Semantic BI
• Semantic Technologies for Big Data
• Social Networks

Unstructured data analytics process
Data
• Web content
• Ontologies
• Social data
• Logs
• Texts
• Pictures, etc.
Collect
• Web crawling
• Web scraping
• API (Twitter, Google, …)
• Clicks (logs)
• Crowdsourcing (Mechanical Turk)
Extraction / Structuring
• Semantic ETL
• Named entities
• Lexico-syntactic patterns
• Dependency trees
• N-grams
Analyze
• Clustering
• Galois lattice
• Unsupervised and supervised learning

SEMANTIC BI AND VISUAL ANALYTICS: THE FP7 CUBIST PROJECT

CUBIST: Combining and Uniting Business Intelligence with Semantic Technologies
Semantic Business Intelligence: Comprehensive Information Access Means
Architecture: databases, forums, blogs, office docs → Semantic ETL → Triple Store → flexible and visual queries / analytics
Exploitable Results
• Advanced Visual Analytics (no existing solutions from BI vendors)
– Searching, exploring, analyzing data
– Qualitative data analysis
– Graph-based visualizations
• Semantically enriched BI (partly addressed by BI or ST vendors)
– Using a triple store for BI
– Using ontologies as schema
• BI over both structured and unstructured data (already addressed/developed by BI vendors)
– Text analytics
– Linking unstructured and structured sources

Formal Concept Analysis
• Formal Concept Analysis is a method used for investigating and processing explicitly given information
– An analysis of data
– Structures of formal abstractions of concepts of human thought
– "Formal" emphasizes that the concepts are mathematical objects, rather than concepts of mind
– Formal Concept Analysis helps to draw inferences, to group objects, and hence to create concepts
• Visual representation by a Hasse diagram

Charts, Graphs, FCA for BI: A Toy Example
Skill: Persons with that Skill
• IE: Anja, Ben, Ernst, Fred, Ken
• ETL: Chris, Fred, Mark
• BI: Ben, Chris, Fred, Lemmy, Mark, Naomi
• ST: Anja, Diana, Ernst, Fred, Gerald, Harriet, Ken, Owen
• FCA: Anja, Diana, Gerald, Harriet, Ian, John, Ken, Owen
• VIZ: Anja, Diana, Ian
Possible Information Needs:
1) Show me the count of people for a given skill
2) Show me the skills and how many people share some skills, in order to get an idea of how strongly skills are related
3) Show me the skills and people such that I get an idea of the distribution of skills among people and dependencies between skills

Converting the data (analytic model)
• Raw Data: the "Skill / Persons with that Skill" table of the toy example (IE, ETL, BI, ST, FCA, VIZ)
• Bar Chart Data: counting the number of people per skill
• Graph Data: counting the number of people who share two skills
• FCA Data (Formal Context)
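
The two counting conversions can be sketched in a few lines of Python. This is only an illustration built on the toy table from the slides; the names SKILLS, people_per_skill and shared_people are assumptions made for this sketch, not part of the talk or of any tool it mentions.

```python
from itertools import combinations

# Toy raw data from the slides: skill -> set of persons with that skill.
SKILLS = {
    "IE": {"Anja", "Ben", "Ernst", "Fred", "Ken"},
    "ETL": {"Chris", "Fred", "Mark"},
    "BI": {"Ben", "Chris", "Fred", "Lemmy", "Mark", "Naomi"},
    "ST": {"Anja", "Diana", "Ernst", "Fred", "Gerald", "Harriet", "Ken", "Owen"},
    "FCA": {"Anja", "Diana", "Gerald", "Harriet", "Ian", "John", "Ken", "Owen"},
    "VIZ": {"Anja", "Diana", "Ian"},
}


def people_per_skill(skills):
    """Bar chart data: number of people for each skill."""
    return {skill: len(people) for skill, people in skills.items()}


def shared_people(skills):
    """Graph data: for each pair of skills, how many people have both."""
    return {
        (a, b): len(skills[a] & skills[b])
        for a, b in combinations(sorted(skills), 2)
    }


bar_chart = people_per_skill(SKILLS)
graph_edges = shared_people(SKILLS)
print(bar_chart)
print(graph_edges[("FCA", "ST")], graph_edges[("ETL", "FCA")])
```

The pair keys are sorted alphabetically, so the link between ST and FCA is stored under ("FCA", "ST"). A pair with a count of zero corresponds to a missing link in the graph view.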

Visualizing the data
• Raw Data: the "Skill / Persons with that Skill" table of the toy example
• Bar Chart
• Graph
• FCA Concept Lattice

Some information which can be read off
Bar Chart
• ST and FCA are the skills most people have
• ETL and VIZ are the skills the fewest people have
Graph
• The skills FCA and ST are strongly related
– Because the link between them is strong
• The skills FCA and IE are only weakly related
– Because the link between them is weak
• No one has knowledge of both FCA and ETL
– Because there is no link between FCA and ETL
FCA lattice
• Owen, Harriet and Gerald have exactly the same skills
– Because they belong to the same node
• Whoever is skilled in ETL is skilled in BI, too
– Because the BI node is above the ETL node
• Anja has more skills than Ken, and Ken has more skills than Ernst
– Because the nodes are ordered that way
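
The lattice readings can be checked with the two derivation operators of Formal Concept Analysis. The sketch below is a brute-force illustration on the toy data, assuming a context small enough to enumerate every subset of skills; the function names are chosen for this example and it is not the algorithm used by Cubix or CUBIST.

```python
from itertools import combinations

# Formal context of the toy example: skill (attribute) -> persons (objects).
CONTEXT = {
    "IE": {"Anja", "Ben", "Ernst", "Fred", "Ken"},
    "ETL": {"Chris", "Fred", "Mark"},
    "BI": {"Ben", "Chris", "Fred", "Lemmy", "Mark", "Naomi"},
    "ST": {"Anja", "Diana", "Ernst", "Fred", "Gerald", "Harriet", "Ken", "Owen"},
    "FCA": {"Anja", "Diana", "Gerald", "Harriet", "Ian", "John", "Ken", "Owen"},
    "VIZ": {"Anja", "Diana", "Ian"},
}
PEOPLE = set().union(*CONTEXT.values())


def extent(skills):
    """All people who have every skill in `skills`."""
    result = set(PEOPLE)
    for skill in skills:
        result &= CONTEXT[skill]
    return result


def intent(people):
    """All skills shared by every person in `people`."""
    return {skill for skill, members in CONTEXT.items() if people <= members}


def closure(skills):
    """Skills implied by `skills`: the intent of their extent."""
    return intent(extent(skills))


def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every subset of skills."""
    concepts = set()
    names = sorted(CONTEXT)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            ext = extent(subset)
            concepts.add((frozenset(ext), frozenset(intent(ext))))
    return concepts


# ETL implies BI: closing {ETL} adds BI.
print(closure({"ETL"}))
# Owen, Harriet and Gerald share one intent, so they sit in the same node.
print(intent({"Owen"}), intent({"Harriet"}), intent({"Gerald"}))
print(len(formal_concepts()), "concepts")
```

Enumerating all attribute subsets is exponential in the number of skills, which is acceptable for six skills but is exactly the scalability issue raised for larger contexts.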

Comparison
Bar Chart / Graph / FCA lattice (strengths marked +, weaknesses marked −, in slide order)
+ Many well-known visualizations
+ Good (readable and comprehensible) layouts
+ Good for analyzing numbers
− Loss of information (what people)
− Misleading for overlapping attributes (counting people manifold)
− Not utilizing relationships between entities
+ Attractive visualizations
+ (Relatively) easy to understand
+ Utilizing and showing links between entities (skills)
− Loss of information (what people)
− Bad for analyzing numbers
− Number of nodes might explode
− Finding good layout is unsolved (nice layout in example is accidental and has been manually created)
− Unfamiliar means for analytics
− Scalability
− Bad for analyzing numbers
+ No loss of information
+ Meaningful clusters in one node
+ Showing dependencies between entities (both people and skills)

Which visualization should I choose?
Remember the information needs from the beginning:
• Show me the skills and how many people share some skills, in order to get an idea of how strongly skills are related
• Show me the skills and people such that I get an idea of the distribution of skills among people and dependencies between skills
• Show me the count of people for a given skill
Conclusion
• Each visualization has its own strengths and weaknesses
• Each type of visualization is suited for a specific type of information need
• Thus the visualizations are complementary
• Thus future BI tools should provide all types of visualizations

Can you understand this?
Traffic accidents dataset: 34 attributes, 150 objects, 344 concepts – minimal edge crossing layout

Visual Analytics
• Visual analytics supports human judgment by means of visual representations and interaction techniques [Keim et al. 2001]
• “Overview first, zoom and filter, then details-on-demand.” [Shneiderman, 1996]
• Visual Analytics for FCA combines:
– Traditional BI operations and visualizations
– Concept lattice transformation and visualization

FCA-based Visual Analytics
• Idea: Create visual analytics for large contexts
– Context reduction
– Allow visual queries through selection and filtering
– Dynamic visualization
– Visual exploration becomes a navigation problem

Cubix: A Visual Analytics tool for FCA
• Combines interactive features to overcome drawbacks of single techniques
• Features
– Visualisations
– Dashboard
– Metrics
– Filtering & Search
– Clustering
– Tree-Extraction
Publication: ICDM 2012 [Melo et al.]
Live: cubix.alwaysdata.com

Summary of Visualisations
Analysis task: data; visualisation
• Co-occurrence analysis: concept lattice; enhanced Hasse diagram
• Exploratory hierarchical analysis: tree from the concept lattice; sunburst
• Frequent itemsets analysis: attributes and objects matrix; concept stacking (matrix)
• Simulation parameters analysis: multi-valued attributes; heatmap lattice
• Implication analysis: association rules; radial/matrix visualisation for association rules

Coming back to ease of use
• Cubix was tested on three use cases
– The workflow (data selection, scaling, filtering and analysis) needed to be simplified
• User creation of analytics
– Leading to « BI as a service »
• Automatic recommendation of visualizations and gadgets:
– Decision tree
  • Based on the data type and volume
– Collaborative filtering
  • Based on other users' preferences for similar datasets
– Supervised learning methods
  • Based on user profile and history

Coping with big data for FCA
• Reduction techniques
– Filtering (support, stability)
• Distributed computing of concepts
• Mining Formal Concepts over data streams
• Visual Analytics
– New metaphors for large data
– Data overview view: dashboards
  • Filtering
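
As an illustration of the support filter, the sketch below keeps only the concepts whose extent covers at least a given share of all objects. The three concepts are taken from the toy skills example; the 0.3 threshold and the names support and filter_by_support are assumptions for this sketch. Stability, the other filter named on the slide, is not covered here.

```python
def support(extent, n_objects):
    """Share of all objects that belong to the concept's extent."""
    return len(extent) / n_objects


def filter_by_support(concepts, n_objects, min_support):
    """Keep the (extent, intent) pairs whose support reaches the threshold."""
    return [
        (extent, intent)
        for extent, intent in concepts
        if support(extent, n_objects) >= min_support
    ]


# Three concepts from the toy skills context (15 people in total).
CONCEPTS = [
    ({"Anja", "Diana", "Gerald", "Harriet", "Ken", "Owen"}, {"ST", "FCA"}),
    ({"Chris", "Fred", "Mark"}, {"ETL", "BI"}),
    ({"Anja", "Diana", "Ian"}, {"FCA", "VIZ"}),
]
N_PEOPLE = 15

# Hypothetical threshold: keep concepts covering at least 30% of the people.
kept = filter_by_support(CONCEPTS, N_PEOPLE, min_support=0.3)
for extent, intent in kept:
    print(sorted(intent), round(support(extent, N_PEOPLE), 2))
```

Raising the threshold shrinks the lattice that has to be drawn, at the price of hiding small but possibly interesting groups.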

Semantic Technologies for Big Data

Semantic Technologies for Big Data
• Data-driven approaches (structure learning, data mining, statistical approaches) are not always sufficient to find all correlations among parameters
• Semantic approaches can provide complementary information:
– Simplify the information integration process
– Provide a unified metadata layer
– Discover and enrich information
– Provide a unified access to information

Semantic processing
• Helping to make sense of large or complex sets of data without being supplied with any knowledge about the data
• Turning any data into information/actionable knowledge
• Some examples:
– NLP technologies
– Data Mining
– Artificial Intelligence
– Classification
– Semantic Search

Semantic technologies / Semantic Web
• "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." (Tim Berners-Lee, 2001)
• Standards include:
– a flexible data model (RDF)
– schema and ontology languages for describing concepts and relationships (RDFS and OWL)
– a query language (SPARQL)
• Use of semantic technologies in semantic processing (e.g. semantic search)
• Use of semantic technologies for storing and querying data (triple store and SPARQL)
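
To make the triple store idea concrete, here is a minimal pure-Python sketch of matching one triple pattern, the basic operation behind a SPARQL query. It is a stand-in for a real triple store, not an RDF library: the ex: identifiers, the TRIPLES set and the match function are all invented for this example.

```python
# A tiny in-memory "triple store": (subject, predicate, object) tuples.
# The ex: names are hypothetical and only serve the illustration.
TRIPLES = {
    ("ex:Anja", "ex:hasSkill", "ex:FCA"),
    ("ex:Anja", "ex:hasSkill", "ex:VIZ"),
    ("ex:Ian", "ex:hasSkill", "ex:FCA"),
    ("ex:Chris", "ex:hasSkill", "ex:ETL"),
    ("ex:FCA", "rdfs:label", "Formal Concept Analysis"),
}


def match(pattern, triples):
    """Return variable bindings for one triple pattern.

    Terms starting with "?" are variables; every other term must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in the pattern must bind to one value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly what SPARQL would express as:
#   SELECT ?who WHERE { ?who ex:hasSkill ex:FCA }
who_knows_fca = sorted(b["?who"] for b in match(("?who", "ex:hasSkill", "ex:FCA"), TRIPLES))
print(who_knows_fca)
```

A real SPARQL engine joins several such patterns and uses indexes instead of a linear scan; the sketch only shows why the flexible subject-predicate-object model makes ad-hoc questions easy to ask.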

Semantic Data Aggregation and Linking for Big Data
• Transforming unstructured content into a structured format for later analysis is a major challenge.
• The value of data explodes when it can be linked with other data, thus data integration is a major creator of value
• Data aggregation from various sources can establish the veracity
• Semantic technologies are a way of addressing variety

Linked Data / Web of Data
• Linked Data is a set of principles that allows publishing, querying and consumption of RDF data, distributed across different servers
• Not necessarily free / open data
• Exponential growth -> a Big Data approach: enriching Big Data with metadata & semantics, interlinking Big Data sets
• PricewaterhouseCoopers, 2009: « You'll be able to find pieces of data sets from different places, aggregate them without warehousing, and analyse them in a more straightforward, powerful way »

Semantic Technologies for Big Data
• Natural Language Processing (NLP)
• Ontology Engineering techniques
• Semantic enrichment:
– Addition of contextual information
– Semantic annotation
– Data categorization / classification
– Improved information retrieval
– Reasoning

Semantic Data Aggregating and Linking for Big Data
DATA LAYER
• Structured: database
• Non-structured: documents, web pages, sensor data, textual content, social media
KNOWLEDGE LAYER
• Semantic aggregation
• Semantic enrichment and disambiguation
• Linking data
• Ontologies, Linked Open Data

LOD-Based Semantic Enrichment
Structured Big Data

Pattern-based Technique
Query = "Olive Garden" + "Darden Rest"
The first owner of [Olive Garden] was the famous [Darden Rest]VAL
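
One way to read the pattern-based technique: a sentence that mentions both seed entities is generalized into a lexical pattern with two slots, and that pattern is then applied to other text to find new pairs. The sketch below assumes this reading and uses plain regular expressions; learn_pattern, extract and the second sentence (with made-up names) are hypothetical and do not come from the talk.

```python
import re


def learn_pattern(sentence, x, y):
    """Generalize a sentence mentioning x then y into a regex with two slots.

    Returns None when the sentence does not contain x followed by y.
    """
    start = sentence.find(x)
    end = sentence.find(y)
    if start == -1 or end == -1 or start >= end:
        return None
    prefix = sentence[:start]
    middle = sentence[start + len(x):end]
    return re.compile(
        re.escape(prefix) + r"(?P<x>.+?)" + re.escape(middle) + r"(?P<y>.+?)[.]?$"
    )


def extract(pattern, text):
    """Apply a learned pattern and return the (x, y) pair, or None."""
    found = pattern.match(text)
    return (found.group("x"), found.group("y")) if found else None


# Seed sentence and seed pair from the slide.
SEED = "The first owner of Olive Garden was the famous Darden Rest"
pattern = learn_pattern(SEED, "Olive Garden", "Darden Rest")

# Invented sentence with invented names, only to show the pattern generalizing.
pair = extract(pattern, "The first owner of Acme Bistro was the famous Example Holdings.")
print(pair)
```

Patterns learned this way are very literal; in practice they are usually relaxed (for instance by dropping adjectives such as "famous") and ranked by how many known pairs they recover.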

Semantic Enrichment
• Ownership: subject (owned, X), object (owned, Y)

Value of Semantic Technologies
• Semantic Technologies provide opportunities for reducing the cost and complexity of data integration
• Common metadata layer
• Powerful solutions to find and explore information
• Semantic Technologies are a good fit for Big Data's Variety
• Velocity and Volume: challenging issues for Semantic Technologies
• Linked Data will grow into Big Linked Data, but Big Data will also benefit from evolving into Linked Big Data

Social Networks

Graphs everywhere
- Social networks
- Web
- Enterprise databases
- Biology
- Etc.
Simple management of structured, semi-structured and unstructured information: relational databases, XML, Web

Graphs: what can we do with them?
• Traversing linked information, finding shortest paths, doing (semantic) partitioning
• Recommendation and discovery of potentially interesting linked information
• Exploit the graph structure of large repositories
– Web environment
– Digital document repositories
– Databases with metadata
• Use cases: recommendation, social networks
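
Traversal and shortest path can be illustrated with a breadth-first search over an adjacency list. This is a generic textbook sketch on a made-up "follows" graph; the node names and the GRAPH variable are assumptions for the example.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search: a path with the fewest edges, or None if unreachable."""
    if source == target:
        return [source]
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue  # already visited
            parents[neighbor] = node
            if neighbor == target:
                # Walk back through the parents to rebuild the path.
                path = [target]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical directed "follows" graph of a small social network.
GRAPH = {
    "ana": ["ben", "chris"],
    "ben": ["dina"],
    "chris": ["dina", "eli"],
    "dina": ["eli"],
    "eli": [],
}

print(shortest_path(GRAPH, "ana", "eli"))
print(shortest_path(GRAPH, "eli", "ana"))
```

Breadth-first search treats every edge as equally costly; weighted relations (for example interaction frequency) would call for Dijkstra's algorithm instead.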

Graphs for Social networks: enterprise use case
• A technology for internal communication, information sharing and collaboration
• A technology for communicating information to clients
– Vote for the best product
– Understand the clients' needs
• A technology for watching the gossip
– E-reputation, opinion mining
• A technology for creating collective intelligence
– Collaborative common knowledge
– Wikis and blogs associated with social networks

Graphs for Social networks: public administrations use case
• Public administrations need social networks:
– As enterprises:
  • To analyze internal networks (projects, organization…)
  • To analyze external networks (suppliers, clients, partners…)
– As an interface for citizens:
  • To be well understood by citizens (who does what)
  • To understand citizens (who says what)
• Example scenarios:
– Need to look over the organizational structure (employees, departments, transversal projects) and identify costs
– Need for citizens to understand the impact of public policies (offered services, available resources for each district of the city, which projects are the most relevant, citizens' complaints)
– Opinion analysis from external social networks (Twitter for example)

Social web – Social Networks
• The Social Semantic Web combines technologies, strategies and methodologies from the Semantic Web, social software and the Web 2.0.
• Web 2.0 allows users to express their opinion on products and services
• Understanding “what people think” can support decision-making, both for consumers and producers

Sentiment Analysis – Opinion mining
Find out what other people think. Is it possible?
What does opinion mining mean?
The beginning of wisdom is the definition of terms! (Socrates)
Today, vendors, practitioners, and the media alike call this still-nascent arena everything from ‘brand monitoring,’ ‘buzz monitoring’ and ‘online anthropology,’ to ‘market influence analytics,’ ‘conversation mining’ and ‘online consumer intelligence’. . . . In the end, the term ‘social media monitoring and analysis’ is itself a verbal crutch. It is placeholder [sic], to be used until something better (and shorter) takes hold in the English language to describe the topic of this report.
Zabin and Jefferies: “Social media monitoring and analysis: Generating consumer insights from online conversation”

Opinion mining – possible uses
• Recommender systems (avoid recommending items that received a lot of negative feedback)
• Information filtering
• Business Intelligence (why aren’t consumers buying my laptop?)
• Question answering (what did you want to say?)
• Clarification of politicians' positions
• eDemocracy, and so on

Opinion mining – Sociology
• Who is positively or negatively disposed toward whom
• Who would be more or less receptive to new information transmission from a given source
• Structural balance theory: group cohesion and overall polarity among people

Opinion mining – The perfect tool
The development of a complete opinion-search application might involve:
1) Determining which documents or portions of documents contain opinionated material.
2) Identifying the overall sentiment expressed by these documents and/or the specific opinions regarding particular features or aspects of the items or topics in question, as necessary.
3) Finally, presenting the sentiment information the system has garnered in some reasonable summary fashion (aggregation of “votes”, selective highlighting of some opinions, etc.).

Opinion mining – Polarity
A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the opinion expressed in a document, a sentence or an entity feature/aspect is positive, negative, or neutral.
A polarity is a real number quantifying the user’s positive, negative or neutral opinion.
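
A polarity as a real number can be illustrated with a very small lexicon-based scorer. This is a deliberately naive sketch, assuming a hand-made four-word lexicon and simple averaging; the LEXICON values, the 0.1 threshold and the function names are invented for the example and are not the method presented in the talk.

```python
import re

# Hypothetical mini-lexicon: word -> polarity in [-1, 1].
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0}


def polarity(text, lexicon=LEXICON):
    """Average polarity of the opinion words found in the text (0.0 if none)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[token] for token in tokens if token in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def label(score, threshold=0.1):
    """Map a real-valued polarity to positive, negative or neutral."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for review in ["Great room", "Bad service", "Good location, terrible food"]:
    score = polarity(review)
    print(review, score, label(score))
```

The mixed review averages out to a neutral score even though it contains two clear opinions, which is the limitation of a single document-level polarity.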

Detecting feature sentiment in user-generated reviews
It is not possible to summarize everything with a single vote/polarity ⇒ detect local polarities expressed about the salient features of a considered domain.
Extract the most frequent domain-related features.
Good Location, Terrible Food: Detecting Feature Sentiment in User-Generated Reviews, Cataldi et al., 2013, SNAM

Combining statistics and NLP
1) We identify the most characterizing aspects of one domain (hotels, restaurants, products) by analyzing the domain corpus and extracting the most frequent terms (possibly structuring them as a vocabulary and/or ontology).
2) We formalize the content of each review as a dependency tree among its terms and retrieve (if they exist) the features discussed within it. Then, by using the tree, we aim at discovering all the other terms that convey some polarity and are linguistically connected to them.
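
The two steps can be sketched as follows. Step 1 counts frequent terms in a small corpus to pick the features. For step 2, this sketch swaps the dependency tree for a much cruder proxy: an opinion word is attached to a feature when both occur in the same clause. The corpus, stopword list, lexicon and function names are all invented for the illustration.

```python
import re
from collections import Counter

# Hypothetical resources for the sketch.
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0}
STOPWORDS = {"the", "is", "was", "a", "and", "but", "near"}
CORPUS = [
    "Good location, terrible food.",
    "The location is great but the food was bad.",
    "Great food and a good location near the station.",
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def frequent_features(corpus, top_n):
    """Step 1: the most frequent terms that are neither stopwords nor opinion words."""
    counts = Counter(
        token
        for review in corpus
        for token in tokenize(review)
        if token not in STOPWORDS and token not in LEXICON
    )
    return [term for term, _ in counts.most_common(top_n)]


def feature_polarities(review, features):
    """Step 2 (proxy): average the opinion words sharing a clause with a feature.

    Clauses are split on punctuation and on "but"/"and"; a real system would
    follow dependency links instead. A feature seen in several clauses keeps
    the polarity of the last one.
    """
    result = {}
    for clause in re.split(r"[,.;!?]|\bbut\b|\band\b", review.lower()):
        tokens = tokenize(clause)
        scores = [LEXICON[token] for token in tokens if token in LEXICON]
        if not scores:
            continue
        for token in tokens:
            if token in features:
                result[token] = sum(scores) / len(scores)
    return result


features = frequent_features(CORPUS, top_n=2)
for review in CORPUS:
    print(review, feature_polarities(review, features))
```

On the first review this yields a positive polarity for "location" and a negative one for "food", the local polarities that a single overall score would hide.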

[Figure: feature sentiment pipeline. Raw text goes through POS-tagging and a linguistic parser (phrase structure, English corpus) to produce a dependency graph G. A feature extractor with ranking yields the feature set F (feature1 to feature4). WordNet synsets with positive and negative polarities feed the synset polarity computation. For the subset of features F_i found in G, the synsets in G that carry some sentiment and refer to a feature in F_i are used in the sentiment computation, which outputs a polarity for each feature (e.g. feature1).]

Graphs and social networks
• Can be useful for many applications:
– E-reputation and trust management
– Monitoring of social networks for security
– Recommendation of corporate data/information
– Retail
Is Twitter just a mirror of mass sentiment or is it also able to influence opinion?

Conclusion
• Many models should be combined:
– Ontologies, graphs, formal concepts, predictive models
• Many techniques should be combined:
– Natural language processing
– Machine learning and statistics
– Ontology engineering, Linked Data management
– Graph processing
– Visualization
– Crowdsourcing, scraping
• For Semantic Enrichment

Challenges
• Semantic information aggregation
– Pattern extraction from streams and cross-analysis
– Information extraction from Linked Open Data: concepts and relations linked to the stream patterns
– Opinion aggregation from social media and the web
– Social aspects for collaboration
– Information aggregation: “too much data to assimilate but not enough knowledge to act”
• Distributed and real-time processing
– Design of real-time and distributed algorithms for stream processing and information aggregation
– Storage and indexing of a knowledge base
– Integration of business processes with aggregated information
– Distribution and parallelization of data mining algorithms
• Visual analytics and user modeling
– Dynamic user model
– Novel visualizations for very large datasets

QUESTIONS?
