Challenges and opportunities induced by Big Data and Open Data for Business Intelligence
Keynote @ IEEE CIST'2014
Marie-Aude AUFAURE
20/10/2014, IEEE CIST conference 2014

Agenda
• Evolution of business intelligence
– Semantic Business Intelligence
– Real-Time Business Intelligence
• Challenges and opportunities:
– Taking into account unstructured data

Business Intelligence
• Business Intelligence (BI) refers to a set of tools and methods dedicated to collecting, representing and analyzing data to support decision-making in enterprises.
• BI is defined as the ability of an organization to take all its input data and convert them into knowledge, ultimately providing the right information to the right people at the right time via the right channel.

Evolution of Business Intelligence
[Diagram: three generations of BI compared across five layers: Output, User Interaction, Store, Gathering Information, Data sources]
Classical Business Intelligence: static reports; ad-hoc queries and analytics; data warehouse; ETL/batch processing; databases
Semantic Business Intelligence: visual analytics; flexible queries / SPARQL; triple store; semantic ETL/batch processing; structured/unstructured data
Real-time Business Intelligence: real-time visual analytics; real-time analytics; continuous queries / business rules; knowledge enrichment; databases/triplestores; semantic ETL; stream processing; load shedding; retro-action; sensors, data streams and static data

Change factors
• Data heterogeneity

Change factors
• The way we interact together and with data/information

BI needs to focus on:
• Being simple to use
• Turning any data into information/actionable knowledge
• Empowering collaboration
• Being integrated with the business processes

Evolution of Business Intelligence
[Diagram repeated: Classical, Semantic and Real-time Business Intelligence compared by output, user interaction, store, information gathering and data sources]

And now?
Big Data
Open Data / Linked Data
Connected objects

Aspect | Characteristics | Challenges and technological answers
Volume | The most visible aspect of big data, but the least challenging | Storage: virtualisation in data centers, generalization of cloud-based solutions. NoSQL: solutions for storing and querying highly distributed data
Velocity | Data produced and collected in a shorter time window | Real-time platforms. Connected objects will increase volume but also real-time needs
Variety | Multiplication of data sources, from structured data to free text | New data stores integrating flexible data models. Collect and analyze unstructured data
Value | A more subjective aspect, dealing with the non-exploitation of these massive datasets | Transform raw data into valuable information. New business models

Open data
• Open data is digital data, public or private, published in a way that allows users to freely access and reuse it, without any technical, legal or financial restriction.
• Examples: data on public transportation, cartography, statistics, geography, sociology, environment, etc.
• Governmental wave in the 2000s:
– data.gov project in 2009, USA
– European Directive in 2003 on the reuse of public data
– In France, Etalab (2011) is in charge of data.gouv.fr, an open data portal for public data
• Benefits for the public sector:
– Transparency, cost reduction, better services
• Economic benefits:
– Access to data, mainly for SMEs

Connected objects: smart applications
Connected health
Quantified self
Connected car
Smart cities
Smart grids

More and more connected objects

Connected Cars
• 200 million vehicles equipped with Android Auto or Apple CarPlay in 2020
• Emergency call
• Eco-driving
• Autonomous vehicle
• Assistance
• Towards automatic driving
• 54 million vehicles totally or partially automated in 2035 (source: IHS Automotive / Polk)

Big Data: Challenges?
• Vector of innovation
– Disruptive technologies: cloud, Internet of Things, analytics
– Open innovation
• Enhancement of productivity, services and competitiveness
– Public services, "software-intensive" companies
• Economic impact
– Benefits for the analysis of internal and external data
– New jobs
• Big Data centres of excellence (Hack/Reduce in Boston)

BIG DATA: SOCIETAL CHALLENGES
• Big Data for society: can we expect a positive impact on society?
• Generate actionable information that can be used to identify needs, provide services, and predict and prevent crises for the benefit of populations.
• Health and well-being, environment, energy, climate change, etc.

BIG DATA: ENERGY CHALLENGE
• Supercomputers

BIG DATA: TECHNOLOGICAL CHALLENGES
• Data storage: data centers, cloud infrastructures, NoSQL databases, in-memory databases
• Data processing: supercomputers, distributed or massively parallel computing

Some scientific challenges
• Big data analytics
• Context management
• Visualization and Human-Computer Interfaces
• Algorithm distribution
• Correlations and causality
• Real-time analysis of data streams
• Validation, trust

Big Data value chain
Source: International Working Group on Data Protection in Telecommunications

Potential of Big Data Analysis
• Adapt and enhance services and processes
– Transportation and logistics
– Online education
– Job seeking
– Sentiment analysis and customers'/citizens' needs
– Enhancement of public services
– E-marketing
• Optimize performance
– Assist decision-making
– Lower resource consumption
– Fraud detection
• Predict and prevent
– Health
– Needs anticipation
– Security

BIG DATA: USE CASES

Big Data opportunities
Source: Big Data opportunities survey, Unisphere / SAP, May 2013.

Predictive analytics: flu trends
[Chart: United States flu activity, comparing United States data with the Google Flu Trends estimate]

360-degree view of the customer
Questions: Why? What? Who? When/How? Where?
Data: operational data, behavioral data, descriptive data, interaction data, contextual data

Types of data used in Big Data initiatives
[Chart categories: internal data, traditional sources, "new data"]
Source: Big Data opportunities survey, Unisphere / SAP, May 2013.

Evolution of Business Intelligence
[Diagram repeated: Classical, Semantic and Real-time Business Intelligence compared by output, user interaction, store, information gathering and data sources]

Coping with unstructured data
Semantic BI
Semantic Technologies for Big Data
Social Networks

Unstructured data analytics process
Data
• Web content
• Ontologies
• Social data
• Logs
• Texts
• Pictures, etc.
Collect
• Web crawling
• Web scraping
• API (Twitter, Google, …)
• Clicks (logs)
• Crowdsourcing (Mechanical Turk)
Extraction / Structuring
• Semantic ETL
• Named entities
• Lexico-syntactic patterns
• Dependency trees
• N-grams
Analyze
• Clustering
• Galois lattice
• Unsupervised and supervised learning

SEMANTIC BI AND VISUAL ANALYTICS: THE FP7 CUBIST PROJECT

CUBIST: Combining and Uniting Business Intelligence with Semantic Technologies
[Diagram: databases, forums, blogs and office docs feed a semantic ETL into a triple store, which supports flexible and visual queries / analytics]
Exploitable results: Semantic Business Intelligence, Comprehensive Information Access Means
Advanced Visual Analytics (no existing solutions from BI vendors)
■ Searching, exploring, analyzing data
■ Qualitative data analysis
■ Graph-based visualizations
Semantically enriched BI (partly addressed by BI or ST vendors)
■ Using a triple store for BI
■ Using ontologies as schema
BI over both structured and unstructured data (already addressed/developed by BI vendors)
■ Text analytics
■ Linking unstructured and structured sources

Formal Concept Analysis
• Formal Concept Analysis is a method used for investigating and processing explicitly given information
– An analysis of data
– Structures of formal abstractions of concepts of human thought
– "Formal" emphasizes that the concepts are mathematical objects, rather than concepts of mind
– Formal Concept Analysis helps to draw inferences, to group objects, and hence to create concepts
• Visual representation by a Hasse diagram

Charts, Graphs, FCA for BI: A Toy Example
Skill | Persons with that skill
IE | Anja, Ben, Ernst, Fred, Ken
ETL | Chris, Fred, Mark
BI | Ben, Chris, Fred, Lemmy, Mark, Naomi
ST | Anja, Diana, Ernst, Fred, Gerald, Harriet, Ken, Owen
FCA | Anja, Diana, Gerald, Harriet, Ian, John, Ken, Owen
VIZ | Anja, Diana, Ian
Possible information needs:
1) Show me the count of people for a given skill
2) Show me the skills and how many people share some skills, in order to get an idea of how strongly skills are related
3) Show me the skills and people such that I get an idea of the distribution of skills among people and dependencies between skills

Converting the data (analytic model)
Raw data: the skill/person table of the toy example
Bar chart data: counting the number of people per skill
Graph data: counting the number of people who share two skills
FCA data: the formal context

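The three conversions named on this slide can be sketched in a few lines of Python. This is only an illustration built on the toy skill table; the function names (bar_chart_data, graph_data, formal_context) are assumptions made for the example, not names from the talk or from Cubix.

```python
from itertools import combinations

# Toy data copied from the skill table of the example.
SKILLS = {
    "IE": {"Anja", "Ben", "Ernst", "Fred", "Ken"},
    "ETL": {"Chris", "Fred", "Mark"},
    "BI": {"Ben", "Chris", "Fred", "Lemmy", "Mark", "Naomi"},
    "ST": {"Anja", "Diana", "Ernst", "Fred", "Gerald", "Harriet", "Ken", "Owen"},
    "FCA": {"Anja", "Diana", "Gerald", "Harriet", "Ian", "John", "Ken", "Owen"},
    "VIZ": {"Anja", "Diana", "Ian"},
}


def bar_chart_data(skills):
    """Bar chart data: number of people per skill."""
    return {skill: len(people) for skill, people in skills.items()}


def graph_data(skills):
    """Graph data: number of people shared by each pair of skills (edge weight)."""
    return {
        (a, b): len(skills[a] & skills[b])
        for a, b in combinations(sorted(skills), 2)
    }


def formal_context(skills):
    """FCA data: each person (object) mapped to the set of skills (attributes)."""
    context = {}
    for skill, people in skills.items():
        for person in people:
            context.setdefault(person, set()).add(skill)
    return context
```

Only the formal context keeps the names of the people; the two count-based models discard them, which is the loss of information discussed in the comparison of the three visualizations.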
Visualizing the data
[Figures: the raw skill/person table rendered as a bar chart, a graph and an FCA concept lattice]

Some information which can be read off
Bar chart
§ ST and FCA are the skills most people have
§ ETL and VIZ are the skills the fewest people have
Graph
§ The skills FCA and ST are strongly related
§ Because the link between them is strong
§ The skills FCA and IE are only weakly related
§ Because the link between them is weak
§ No one has knowledge of both FCA and ETL
§ Because there is no link between FCA and ETL
FCA lattice
§ Owen, Harriet and Gerald have exactly the same skills
§ Because they belong to the same node
§ Whoever is skilled in ETL is skilled in BI, too
§ Because the BI node is above the ETL node
§ Anja has more skills than Ken, and Ken has more skills than Ernst
§ Because the nodes are ordered that way

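The lattice readings above can be checked directly on the toy data. The sketch below uses the two standard FCA derivation operators (extent and intent) and a brute-force enumeration of formal concepts, which is only reasonable for a context this small; the function names are assumptions chosen for the example, and this is not how Cubix computes its lattices.

```python
from itertools import combinations

# Toy data copied from the skill table of the example.
SKILLS = {
    "IE": {"Anja", "Ben", "Ernst", "Fred", "Ken"},
    "ETL": {"Chris", "Fred", "Mark"},
    "BI": {"Ben", "Chris", "Fred", "Lemmy", "Mark", "Naomi"},
    "ST": {"Anja", "Diana", "Ernst", "Fred", "Gerald", "Harriet", "Ken", "Owen"},
    "FCA": {"Anja", "Diana", "Gerald", "Harriet", "Ian", "John", "Ken", "Owen"},
    "VIZ": {"Anja", "Diana", "Ian"},
}
PEOPLE = set().union(*SKILLS.values())


def extent(attrs):
    """People who have every skill in attrs (everyone if attrs is empty)."""
    result = set(PEOPLE)
    for attr in attrs:
        result &= SKILLS[attr]
    return result


def intent(people):
    """Skills shared by every person in people."""
    return {skill for skill, members in SKILLS.items() if people <= members}


def formal_concepts():
    """All (extent, intent) pairs, found by closing every subset of skills."""
    concepts = set()
    names = sorted(SKILLS)
    for size in range(len(names) + 1):
        for attrs in combinations(names, size):
            people = extent(attrs)
            concepts.add((frozenset(people), frozenset(intent(people))))
    return concepts


def implies(skill_a, skill_b):
    """True if everyone skilled in skill_a is also skilled in skill_b."""
    return SKILLS[skill_a] <= SKILLS[skill_b]
```

With these helpers, "whoever is skilled in ETL is skilled in BI" is the check implies("ETL", "BI"), and two people sit in the same lattice node exactly when intent gives the same skill set for each of them.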
Comparison
Bar chart
+ Many well-known visualizations
+ Good (readable and comprehensible) layouts
+ Good for analyzing numbers
- Loss of information (which people)
- Misleading for overlapping attributes (people are counted several times)
- Not utilizing relationships between entities
Graph
+ Attractive visualizations
+ (Relatively) easy to understand
+ Utilizing and showing links between entities (skills)
- Loss of information (which people)
- Bad for analyzing numbers
FCA lattice
+ No loss of information
+ Meaningful clusters in one node
+ Showing dependencies between entities (both people and skills)
- Number of nodes might explode
- Finding a good layout is unsolved (the nice layout in the example is accidental and has been manually created)
- Unfamiliar means for analytics
- Scalability
- Bad for analyzing numbers

Which visualization should I choose?
Remember the information needs from the beginning:
Show me the skills and how many people share some skills, in order to get an idea of how strongly skills are related
Show me the skills and people such that I get an idea of the distribution of skills among people and dependencies between skills
Show me the count of people for a given skill
Conclusion
§ Each visualization has its own strengths and weaknesses
§ Each type of visualization is suited to a specific type of information need
§ Thus the visualizations are complementary
§ Thus future BI tools should provide all types of visualizations

Can you understand this?
Traffic accidents dataset: 34 attributes, 150 objects, 344 concepts – minimal edge crossing layout

Visual Analytics
• Visual analytics supports human judgment by means of visual representations and interaction techniques [Keim et al. 2001]
• "Overview first, zoom and filter, then details-on-demand." [Shneiderman, 1996]
• Visual Analytics for FCA combines:
– Traditional BI operations and visualizations
– Concept lattice transformation and visualization

FCA-based Visual Analytics
• Idea: create visual analytics for large contexts
– Context reduction
– Allow visual queries through selection and filtering
– Dynamic visualization
– Visual exploration becomes a navigation problem

Cubix: A Visual Analytics tool for FCA
• Combines interactive features to overcome drawbacks of single techniques
• Features
– Visualisations
– Dashboard
– Metrics
– Filtering & Search
– Clustering
– Tree extraction
Publication: ICDM 2012 [Melo et al.]
Live: cubix.alwaysdata.com

Summary of Visualisations
Analysis task | Data | Visualisation
Co-occurrence analysis | Concept lattice | Enhanced Hasse diagram
Exploratory hierarchical analysis | Tree from the concept lattice | Sunburst
Frequent itemsets analysis | Attributes and objects matrix | Concept stacking (matrix)
Simulation parameters analysis | Multi-valued attributes | Heatmap lattice
Implication analysis | Association rules | Radial/matrix visualisation for association rules

Coming back to ease of use
• Cubix was tested on three use cases
– The workflow (data selection, scaling, filtering and analysis) needed to be simplified
• User creation of analytics
– Leading to "BI as a service"
• Automatic recommendation of visualizations and gadgets:
– Decision tree
• Based on the data type and volume
– Collaborative filtering
• Based on other users' preferences for similar datasets
– Supervised learning methods
• Based on user profiles and history

Coping with big data for FCA
• Reduction techniques
– Filtering (support, stability)
• Distributed computing of concepts
• Mining formal concepts over data streams
• Visual analytics
– New metaphors for large data
– Data overview: dashboards
• Filtering

Semantic Technologies for Big Data

Semantic Technologies for Big Data
• Data-driven approaches (structure learning, data mining, statistical approaches) are not always sufficient to find all correlations among parameters
• Semantic approaches can provide complementary information:
– Simplify the information integration process
– Provide a unified metadata layer
– Discover and enrich information
– Provide unified access to information

Semantic processing
• Helping to make sense of large or complex sets of data without being supplied with any knowledge about the data
• Turning any data into information/actionable knowledge
• Some examples:
– NLP technologies
– Data mining
– Artificial intelligence
– Classification
– Semantic search

Semantic technologies / Semantic Web
• "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." (Tim Berners-Lee, 2001)
• Standards include:
– a flexible data model (RDF)
– schema and ontology languages for describing concepts and relationships (RDFS and OWL)
– a query language (SPARQL)
• Use of semantic technologies in semantic processing (e.g. semantic search)
• Use of semantic technologies for storing and querying data (triple store and SPARQL)

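To make the triple store and query idea concrete, here is a deliberately tiny in-memory sketch. It is not RDF or SPARQL: the triples, the ex: prefix and the match function are assumptions made for the illustration. A pattern with None in one position plays roughly the role of a variable in a SPARQL triple pattern such as `?person ex:hasSkill ex:FCA`.

```python
# Each fact is a (subject, predicate, object) tuple, as in the RDF data model.
# The identifiers below are made up for the illustration.
TRIPLES = {
    ("ex:Anja", "ex:hasSkill", "ex:FCA"),
    ("ex:Anja", "ex:hasSkill", "ex:VIZ"),
    ("ex:Ken", "ex:hasSkill", "ex:FCA"),
    ("ex:FCA", "rdfs:label", "Formal Concept Analysis"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who has the FCA skill? (the subject position is left open)
fca_people = [s for s, _, _ in match(TRIPLES, p="ex:hasSkill", o="ex:FCA")]
```

A real deployment would rely on an RDF triple store and a SPARQL endpoint; the point here is only that the schema is carried by the data itself, so new predicates can be added without changing any table definition.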
Semantic Data Aggregation and Linking for Big Data
• Transforming unstructured content into a structured format for later analysis is a major challenge.
• The value of data explodes when it can be linked with other data; data integration is thus a major creator of value
• Data aggregation from various sources can help establish veracity
• Semantic technologies are a way of addressing variety

Linked Data / Web of Data
• Linked Data is a set of principles that allows publishing, querying and consumption of RDF data, distributed across different servers
• Not necessarily free / open data
• Exponential growth -> a Big Data approach: enriching Big Data with metadata & semantics, interlinking Big Data sets
• PricewaterhouseCoopers, 2009: "You'll be able to find pieces of data sets from different places, aggregate them without warehousing, and analyse them in a more straightforward, powerful way"

Semantic Technologies for Big Data
• Natural Language Processing (NLP)
• Ontology engineering techniques
• Semantic enrichment:
– Addition of contextual information
– Semantic annotation
– Data categorization / classification
– Improved information retrieval
– Reasoning

Semantic Data Aggregation and Linking for Big Data
[Diagram: a data layer (structured and non-structured sources: databases, documents, web pages, sensor data, textual content, social media) is connected to a knowledge layer (ontologies, Linked Open Data) through semantic aggregation, semantic enrichment and disambiguation, and data linking]

LOD-Based Semantic Enrichment
Structured Big Data

Pattern-based Technique
Query = "Olive Garden" + "Darden Rest"
The first owner of [Olive Garden] was the famous [Darden Rest]VAL

Semantic Enrichment
Ownership: subject (owned, X), object (owned, Y)

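A lexico-syntactic pattern of this kind can be approximated with a regular expression. The sketch below is a hypothetical pattern written for the example sentence only; the pattern, the function name extract_ownership and the output keys are assumptions, and a real system would work on parsed text rather than on raw strings.

```python
import re

# Hypothetical pattern: "owner of <Name> was/is [the] [adjectives] <Name>",
# where a name is a run of capitalized words.
NAME = r"[A-Z][\w&']*(?: [A-Z][\w&']*)*"
OWNER_PATTERN = re.compile(
    rf"owner of (?P<owned>{NAME}) (?:was|is) (?:the )?(?:[a-z]+ )*(?P<owner>{NAME})"
)


def extract_ownership(sentence):
    """Return the two entities linked by the ownership pattern, or None."""
    found = OWNER_PATTERN.search(sentence)
    if found is None:
        return None
    return {"owned": found.group("owned"), "owner": found.group("owner")}


relation = extract_ownership(
    "The first owner of Olive Garden was the famous Darden Rest"
)
```

The extracted pair can then be stored as a new relation between the two entities, which is the enrichment step: the text contributes a link that the structured data did not contain.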
Value of Semantic Technologies
• Semantic technologies provide opportunities for reducing the cost and complexity of data integration
• Common metadata layer
• Powerful solutions to find and explore information
• Semantic technologies are a good fit for Big Data's Variety
• Velocity and Volume: challenging issues for semantic technologies
• Linked Data will grow into Big Linked Data, but Big Data will also benefit from evolving into Linked Big Data

Social Networks

Graphs everywhere
- Social networks
- Web
- Enterprise databases
- Biology
- Etc.
Simple management of structured, semi-structured and unstructured information: relational databases, XML, Web

Graphs: what can we do with them?
• Traversing linked information, finding shortest paths, doing (semantic) partitioning
• Recommendation and discovery of potentially interesting linked information
• Exploit the graph structure of large repositories
– Web environment
– Digital document repositories
– Databases with metadata
• Use cases: recommendation, social networks

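As one concrete instance of traversing linked information, the sketch below finds a shortest path in an unweighted graph with breadth-first search. The small friendship graph and the names in it are invented for the example.

```python
from collections import deque

# Hypothetical undirected social graph, stored as adjacency lists.
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a shortest path as a list, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None
```

The number of hops between two people is a simple proxy for social distance; recommendation of "people you may know" can start from nodes at distance two.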
Graphs for social networks: enterprise use case
• A technology for internal communication, information sharing and collaboration
• A technology for communicating information to clients
– Vote for the best product
– Understand the clients' needs
• A technology for watching the gossip
– E-reputation, opinion mining
• A technology for creating collective intelligence
– Collaborative common knowledge
– Wikis and blogs associated with social networks

Graphs for social networks: public administration use case
• Public administrations need social networks:
– As enterprises:
• To analyze internal networks (projects, organization…)
• To analyze external networks (suppliers, clients, partners…)
– As an interface for citizens:
• To be well understood by citizens (who does what)
• To understand citizens (who says what)
• Example scenarios:
– Need to look over the organizational structure (employees, departments, cross-functional projects) and identify costs
– Need for citizens to understand the impact of public policies (offered services, available resources for each district of the city, which projects are the most relevant, citizens' complaints)
– Opinion analysis from external social networks (Twitter, for example)

Social web – Social Networks
• The Social Semantic Web combines technologies, strategies and methodologies from the Semantic Web, social software and the Web 2.0.
• Web 2.0 allows users to express their opinion on products and services
• Understanding "what people think" can support decision-making, both for consumers and producers

Sentiment Analysis – Opinion mining
Find out what other people think. Is it possible?
What does opinion mining mean?
"The beginning of wisdom is the definition of terms!" (Socrates)
"Today, vendors, practitioners, and the media alike call this still-nascent arena everything from 'brand monitoring,' 'buzz monitoring' and 'online anthropology,' to 'market influence analytics,' 'conversation mining' and 'online consumer intelligence'. . . . In the end, the term 'social media monitoring and analysis' is itself a verbal crutch. It is placeholder [sic], to be used until something better (and shorter) takes hold in the English language to describe the topic of this report."
Zabin and Jefferies: "Social media monitoring and analysis: Generating consumer insights from online conversation"

Opinion mining – possible uses
Recommender systems (avoid recommending items that received a lot of negative feedback)
Information filtering
Business Intelligence (why aren't consumers buying my laptop?)
Question answering (what did you want to say?)
Clarification of politicians' positions
eDemocracy… and so on

Opinion mining – Sociology
Who is positively or negatively disposed toward whom.
Who would be more or less receptive to new information transmitted from a given source.
Structural balance theory: group cohesion and overall polarity among people.

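Structural balance theory has a compact computational reading: in a signed graph where every pair of people is either positively (+1) or negatively (-1) disposed toward each other, a triangle is balanced when the product of its three signs is positive. The sketch below checks this for a complete signed graph; the data layout and the function name is_balanced are assumptions made for the example.

```python
from itertools import combinations


def is_balanced(signs):
    """Check structural balance of a complete signed graph.

    signs maps frozenset({a, b}) to +1 (positive tie) or -1 (negative tie).
    Assumes every pair of nodes has a sign; a missing pair raises KeyError.
    """
    nodes = sorted(set().union(*signs))
    for a, b, c in combinations(nodes, 3):
        product = (
            signs[frozenset((a, b))]
            * signs[frozenset((b, c))]
            * signs[frozenset((a, c))]
        )
        if product < 0:
            return False
    return True


# Two friends sharing a common opponent: balanced.
allies = {
    frozenset(("ann", "bob")): 1,
    frozenset(("bob", "cy")): -1,
    frozenset(("ann", "cy")): -1,
}
# Three mutual opponents: unbalanced under the classical theory.
rivals = {
    frozenset(("ann", "bob")): -1,
    frozenset(("bob", "cy")): -1,
    frozenset(("ann", "cy")): -1,
}
```

In opinion mining, the signs would come from the polarity of what people say about each other, so the quality of this check depends entirely on the upstream sentiment step.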
Opinion mining – The perfect tool
The development of a complete opinion-search application might involve the following steps:
1) Determine which documents or portions of documents contain opinionated material.
2) Identify the overall sentiment expressed by these documents and/or the specific opinions regarding particular features or aspects of the items or topics in question, as necessary.
3) Finally, present the sentiment information the system has garnered in some reasonable summary fashion (aggregation of "votes", selective highlighting of some opinions, etc.)

Opinion mining – Polarity
A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the opinion expressed in a document, a sentence or an entity feature/aspect is positive, negative, or neutral.
A polarity is a real number quantifying the user's positive, negative or neutral opinion.

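A minimal way to obtain such a real-valued polarity is to average the scores of opinion words found in a lexicon. The sketch below uses a tiny hand-made lexicon and an arbitrary neutrality threshold, both of which are assumptions for the illustration; it ignores negation, intensity and context.

```python
import re

# Tiny illustrative lexicon: word -> polarity in [-1, 1].
LEXICON = {
    "good": 1.0, "great": 1.0, "excellent": 1.0,
    "bad": -1.0, "terrible": -1.0, "awful": -1.0,
}


def polarity(text):
    """Average lexicon score of the opinion words in text (0.0 if none)."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def label(score, threshold=0.1):
    """Map a real-valued polarity to positive / negative / neutral."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

Note that a review such as "Good location, terrible food" averages out to a neutral score at the document level even though it contains two clear opinions, which is one reason to move to feature-level polarity.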
Detecting feature sentiment in user-generated reviews
It is not possible to summarize everything with a unique vote/polarity ⇒ detect local polarities expressed about the salient features of a considered domain.
Extract the most frequent domain-related features.
Good Location, Terrible Food: Detecting Feature Sentiment in User-Generated Reviews. Cataldi et al., 2013, SNAM

Combining statistics and NLP
1) We identify the most characteristic aspects of one domain (hotels, restaurants, products) by analyzing the domain corpus and extracting the most frequent terms (possibly structuring them as a vocabulary and/or ontology)
2) We formalize the content of each review as a dependency tree among its terms and retrieve (if they exist) the features discussed within it. Then, by using the tree, we aim at discovering all the other terms that convey some polarity and are linguistically connected to them.

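The two steps can be imitated with a much cruder stand-in, shown here only to fix ideas. Step 1 counts frequent non-stopword terms; step 2 replaces the dependency tree with a simple proximity rule (each opinion word is attached to the nearest feature term in the same clause). The stopword list, the lexicon, the clause splitting and the function names are all assumptions, and this is not the method of the cited paper.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "was", "but", "and", "very"}
LEXICON = {
    "good": 1.0, "great": 1.0, "excellent": 1.0,
    "bad": -1.0, "terrible": -1.0, "awful": -1.0,
}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def frequent_features(reviews, top_n):
    """Step 1: the most frequent terms that are neither stopwords nor opinion words."""
    counts = Counter(
        word
        for review in reviews
        for word in tokenize(review)
        if word not in STOPWORDS and word not in LEXICON
    )
    return [word for word, _ in counts.most_common(top_n)]


def feature_polarities(review, features):
    """Step 2 (simplified): attach each opinion word to the nearest feature
    term in the same clause, instead of following a dependency tree."""
    result = {}
    for clause in re.split(r"[,.;!?]| but | and ", review.lower()):
        words = tokenize(clause)
        positions = [(i, w) for i, w in enumerate(words) if w in features]
        for i, word in enumerate(words):
            if word in LEXICON and positions:
                _, nearest = min(positions, key=lambda p: abs(p[0] - i))
                result[nearest] = result.get(nearest, 0.0) + LEXICON[word]
    return result


REVIEWS = [
    "Good location, terrible food",
    "The location is great but the food was bad",
    "Food was awful, location excellent",
]
```

A dependency parser makes the attachment far more reliable than proximity, for instance when an opinion word and its feature are separated by a long relative clause.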
[Diagram: processing pipeline. The raw text of a review is POS-tagged and passed to a linguistic parser (phrase structure, English corpus), producing a dependency graph G. A feature extractor ranks terms into the feature set F, and the subset of features F_i present in G is selected. WordNet synsets supply positive and negative polarities for each term. The sentiment computation takes the synsets in G that carry some sentiment and refer to a feature in F_i, and outputs a polarity for each feature.]

Graphs and social networks
• Can be useful for many applications:
– E-reputation and trust management
– Monitoring of social networks for security
– Recommendation of corporate data/information
– Retail
Is Twitter just a mirror of mass sentiment or is it also able to influence opinion?

Conclusion
• Many models should be combined:
– Ontologies, graphs, formal concepts, predictive models
• Many techniques should be combined:
– Natural language processing
– Machine learning and statistics
– Ontology engineering, Linked Data management
– Graph processing
– Visualization
– Crowdsourcing, scraping
• For semantic enrichment

Challenges
• Semantic information aggregation
– Pattern extraction from streams and cross-analysis
– Information extraction from Linked Open Data: concepts and relations linked to the stream patterns
– Opinion aggregation from social media and web
– Social aspects for collaboration
– Information aggregation: "too much data to assimilate but not enough knowledge to act"
• Distributed and real-time processing
– Design of real-time and distributed algorithms for stream processing and information aggregation
– Storage and indexing of a knowledge base
– Integration of business processes with aggregated information
– Distribution and parallelization of data mining algorithms
• Visual analytics and user modeling
– Dynamic user model
– Novel visualizations for very large datasets

QUESTIONS?

Marie-Aude Aufaure keynote ieee cist 2014

  • 1. Challenges and opportunities induced by Big Data and Open Data for Business Intelligence. Keynote @ IEEE CIST’2014, Marie-Aude AUFAURE, 20/10/2014, IEEE CIST conference 2014
  • 2. Agenda • Evolution of business intelligence – Semantic Business Intelligence – Real-Time Business Intelligence • Challenges and opportunities: – Taking into account unstructured data
  • 3. Business Intelligence • Business Intelligence (BI) refers to a set of tools and methods dedicated to collecting, representing and analyzing data to support decision-making in enterprises. • BI is defined as the ability of an organization to take all input data and convert them into knowledge, ultimately providing the right information to the right people at the right time via the right channel.
  • 4. Evolution of Business Intelligence. Layers: Output; User Interaction; Store; Gathering Information; Data sources. • Semantic Business Intelligence: visual analytics; flexible queries / SPARQL; Triple Store; semantic ETL / batch processing; structured/unstructured data • Classical Business Intelligence: static report; ad-hoc queries, analytics; Data Warehouse; ETL / batch processing; databases • Real-time Business Intelligence: real-time analytics; databases/triplestores; real-time visual analytics; knowledge enrichment; continuous queries / business rules; semantic ETL; stream processing; load shedding; sensors; data streams; retro-action (feedback); static data
  • 5. Change factors • Data heterogeneity
  • 6. Change factors • The way we interact together and with data/information
  • 7. BI needs to focus on: • Being simple to use • Turning any data into information/actionable knowledge • Empowering collaboration • Being integrated with the business processes
  • 8. Evolution of Business Intelligence. Layers: Output; User Interaction; Store; Gathering Information; Data sources. • Semantic Business Intelligence: visual analytics; flexible queries / SPARQL; Triple Store; semantic ETL / batch processing; structured/unstructured data • Real-time Business Intelligence: real-time analytics; databases/triplestores; real-time visual analytics; knowledge enrichment; continuous queries / business rules; semantic ETL; stream processing; load shedding; sensors; data streams; retro-action (feedback); static data • Classical Business Intelligence: static report; ad-hoc queries, analytics; Data Warehouse; ETL / batch processing; databases
  • 9. And now? Big Data; Open Data / Linked Data; Connected objects
  • 10. Aspect: characteristics; challenges and technological answers • Volume: most visible aspect of big data but less challenging; storage: virtualization in data centers, generalization of cloud-based solutions; NoSQL solutions for storing and querying highly distributed data • Velocity: data produced and collected in a shorter time window; real-time platforms; connected objects will increase volume but also real-time needs • Variety: multiplication of data sources, from structured data to free text; new data stores integrating flexible data models; collect and analyze unstructured data • Value: more subjective aspect dealing with the non-exploitation of these massive datasets; transform raw data into valuable information; new business models
  • 11. Open data • Open data is digital data, public or private, published in a way that allows users to freely access and reuse it, without any technical, legal or financial restriction. • Examples: data on public transportation, cartography, statistics, geography, sociology, environment, etc. • Governmental wave in the 2000s: – data.gov project in 2009, USA – European Directive in 2003 on the reuse of public data – In France, Etalab (2011) is in charge of data.gouv.fr, an open data portal for public data. • Benefits for the public sector: – Transparency, cost reduction, better services • Economic benefits: – Access to data, mainly for SMEs
  • 12. Connected objects: smart applications. Connected health; Quantified-self; Connected car; Smart cities; Smart grids
  • 13. More and more connected objects
  • 14. Connected Cars • 200 million vehicles equipped with Android Auto or Apple CarPlay in 2020 • Emergency call • Eco-driving • Autonomous vehicle • Assistance • Towards automatic driving • 54 million vehicles totally or partially automated in 2035 (source: IHS Automotive / Polk)
  • 15. Big Data: Challenges? • Vector of innovation – Disruptive technologies: cloud, Internet of Things, analytics – Open innovation • Enhancement of productivity, services and competitiveness – Public services, "software-intensive" companies • Economic impact – Benefits for the analysis of internal and external data – New jobs • Big Data centres of excellence (Hack/Reduce in Boston)
  • 16. BIG DATA: SOCIETAL CHALLENGES • Big Data for Society: can we expect a positive impact on society? • Generate actionable information that can be used to identify needs, provide services, and predict and prevent crises for the benefit of populations. • Health and well-being, environment, energy, climate change, etc.
  • 17. BIG DATA: ENERGY CHALLENGE • Supercomputers
  • 18. BIG DATA: TECHNOLOGICAL CHALLENGES • Data storage: data centers, cloud infrastructures, NoSQL databases, in-memory databases • Data processing: supercomputers, distributed or massively parallel computing
  • 19. Some scientific challenges • Big data analytics • Context management • Visualization and Human-Computer Interfaces • Distribution of algorithms • Correlations and causality • Real-time analysis of data streams • Validation, trust
  • 20. Big Data value chain Source: International Working Group on Data Protection in Telecommunications
  • 21. Potential of Big Data Analysis • Adapt and enhance services and processes – Transportation and logistics – Online education – Job seeking – Sentiment analysis and customers'/citizens' needs – Enhancement of public services – E-marketing • Optimize performance – Assist decision-making – Lower resource consumption – Fraud detection • Predict and prevent – Health – Needs anticipation – Security
  • 22. BIG DATA: USE CASES
  • 23. Big Data opportunities Source: Big Data opportunities survey, Unisphere / SAP, May 2013.
  • 24. Predictive analytics: flu trends [Figure: United States flu activity, United States data vs. Google Flu Trends estimate]
  • 25. 360-degree view of the customer • Questions: Why? What? Who? When/How? Where? • Data: operational data, behavioral data, descriptive data, interaction data, contextual data
  • 26. Types of data used in Big Data initiatives • Internal data • Traditional sources • "New data" Source: Big Data opportunities survey, Unisphere / SAP, May 2013.
  • 27. Evolution of Business Intelligence [Figure: three BI generations, each listed by layer from output down to data sources: output; user interaction; store; gathering information; data sources] • Classical Business Intelligence: static report; ad-hoc queries, analytics; data warehouse; ETL / batch processing; databases • Semantic Business Intelligence: visual analytics; flexible queries / SPARQL; triple store; semantic ETL / batch processing; structured/unstructured data • Real-time Business Intelligence: real-time analytics, real-time visual analytics; continuous queries / business rules, knowledge enrichment; databases / triplestores; semantic ETL, stream processing, load shedding; sensors, data streams, static data; retroaction
  • 28. Coping with unstructured data • Semantic BI • Semantic Technologies for Big Data • Social Networks
  • 29. Unstructured data analytics process Data: • Web content • Ontologies • Social data • Logs • Texts • Pictures, etc. Collect: • Web crawling • Web scraping • APIs (Twitter, Google, …) • Clicks (logs) • Crowdsourcing (Mechanical Turk) Extraction / structuring: • Semantic ETL • Named entities • Lexico-syntactic patterns • Dependency trees • N-grams Analyze: • Clustering • Galois lattice • Unsupervised and supervised learning 20/10/2014 Séminaire Big Data 29
  • 30. SEMANTIC BI AND VISUAL ANALYTICS: THE FP7 CUBIST PROJECT
  • 31. CUBIST: Combining and Uniting Business Intelligence with Semantic Technologies [Figure: databases, forums, blogs and office documents feed a semantic ETL, which loads a triple store that is queried through flexible and visual queries / analytics] Exploitable results: Semantic Business Intelligence, comprehensive information access means • Advanced visual analytics ■ Searching, exploring, analyzing data ■ Qualitative data analysis ■ Graph-based visualizations (no existing solutions from BI vendors) • Semantically enriched BI ■ Using a triple store for BI ■ Using ontologies as schema (partly addressed by BI or ST vendors) • BI over both structured and unstructured data ■ Text analytics ■ Linking unstructured and structured sources (already addressed/developed by BI vendors)
  • 32. Formal Concept Analysis • Formal Concept Analysis is a method for investigating and processing explicitly given information – An analysis of data – Structures of formal abstractions of concepts of human thought – "Formal" emphasizes that the concepts are mathematical objects, rather than concepts of mind – Formal Concept Analysis helps to draw inferences, to group objects, and hence to create concepts • Visual representation by a Hasse diagram
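The slide gives no formulas, so as a reminder of the standard FCA definitions (the symbols G, M, I, A and B are the usual textbook notation, not taken from the slides), the notions of formal context, formal concept and concept order can be written as:

```latex
% Formal context: objects G, attributes M, incidence relation I
\mathbb{K} = (G, M, I), \qquad I \subseteq G \times M

% Derivation operators for A \subseteq G and B \subseteq M
A' = \{\, m \in M \mid \forall g \in A : (g, m) \in I \,\}
\qquad
B' = \{\, g \in G \mid \forall m \in B : (g, m) \in I \,\}

% A formal concept is a pair (A, B) of extent and intent with
A' = B \quad \text{and} \quad B' = A

% Concepts are ordered by extent inclusion; this order is what the Hasse diagram draws
(A_1, B_1) \le (A_2, B_2) \iff A_1 \subseteq A_2 \iff B_2 \subseteq B_1
```

In the toy example of the following slides, G would be the set of people and M the set of skills.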
  • 33. Charts, Graphs, FCA for BI: A Toy Example Skill: persons with that skill • IE: Anja, Ben, Ernst, Fred, Ken • ETL: Chris, Fred, Mark • BI: Ben, Chris, Fred, Lemmy, Mark, Naomi • ST: Anja, Diana, Ernst, Fred, Gerald, Harriet, Ken, Owen • FCA: Anja, Diana, Gerald, Harriet, Ian, John, Ken, Owen • VIZ: Anja, Diana, Ian Possible information needs: 1) Show me the count of people for a given skill 2) Show me the skills and how many people share some skills, in order to get an idea of how strongly skills are related 3) Show me the skills and people such that I get an idea of the distribution of skills among people and of the dependencies between skills
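The first information need (count of people per skill) can be sketched in a few lines of Python. Only the skill table comes from the slide; the names `SKILLS` and `people_per_skill` are illustrative assumptions.

```python
# Toy skill table from the slide: skill -> persons with that skill.
SKILLS = {
    "IE":  ["Anja", "Ben", "Ernst", "Fred", "Ken"],
    "ETL": ["Chris", "Fred", "Mark"],
    "BI":  ["Ben", "Chris", "Fred", "Lemmy", "Mark", "Naomi"],
    "ST":  ["Anja", "Diana", "Ernst", "Fred", "Gerald", "Harriet", "Ken", "Owen"],
    "FCA": ["Anja", "Diana", "Gerald", "Harriet", "Ian", "John", "Ken", "Owen"],
    "VIZ": ["Anja", "Diana", "Ian"],
}


def people_per_skill(skills):
    """Bar-chart data: number of people for each skill."""
    return {skill: len(persons) for skill, persons in skills.items()}


counts = people_per_skill(SKILLS)

# A crude text bar chart, one row per skill.
for skill, n in counts.items():
    print(f"{skill:>3} {'#' * n} {n}")
```

The count is all that survives this conversion: the names of the people are no longer recoverable from `counts`.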
  • 34. Converting the data (analytic model) Raw data: skill and persons with that skill • IE: Anja, Ben, Ernst, Fred, Ken • ETL: Chris, Fred, Mark • BI: Ben, Chris, Fred, Lemmy, Mark, Naomi • ST: Anja, Diana, Ernst, Fred, Gerald, Harriet, Ken, Owen • FCA: Anja, Diana, Gerald, Harriet, Ian, John, Ken, Owen • VIZ: Anja, Diana, Ian Bar chart data: counting the number of people per skill Graph data: counting the number of people who share two skills FCA data: formal context
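A minimal sketch of the two other conversions named on this slide, assuming the raw table is held as sets: the graph data (number of people shared by each pair of skills) and the formal context (here stored as person -> set of skills). Function and variable names are assumptions made for illustration.

```python
from itertools import combinations

# Toy skill table from the slides: skill -> set of persons with that skill.
SKILLS = {
    "IE":  {"Anja", "Ben", "Ernst", "Fred", "Ken"},
    "ETL": {"Chris", "Fred", "Mark"},
    "BI":  {"Ben", "Chris", "Fred", "Lemmy", "Mark", "Naomi"},
    "ST":  {"Anja", "Diana", "Ernst", "Fred", "Gerald", "Harriet", "Ken", "Owen"},
    "FCA": {"Anja", "Diana", "Gerald", "Harriet", "Ian", "John", "Ken", "Owen"},
    "VIZ": {"Anja", "Diana", "Ian"},
}


def shared_counts(skills):
    """Graph data: edge weight = number of people who have both skills."""
    return {
        (a, b): len(skills[a] & skills[b])
        for a, b in combinations(skills, 2)
    }


def formal_context(skills):
    """FCA data: the incidence relation, stored as person -> set of skills."""
    context = {}
    for skill, persons in skills.items():
        for person in persons:
            context.setdefault(person, set()).add(skill)
    return context


edges = shared_counts(SKILLS)
context = formal_context(SKILLS)
```

Pairs are keyed in the order the skills appear in `SKILLS`, so the ST/FCA edge is `edges[("ST", "FCA")]`.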
  • 35. Visualizing the data Raw data: skill and persons with that skill • IE: Anja, Ben, Ernst, Fred, Ken • ETL: Chris, Fred, Mark • BI: Ben, Chris, Fred, Lemmy, Mark, Naomi • ST: Anja, Diana, Ernst, Fred, Gerald, Harriet, Ken, Owen • FCA: Anja, Diana, Gerald, Harriet, Ian, John, Ken, Owen • VIZ: Anja, Diana, Ian [Figures: bar chart, graph, FCA concept lattice]
  • 36. Some information which can be read off Bar chart: § ST and FCA are the skills most people have § ETL and VIZ are the skills fewest people have Graph: § The skills FCA and ST are strongly related, because the link between them is strong § The skills FCA and IE are only weakly related, because the link between them is weak § No one has knowledge of both FCA and ETL, because there is no link between FCA and ETL FCA lattice: § Owen, Harriet and Gerald have exactly the same skills, because they belong to the same node § Whoever is skilled in ETL is skilled in BI, too, because the BI node is above the ETL node § Anja has more skills than Ken, and Ken has more skills than Ernst, because the nodes are ordered that way
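The lattice statements above can be checked directly on the toy data. The sketch below is a naive enumeration (it tries every subset of skills, which is only workable for tiny contexts) and is not how Cubix or any FCA library computes concepts; all names are illustrative assumptions.

```python
from itertools import combinations

# Toy skill table from the slides: skill -> set of persons with that skill.
SKILLS = {
    "IE":  {"Anja", "Ben", "Ernst", "Fred", "Ken"},
    "ETL": {"Chris", "Fred", "Mark"},
    "BI":  {"Ben", "Chris", "Fred", "Lemmy", "Mark", "Naomi"},
    "ST":  {"Anja", "Diana", "Ernst", "Fred", "Gerald", "Harriet", "Ken", "Owen"},
    "FCA": {"Anja", "Diana", "Gerald", "Harriet", "Ian", "John", "Ken", "Owen"},
    "VIZ": {"Anja", "Diana", "Ian"},
}
PEOPLE = set().union(*SKILLS.values())


def extent(skill_set):
    """People who have every skill in skill_set (everyone if it is empty)."""
    result = set(PEOPLE)
    for skill in skill_set:
        result &= SKILLS[skill]
    return result


def intent(person_set):
    """Skills shared by every person in person_set."""
    return {skill for skill, persons in SKILLS.items() if person_set <= persons}


def formal_concepts():
    """All (extent, intent) pairs, by closing every subset of skills."""
    concepts = set()
    names = list(SKILLS)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            people = extent(subset)
            concepts.add((frozenset(people), frozenset(intent(people))))
    return concepts


concepts = formal_concepts()

# Same node: Owen, Harriet and Gerald have exactly the same skills.
same_node = intent({"Owen"}) == intent({"Harriet"}) == intent({"Gerald"})
# Implication ETL -> BI: everyone skilled in ETL is skilled in BI.
etl_implies_bi = extent({"ETL"}) <= extent({"BI"})
# No one has both FCA and ETL.
fca_and_etl = extent({"FCA", "ETL"})
```

Reading an implication off the lattice corresponds to the subset test in `etl_implies_bi`; the ordering "Anja, Ken, Ernst" corresponds to strict inclusion of their intents.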
  • 37. Comparison: bar chart, graph, FCA lattice (+ strength, − weakness) + Many well-known visualizations + Good (readable and comprehensible) layouts + Good for analyzing numbers − Loss of information (which people) − Misleading for overlapping attributes (people are counted several times) − Not utilizing relationships between entities + Attractive visualizations + (Relatively) easy to understand + Utilizing and showing links between entities (skills) − Loss of information (which people) − Bad for analyzing numbers − Number of nodes might explode − Finding a good layout is unsolved (the nice layout in the example is accidental and has been manually created) − Unfamiliar means for analytics − Scalability − Bad for analyzing numbers + No loss of information + Meaningful clusters in one node + Showing dependencies between entities (both people and skills)
  • 38. Which visualization should I choose? Remember the information needs from the beginning: • Show me the skills and how many people share some skills, in order to get an idea of how strongly skills are related • Show me the skills and people such that I get an idea of the distribution of skills among people and of the dependencies between skills • Show me the count of people for a given skill Conclusion § Each visualization has its own strengths and weaknesses § Each type of visualization is suited to a specific type of information need § Thus the visualizations complement each other § Thus future BI tools should provide all types of visualizations
  • 39. Can you understand this? Traffic accidents dataset: 34 attributes, 150 objects, 344 concepts – minimal edge crossing layout
  • 40. Visual Analytics • Visual analytics supports human judgment by means of visual representations and interaction techniques [Keim et al. 2001] • “Overview first, zoom and filter, then details-on-demand.” [Shneiderman, 1996] • Visual Analytics for FCA combines: – Traditional BI operations and visualizations – Concept lattice transformation and visualization
  • 41. FCA-based Visual Analytics • Idea: create visual analytics for large contexts – Context reduction – Allow visual queries through selection and filtering – Dynamic visualization – Visual exploration becomes a navigation problem
  • 42. Cubix: A Visual Analytics tool for FCA • Combines interactive features to overcome drawbacks of single techniques • Features – Visualisations – Dashboard – Metrics – Filtering & search – Clustering – Tree extraction Publication: ICDM 2012 [Melo et al.] Live: cubix.alwaysdata.com
  • 43. Summary of Visualisations (analysis task | data | visualisation) • Co-occurrence analysis | Concept lattice | Enhanced Hasse diagram • Exploratory hierarchical analysis | Tree from the concept lattice | Sunburst • Frequent itemsets analysis | Attributes and objects matrix | Concept stacking (matrix) • Simulation parameters analysis | Multi-valued attributes | Heatmap lattice • Implication analysis | Association rules | Radial/matrix visualisation for association rules
  • 44. Coming back to ease of use • Cubix was tested on three use cases – The workflow (data selection, scaling, filtering and analysis) needed to be simplified • User creation of analytics – Leading to "BI as a service" • Automatic recommendation of visualizations and gadgets: – Decision tree • Based on the data type and volume – Collaborative filtering • Based on other users' preferences for similar datasets – Supervised learning methods • Based on users' profiles and history
  • 45. Coping with big data for FCA • Reduction techniques – Filtering (support, stability) • Distributed computing of concepts • Mining formal concepts over data streams • Visual analytics – New metaphors for large data – Data overview: dashboards • Filtering
  • 46. Semantic Technologies for Big Data
  • 47. Semantic Technologies for Big Data • Data-driven approaches (structure learning, data mining, statistical approaches) are not always sufficient to find all correlations among parameters • Semantic approaches can provide complementary information: – Simplify the information integration process – Provide a unified metadata layer – Discover and enrich information – Provide unified access to information
  • 48. Semantic processing • Helping to make sense of large or complex sets of data without being supplied with any knowledge about the data • Turning any data into information / actionable knowledge • Some examples: – NLP technologies – Data mining – Artificial intelligence – Classification – Semantic search
  • 49. Semantic technologies / Semantic Web • "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." (Tim Berners-Lee, 2001) • Standards include: – a flexible data model (RDF) – schema and ontology languages for describing concepts and relationships (RDFS and OWL) – a query language (SPARQL) • Use of semantic technologies in semantic processing (e.g. semantic search) • Use of semantic technologies for storing and querying data (triple store and SPARQL)
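To make the triple-store idea concrete, here is a toy sketch in plain Python: data held as subject/predicate/object triples and queried with a single triple pattern. It is not RDF-compliant and not a SPARQL engine; the `ex:` identifiers, the `hasSkill` property and the `match` helper are all hypothetical names chosen for illustration.

```python
# Toy in-memory "triple store": a set of (subject, predicate, object) tuples.
# All identifiers below are hypothetical examples, not a real vocabulary.
TRIPLES = {
    ("ex:Anja", "ex:hasSkill", "ex:FCA"),
    ("ex:Anja", "ex:hasSkill", "ex:VIZ"),
    ("ex:Chris", "ex:hasSkill", "ex:ETL"),
    ("ex:FCA", "rdf:type", "ex:Skill"),
}


def match(pattern, triples=TRIPLES):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must be bound to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# Roughly the question that the SPARQL query
#   SELECT ?person WHERE { ?person ex:hasSkill ex:FCA . }
# would put to a real triple store.
people = sorted(b["?person"] for b in match(("?person", "ex:hasSkill", "ex:FCA")))
```

A real deployment would use an RDF library or triple store rather than this helper; the point is only that the schema-free triple model makes such pattern queries uniform across heterogeneous sources.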
  • 50. Semantic Data Aggregation and Linking for Big Data • Transforming unstructured content into a structured format for later analysis is a major challenge. • The value of data explodes when it can be linked with other data, thus data integration is a major creator of value • Data aggregation from various sources can establish veracity • Semantic technologies are a way of addressing variety
  • 51. Linked Data / Web of Data • Linked Data is a set of principles that allows publishing, querying and consumption of RDF data, distributed across different servers • Not necessarily free / open data • Exponential growth -> a Big Data approach: enriching Big Data with metadata & semantics, interlinking Big Data sets • PricewaterhouseCoopers, 2009: « You’ll be able to find pieces of data sets from different places, aggregate them without warehousing, and analyse them in a more straightforward, powerful way »
  • 52. Semantic Technologies for Big Data • Natural Language Processing (NLP) • Ontology engineering techniques • Semantic enrichment: – Addition of contextual information – Semantic annotation – Data categorization / classification – Improved information retrieval – Reasoning
  • 53. Semantic Data Aggregating and Linking for Big Data [Figure: two layers] • Data layer: structured data (database) and non-structured data (documents, web pages, sensor data, textual content, social media) • Knowledge layer: ontologies and Linked Open Data; semantic aggregation, semantic enrichment and disambiguation, linking data
  • 54. LOD-based Semantic Enrichment: structured Big Data
  • 55. Pattern-based technique Query = "Olive Garden" + "Darden Rest" The first owner of [Olive Garden] was the famous [Darden Rest]VAL
  • 56. Semantic Enrichment • Ownership: subject (owned, X), object (owned, Y)
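The pattern-based step on the two slides above can be sketched with a regular expression. This assumes that entities are already marked with square brackets, as in the slide's example sentence; the pattern itself, the relation name `owns` and the reading of which entity is the owner are assumptions made for illustration, not the authors' actual extraction rules.

```python
import re

# Assumed lexico-syntactic pattern: "owner of [X] was ... [Y]".
# Entities are expected to be pre-annotated with square brackets.
OWNERSHIP_PATTERN = re.compile(
    r"owner of \[(?P<owned>[^\]]+)\] was .*?\[(?P<owner>[^\]]+)\]"
)


def extract_ownership(sentence):
    """Return an (owner, 'owns', owned) triple, or None if the pattern fails."""
    found = OWNERSHIP_PATTERN.search(sentence)
    if found is None:
        return None
    # 'owns' is a hypothetical relation name for the ownership link.
    return (found.group("owner"), "owns", found.group("owned"))


sentence = "The first owner of [Olive Garden] was the famous [Darden Rest]"
triple = extract_ownership(sentence)
```

A triple produced this way is what the enrichment step can then attach to the structured data as a new relation.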
  • 57. Value of Semantic Technologies • Semantic technologies provide opportunities for reducing the cost and complexity of data integration • Common metadata layer • Powerful solutions to find and explore information • Semantic technologies are a good fit for Big Data's variety • Velocity and volume: challenging issues for semantic technologies • Linked Data will grow into Big Linked Data, but Big Data will also benefit from evolving into Linked Big Data
  • 58. Social Networks
  • 59. Graphs everywhere - Social networks - Web - Enterprise databases - Biology - Etc. Simple management of structured, semi-structured and unstructured information: relational databases, XML, Web
  • 60. Graphs: what can we do with them? • Traversing linked information, finding shortest paths, (semantic) partitioning • Recommendation and discovery of potentially interesting linked information • Exploit the graph structure of large repositories – Web environment – Digital document repositories – Databases with metadata • Use cases: recommendation, social networks
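As an illustration of "finding shortest paths", here is a minimal breadth-first search over an unweighted graph stored as an adjacency list. The small social graph and its names are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search: a shortest path by number of edges, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical "who knows whom" graph (undirected, so each edge is listed twice).
graph = {
    "Anja": ["Ben", "Diana"],
    "Ben": ["Anja", "Chris"],
    "Chris": ["Ben", "Fred"],
    "Diana": ["Anja", "Fred"],
    "Fred": ["Chris", "Diana"],
}

path = shortest_path(graph, "Anja", "Fred")
```

Breadth-first search is only appropriate when all edges count equally; weighted links (for example interaction strength) would call for Dijkstra's algorithm instead.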
  • 61. Graphs for Social networks: enterprise use case • A technology for internal communication, information sharing and collaboration • A technology for communicating information to clients – Vote for the best product – Understand clients' needs • A technology for watching the gossip – E-reputation, opinion mining • A technology for creating collective intelligence – Collaborative common knowledge – Wikis and blogs associated with social networks
  • 62. Graphs for Social networks: public administration use case • Public administrations need social networks: – As enterprises: • To analyze internal networks (projects, organization…) • To analyze external networks (suppliers, clients, partners…) – As an interface for citizens: • To be well understood by citizens (who does what) • To understand citizens (who says what) • Example scenarios: – Need to look over the organizational structure (employees, departments, transversal projects) and identify costs – Need for citizens to understand the impact of public policies (offered services, available resources for each district of the city, which projects are the most relevant, citizens' complaints) – Opinion analysis from external social networks (Twitter, for example)
  • 63. Social web – Social Networks • The Social Semantic Web combines technologies, strategies and methodologies from the Semantic Web, social software and the Web 2.0. • Web 2.0 allows users to express their opinion on products and services • Understanding “what people think” can support decision-making, both for consumers and producers
  • 64. Sentiment Analysis – Opinion mining Find out what other people think. Is it possible? What does opinion mining mean? The beginning of wisdom is the definition of terms! (Socrates) "Today, vendors, practitioners, and the media alike call this still-nascent arena everything from ‘brand monitoring,’ ‘buzz monitoring’ and ‘online anthropology,’ to ‘market influence analytics,’ ‘conversation mining’ and ‘online consumer intelligence’. . . . In the end, the term ‘social media monitoring and analysis’ is itself a verbal crutch. It is placeholder [sic], to be used until something better (and shorter) takes hold in the English language to describe the topic of this report." Zabin and Jefferies: “Social media monitoring and analysis: Generating consumer insights from online conversation”
  • 65. Opinion mining – possible uses • Recommender systems (avoid recommending items that received a lot of negative feedback) • Information filtering • Business Intelligence (why aren’t consumers buying my laptop?) • Question answering (what did you want to say?) • Clarification of politicians' positions • eDemocracy… and so on
  • 66. Opinion mining – Sociology • Who is positively or negatively disposed toward whom • Who would be more or less receptive to new information transmitted from a given source • Structural balance theory: group cohesion and overall polarity among people
  • 67. Opinion mining – The perfect tool The development of a complete opinion-search application might involve: 1) Determine which documents or portions of documents contain opinionated material. 2) Identify the overall sentiment expressed by these documents and/or the specific opinions regarding particular features or aspects of the items or topics in question, as necessary. 3) Finally, the system needs to present the sentiment information it has garnered in some reasonable summary fashion (aggregation of “votes”, selective highlighting of some opinions, etc.)
  • 68. Opinion mining – Polarity A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the opinion expressed in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. A polarity is a real number quantifying the user’s positive, negative or neutral opinion.
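A minimal sketch of polarity as a real number, assuming a tiny hand-made lexicon: the score is the mean polarity of the opinion words found, so it lies in [-1, 1], with 0 read as neutral. The lexicon and the scoring rule are illustrative assumptions, not the method of the work presented in the following slides.

```python
import re

# Hypothetical mini-lexicon: word -> polarity in [-1, 1].
LEXICON = {
    "good": 1.0, "great": 1.0, "excellent": 1.0,
    "bad": -1.0, "terrible": -1.0, "awful": -1.0,
}


def polarity(text):
    """Mean polarity of the opinion words in text; 0.0 if there are none."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = [LEXICON[token] for token in tokens if token in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


examples = [
    "Great hotel, excellent staff",
    "Terrible food",
    "Good location, terrible food",
]
scores = [polarity(text) for text in examples]
```

The third example scores 0.0 even though it contains two clear opinions, which is exactly the limitation of a single document-level polarity.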
  • 69. Detecting feature sentiment in user-generated reviews It is not possible to summarize everything with a unique vote/polarity ⇒ detect local polarities expressed about the salient features of a considered domain. Extract the most frequent domain-related features. Good Location, Terrible Food: Detecting Feature Sentiment in User-Generated Reviews, Cataldi et al., 2013, SNAM
  • 70. Combining statistics and NLP 1) We identify the most characteristic aspects of one domain (hotels, restaurants, products) by analyzing the domain corpus and extracting the most frequent terms (possibly structuring them as a vocabulary and/or ontology) 2) We formalize the content of each review as a dependency tree among its terms and retrieve (if they exist) the features discussed within it. Then, by using the tree, we aim at discovering all the other terms that convey some polarity and are linguistically connected to them.
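The two steps above can be approximated in a short sketch. Note the substitution: instead of a dependency tree, this version simply looks at the few words preceding each feature, which is a much cruder notion of "linguistically connected". The lexicon, stopword list, window size and sample reviews are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical opinion lexicon and stopword list.
LEXICON = {"good": 1.0, "great": 1.0, "terrible": -1.0, "dirty": -1.0}
STOPWORDS = {"the", "a", "and", "but", "was", "is", "very"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def frequent_features(reviews, top_n=2):
    """Step 1: most frequent non-opinion, non-stopword terms of the corpus."""
    counts = Counter(
        token
        for review in reviews
        for token in tokenize(review)
        if token not in STOPWORDS and token not in LEXICON
    )
    return [term for term, _ in counts.most_common(top_n)]


def feature_polarities(review, features, window=2):
    """Step 2 (simplified): average the opinion words just before each feature.

    A window of preceding tokens stands in for the dependency tree.
    """
    tokens = tokenize(review)
    result = {}
    for i, token in enumerate(tokens):
        if token in features:
            nearby = tokens[max(0, i - window):i]
            scores = [LEXICON[t] for t in nearby if t in LEXICON]
            if scores:
                result[token] = sum(scores) / len(scores)
    return result


reviews = [
    "Great location and good food",
    "Good location, terrible food",
    "Terrible food",
]
features = frequent_features(reviews)
local = feature_polarities("Good location, terrible food", features)
```

On the review from the paper's title this yields one polarity per feature rather than a single cancelled-out score; a dependency parser would be needed to handle opinion words that follow the feature or sit further away.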
  • 71. [Figure: feature-level sentiment pipeline. Raw review text goes through POS tagging and a linguistic parser (phrase structure) to give a dependency graph G; a feature extractor over the English corpus produces a ranked feature set F; WordNet synsets carry positive and negative polarities (synset polarity computation); sentiment computation takes the subset of features F_i found in G and the synsets in G that carry some sentiment and refer to a feature in F_i, and outputs a polarity for each feature.]
  • 72. Graphs and social networks • Can be useful for many applications: – E-reputation and trust management – Monitoring of social networks for security – Recommendation of corporate data/information – Retail Is Twitter just a mirror of mass sentiment or is it also able to influence opinion?
  • 73. Conclusion • Many models should be combined: – Ontologies, graphs, formal concepts, predictive models • Many techniques should be combined: – Natural language processing – Machine learning and statistics – Ontology engineering, Linked Data management – Graph processing – Visualization – Crowdsourcing, scraping • For semantic enrichment
  • 74. Challenges • Semantic information aggregation – Pattern extraction from streams and cross-analysis – Information extraction from Linked Open Data: concepts and relations linked to the stream patterns – Opinion aggregation from social media and the web – Social aspects for collaboration – Information aggregation: “too much data to assimilate but not enough knowledge to act” • Distributed and real-time processing – Design of real-time and distributed algorithms for stream processing and information aggregation – Storage and indexing of a knowledge base – Integration of business processes with aggregated information – Distribution and parallelization of data mining algorithms • Visual analytics and user modeling – Dynamic user model – Novel visualizations for very large datasets
  • 75. QUESTIONS?