1. Programmable web of the future
{
  "firstName": "Veselin",
  "lastName": "Pizurica",
  "epochTime": 1381953702
}
Page 1
2. Today's talk is about the future of the web
Integration/convergence:
– APIs
– Sensor Networks/M2M
– Cloud
– Data mining
– Intelligent decision engines
Page 2
3. Introduction to AI
– Learning, Pattern recognition
– Intelligent agents
– Probabilistic reasoning and uncertainty
– Graphical models
Page 3
4. Material used
• UGent AI course: http://telin.ugent.be/~sanja/ArtificialIntelligence
• BayesiaLab white paper
• Wikipedia
• Google search
Page 4
5. Map of Analytic Modeling
[Diagram: map of analytic modeling, after Breiman (2001) and Shmueli (2010)]
Page 5
8. Intelligent agents
Agent: an entity that perceives and acts (from Latin agere, to do)
A rational agent is one that acts so as to achieve the best outcome, or, when there is uncertainty, the best expected outcome
Abstractly, an agent is a function from percept histories to actions: f : P* → A
For any given class of environments and tasks, we seek the agent (or class of agents) with the best performance
In practice, computational limitations make perfect rationality unachievable → design the best program for given machine resources
Page 8
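The percept-histories-to-actions idea can be sketched directly in code. The two-location vacuum world and its reflex rule below are illustrative assumptions (a classic textbook toy), not something specified on the slide:

```python
# Sketch: an agent as a function from percept histories to actions.
# The vacuum world (locations "A"/"B", statuses "Dirty"/"Clean") is hypothetical.

def reflex_vacuum_agent(percept_history):
    """Map the full percept history to an action (this agent only uses the latest percept)."""
    location, status = percept_history[-1]
    if status == "Dirty":
        return "Suck"
    # Otherwise move toward the other of the two locations.
    return "Right" if location == "A" else "Left"

action = reflex_vacuum_agent([("A", "Dirty")])
```

A rational-agent design would then pick, among such functions, the one maximizing the performance measure within the machine's resource limits.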
9. Rationality
• A rational agent is one that does the right thing.
• How do we know whether it is the right thing?
- By considering the consequences of the agent's behavior (i.e., the sequence of states through which the environment goes as a result of the agent's behavior)
• A sequence of states (through which the environment goes) is evaluated by a performance measure
Page 9
10. Specifying
the
task
environment
To design a rational agent, we must specify the
task environment
Consider the task of designing an automated taxi:
– Performance measure: safety, destination, profits, legality,
comfort
– Environment: streets/freeways, traffic, pedestrians, weather
– Actuators: steering, accelerator, brake, horn, speaker/display
– Sensors: video, accelerometers, gauges, engine sensors, keyboard
Page 10
19. Agent types
• Four basic types in order of increasing generality:
– simple reflex agents
– reflex agents with state
– goal-based agents
– utility-based agents
All these can be turned into learning agents
Page 19
24. Why learning?
Why do we want an agent to learn? (Why not program an improved design from the beginning?)
– Cannot anticipate all possible situations that the agent might find itself in
– Cannot anticipate all changes over time
– Programmers might not know how to program a solution themselves (e.g. how to program face recognition)
Learning modifies the agent's decision mechanisms to improve performance
Page 24
25. Pattern recognition
Unsupervised learning
– Learning patterns without explicit feedback supplied
– The system forms clusters or natural groupings of the input patterns (based on some similarity criteria). ➡ Clustering
Reinforcement learning
– Learning from a series of reinforcements – rewards and punishments
Supervised learning
– Learning a function that maps input to output based on available (observed) input-output pairs (correct answers for each instance)
Semi-supervised learning
– A few labeled samples available and a large collection of unlabeled ones
– Learn from the geometry of the unlabeled samples and use the labeled ones to improve the learning
Page 25
27. Unsupervised Learning
• No labeled training sets are provided
• The system applies specified clustering/grouping criteria to the unlabeled dataset and clusters/groups together the “most similar” objects (according to the given criteria)
Page 27
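As a minimal runnable sketch of that grouping step, here is a 1-D k-means loop over unlabeled points; the data, the initial centers, and the nearest-mean similarity criterion are all illustrative assumptions:

```python
# 1-D k-means: alternate an assignment step (group each point with its most
# similar, i.e. nearest, center) and an update step (recompute group means).

def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # Assignment: index of the nearest center.
            j = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[j].append(p)
        # Update: each center moves to the mean of its cluster (kept if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]        # no labels provided
centers, clusters = kmeans_1d(points, [0.0, 5.0])
```

The loop converges to two natural groupings (around 1.0 and around 9.1) without any labels being supplied.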
28. Pattern Recognition Process
Data acquisition and sensing
– Measurements of physical variables.
– Important issues: bandwidth, resolution, etc.
Pre-processing
– Removal of noise in data.
– Isolation of patterns of interest from the
background.
Feature extraction
– Finding a new representation in terms of
features.
Classification
– Using features and learned models to assign a
pattern to a category.
Post-processing
– Evaluation of confidence in decisions.
Page 28
29. Feature vectors
A single object is represented by several features, e.g. shape, size, color, weight:
x1 = shape (e.g. nr of sides)
x2 = size (e.g. some numeric value)
x3 = color (e.g. rgb values)
...
xd = some other (numeric) feature
X becomes a feature vector
Page 29
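A one-line sketch of how such features become a single vector; the concrete encodings below are made-up examples:

```python
# A feature vector packs one object's measured features into one ordered vector.
# The shape/size/color encodings are hypothetical.
shape_sides = 4            # x1 = shape (nr of sides)
size = 2.5                 # x2 = size (some numeric value)
color_rgb = (255, 0, 0)    # x3 = color (rgb values)

x = [shape_sides, size, *color_rgb]   # feature vector X, here in R^5
```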
34. PCA
Principal component analysis (PCA) is an orthogonal transformation that converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. It is not, however, optimized for class separability; an alternative is linear discriminant analysis, which does take this into account. PCA is also sensitive to the scaling of the variables.
Page 34
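The transformation can be sketched with NumPy: center the data, take the SVD, and project onto the right-singular vectors (the principal components). The synthetic correlated data below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0]])   # two correlated variables
X += 0.1 * rng.normal(size=(100, 2))                      # small independent noise

Xc = X - X.mean(axis=0)                    # PCA works on centered variables
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                         # principal-component values

# The components are linearly uncorrelated: off-diagonal covariance ~ 0,
# and the first component carries the most variance.
cov = np.cov(scores.T)
```

Because PCA is sensitive to variable scaling (as the slide notes), real data is usually standardized before this step.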
35. Deep Learning
• Choosing the correct feature representation of input data is a way for people to bring prior knowledge of a domain to increase an algorithm's computational performance and accuracy. To move towards general artificial intelligence, algorithms need to be less dependent on this feature engineering and better learn to identify the explanatory factors of input data on their own.
• Deep learning tries to move in this direction by capturing
a 'good' representation of input data by using
compositions of non-linear transformations.
Page 35
36. Two types of models
• Probabilistic graphical models have
nodes in each layer that are considered
as latent random variables. In this case,
you care about the probability
distribution of the input data x and the
hidden latent random variables h that
describe the input data in the joint
distribution p(x,h). These latent random
variables describe a distribution over the
observed data.
• Direct encoding (neural network) models
have nodes in each layer that are
considered as computational units. This
means each node h performs some
computation (normally nonlinear like a
sigmoidal function) given its inputs from
the previous layer.
Page 36
37. Decision trees
1. Learn rules from data
2. Apply each rule at each
node
3. Classification is at the leaves of the tree
Page 37
38. Decision Trees example
Example: decision whether to wait for a table in a restaurant depending on the following attributes:
1. Alternate (Alt): Is there a suitable alternative restaurant nearby?
2. Bar: Is there a comfortable bar area in the restaurant, where I can wait?
3. Fri/Sat (Fri): True on Fridays/Saturdays
4. Hungry (Hun): Are we hungry?
5. Patrons (Pat): How many people are in the restaurant (None, Some or Full)
6. Price: the restaurant's price range ($, $$, $$$)
7. Raining (Rain): Is it raining outside?
8. Reservation (Res): Did we make a reservation?
9. Type: the kind of restaurant (French, Italian, Thai or burger)
10. WaitEstimate (Est): the wait time estimated by the host (0-10 min, 10-30, 30-60, or >60)
Page 38
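A tiny runnable sketch of the rules-at-nodes, classes-at-leaves idea, using a hypothetical fragment of the restaurant tree (the Patrons/Hungry rules below are illustrative, not the tree actually learned from the slides' data):

```python
# Each internal node holds an attribute test; each leaf holds a classification.
tree = ("Patrons", {
    "None": "No",                 # leaf: don't wait
    "Some": "Yes",                # leaf: wait
    "Full": ("Hungry", {          # inner node: apply another rule
        True: "Yes",
        False: "No",
    }),
})

def classify(node, example):
    """Walk the tree, applying each node's rule, until a leaf is reached."""
    if isinstance(node, str):     # reached a leaf -> class label
        return node
    attr, branches = node
    return classify(branches[example[attr]], example)

decision = classify(tree, {"Patrons": "Full", "Hungry": True})
```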
39. Decision tree
How many distinct decision trees do we have with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows
= 2^(2^n)
E.g., with 6 Boolean attributes: 2^64 = 18,446,744,073,709,551,616
Page 39
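The count is easy to sanity-check in code:

```python
# With n Boolean attributes there are 2**n truth-table rows, and a Boolean
# function independently picks one output bit per row, giving 2**(2**n) functions.
n = 6
rows = 2 ** n                 # 64 truth-table rows
num_functions = 2 ** rows     # 2**64 distinct Boolean functions
```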
40. Uncertainty
Let A_t denote the action “leave for airport t minutes before flight”
Will A_t get me there on time?
• Purely logical approach leads to weak conclusions:
§ “A_90 will get me there on time if there is no accident on the way and it doesn't rain and my tires remain intact and no meteorite hits the car, etc.”
§ None of these can be inferred for sure → plan success cannot be inferred
Page 40
41. Uncertainty
• Consider diagnosis of a patient with headache. Many causes are possible, like sinus problems, eye sight, tense muscles, flu, cancer, … Suppose a logical rule that attempts to express this:
Headache ⇒ Sinusitis ∨ EyeSight ∨ StiffNeck ∨ Flu ∨ Cancer …
• The problem is that there is an almost unlimited list of possible causes. A causal rule like StiffNeck ⇒ Headache doesn't work either (a stiff neck doesn't always cause a headache)
• Trying to use logic in this type of domain fails because
§ there is too much work to list all the attributes
§ there is no complete theory or knowledge
§ not all the necessary tests can be or have been run
Page 41
42. Why probabilistic reasoning?
• Probabilistic reasoning is useful because logic often fails due to laziness and ignorance:
– Laziness: too many attributes to list
– Theoretical ignorance: no complete knowledge of the domain
– Practical ignorance: not enough observations, tests, …
• Probabilistic assertions summarize the effects of laziness and ignorance
Page 42
44. Graphical models
[Diagram: Bayesian networks shown as a subclass of graphical models]
Graphical models are related to mathematical graph theory
Page 44
45. Probabilistic graphs
• A graph is a set of objects (represented by nodes, also called vertices or points), where some pairs of the nodes are connected by links (edges).
• If the edges are directed, they are also called arrows and the graph is directed. In a weighted graph, weights are assigned to the edges. The graph is complete if all the vertices are connected to each other.
• Probabilistic graphs:
– nodes ↔ random variables (r.v.s)
– edges ↔ probabilistic dependencies between these r.v.s.
Page 45
46. Common graphical models
• Bayesian networks – directed graphical models (edges carry causal influence from a node X to the descendants of X)
• Markov random fields – non-directed graphs (a node X depends directly on the neighbors of X)
Page 46
47. Markov rule
• In a directed graph:
P(Xi | all nondescendants) = P(Xi | Parents(Xi))
• A special case: Markov chain
P(Xi | Xi−1, ..., X1) = P(Xi | Xi−1)
• Markov random field:
P(Xi | all other nodes) = P(Xi | Neighbors(Xi))
Page 47
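The Markov-chain case can be sketched numerically: the joint probability of a state sequence factorizes so that each factor conditions only on the previous state. The two-state weather chain and its transition probabilities are invented for illustration:

```python
# P(X_i | X_{i-1}, ..., X_1) = P(X_i | X_{i-1}): a hypothetical weather chain.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def path_probability(states):
    """P(x1, ..., xn) with an assumed uniform start, via the chain factorization."""
    prob = 0.5                            # P(x1), assumed uniform over two states
    for prev, cur in zip(states, states[1:]):
        prob *= P[prev][cur]              # each factor conditions only on prev
    return prob

p = path_probability(["sunny", "sunny", "rainy"])   # 0.5 * 0.8 * 0.2
```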
48. Markov Random Fields (MRFs)
• Non-directed probabilistic graphs
• Used a lot in digital image processing and computer vision
• This example illustrates application in image segmentation
Page 48
50. Bayes’ rule
Product rule: P(a ∧ b) = P(a | b) P(b)
Bayes’ rule: P(a | b) = P(b | a) P(a) / P(b)
Or in distribution form:
P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
Useful for accessing diagnostic probability from causal probability:
P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
Often we perceive as evidence the effect of some unknown cause and we want to determine that cause, e.g. the chance of disease_x given symptom_y:
P(disease_x | symptom_y) = P(symptom_y | disease_x) P(disease_x) / P(symptom_y)
Page 50
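The disease/symptom form of the rule turned into code, with invented numbers for the prior and the likelihoods:

```python
# Diagnostic probability from causal probability via Bayes' rule.
# All three numbers below are illustrative assumptions.
p_disease = 0.01                   # P(disease), prior
p_sym_given_dis = 0.9              # P(symptom | disease), causal direction
p_sym_given_no = 0.05              # P(symptom | no disease)

# Normalization: P(symptom) by summing over both causes.
p_symptom = (p_sym_given_dis * p_disease
             + p_sym_given_no * (1 - p_disease))
p_dis_given_sym = p_sym_given_dis * p_disease / p_symptom
```

Even with a strong causal link (0.9), the diagnostic probability stays modest (about 0.154) because the prior is small.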
51. Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions
Syntax:
• a set of nodes, one per variable
• a directed, acyclic graph (each link means “directly influences”)
• a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
Page 51
52. Network: directed acyclic graph
[Diagram: nodes are random variables; edges denote causal influence; X with its descendants and non-descendants]
X has causal influence on Y
• Evidence for X forms causal support for Y
• Evidence for Y forms diagnostic support for X
Page 52
53. Network separation
Let us investigate (conditional) independence in three simple networks featuring these types of nodes, and let a ⊥ b | c denote “a and b are conditionally independent given c”
P(a, b, c) = P(a) P(c | a) P(b | c)
Consider now evidence in c:
P(a, b) = Σ_c P(a) P(c | a) P(b | c) = P(a) P(b | a) ≠ P(a) P(b)
(in this network a and b are in general not independent)
P(a, b | c) = P(a, b, c) / P(c) = P(a) P(c | a) P(b | c) / P(c) = P(a | c) P(b | c)
So, we can say that the node c blocks the path between a and b.
Page 53
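The blocking claim can be checked numerically for a chain a → c → b; all the probability tables below are made-up numbers:

```python
# Chain a -> c -> b: conditioning on c makes a and b independent,
# while marginally they are not. All probabilities are illustrative.
from itertools import product

P_a = {0: 0.3, 1: 0.7}
P_c_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
P_b_given_c = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}

def joint(a, b, c):
    # The chain factorization P(a, b, c) = P(a) P(c|a) P(b|c).
    return P_a[a] * P_c_given_a[a][c] * P_b_given_c[c][b]

# Marginal dependence: P(a, b) != P(a) P(b) in general.
P_ab = {(a, b): sum(joint(a, b, c) for c in (0, 1))
        for a, b in product((0, 1), (0, 1))}
P_b = {b: P_ab[0, b] + P_ab[1, b] for b in (0, 1)}
marginally_independent = abs(P_ab[0, 0] - P_a[0] * P_b[0]) < 1e-12

# Conditional independence given c: P(a, b | c) == P(a | c) P(b | c).
c = 1
P_c = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1))
P_ab_c = {(a, b): joint(a, b, c) / P_c for a, b in product((0, 1), (0, 1))}
P_a_c = {a: P_ab_c[a, 0] + P_ab_c[a, 1] for a in (0, 1)}
P_b_c = {b: P_ab_c[0, b] + P_ab_c[1, b] for b in (0, 1)}
cond_independent = all(abs(P_ab_c[a, b] - P_a_c[a] * P_b_c[b]) < 1e-12
                       for a, b in product((0, 1), (0, 1)))
```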
54. D-separation contd.
A, B and C are non-overlapping sets
The sets A and B are d-separated by C if each node in A is d-separated from each node in B by C
Page 54
55. Example: Car diagnosis
Initial evidence: car won't start
Testable variables (green), “broken, so fix it” variables (orange)
Hidden variables (gray) ensure sparse structure, reduce parameters
Page 55
56. Belief propagation
• The belief propagation algorithm was introduced by Judea Pearl, 1982
• Exact inference in networks without loops; complexity linear in the number of nodes
• Became very popular after it was shown that the same computations occur in turbo codes and the same principles underlie the Viterbi algorithm
• Main idea: inference by local message passing among neighboring nodes
The message can loosely be interpreted as “I (node i) think that you (node j) are that much likely to be in a given state”.
Page 56
57. Message passing revisited
1. Distributed soldier counting.
2. Distributed soldier counting with the leader in line.
Page 57
58. Numenta: HTM model
An HTM network consists of regions
arranged in a hierarchy.
Jeff Hawkins: “It combines and
extends approaches used in
Bayesian networks, spatial and
temporal clustering algorithms, while
using a tree-shaped hierarchy of
nodes that is common in neural
networks.”
Read the book, it is great fun →
Page 58
59. Semantic web and IBM’s Watson
The “heart and soul” is the Unstructured Information Management Architecture [UIMA]
Page 59
60. Presentation 2nd part
• Smart web
– API economy
– IOT
• Bayesian nets
– Troubleshooting and diagnostics
– Sensor integration via plugin framework
– Intelligent decisions and actions
– Cloud deployment
– IFTTT like application using framework above
Page 60
62. API
• APIs have become the new patents
• Who holds the data holds the knowledge
• Companies don’t share their know-how, but they are willing to share their know-what (via an application programming interface, API)
• The API economy is coming, and it will be a major driver of profit for many companies
Page 62
66. Sensor Networks
• Network of specialized sensors intended
to monitor and record conditions at diverse
locations.
• Commonly monitored parameters are
temperature, humidity, pressure, wind
direction and speed, illumination intensity,
vibration intensity, sound intensity, powerline voltage, chemical concentrations,
pollutant levels and vital body functions.
Page 66
68. M2M is becoming a reality
The API economy has become a reality
Page 68
69. Programmable web of the future
Sensors gather and push data to the cloud.
API economies share data and services in the
cloud.
In the cloud, an intelligent engine aggregates and correlates data from different sources, creating new VALUE. That can be used either to:
– Provide new insights (analysis)
– Create new instructions (actions) via API
Page 69
70. Three types of AI/IOT
implementations
• “Ambient intelligence” – mesh networks; information flow and decisions stay local
• “IOT Analytics” – big-data-like use case scenarios
• IOT Analytics + API’s + cloud + decision
engine + actions
Page 70
75. Technology that can deal with huge data
sets under complexity and uncertainty?
Google/Toyota/Renault/Volvo driverless car research projects
Page 75
78. Bayesian network modeling
A data analysis technique ideally suited to messy, complex data. The focus is on structure discovery – determining an optimal graphical model which describes the inter-relationships in the underlying processes.
Structure discovery AND inter-relationships
Page 78
79. • How do you express that a car needs both battery and fuel to function? Easy.
• How do you say that if your lights are not working, it is most likely a battery fault, but it could as well be that just the lights are broken? Still, the fact that the lights are not working points to the battery fault as the most likely cause.
If you only model via composition and add behavior separately – what most of the tools do these days – you are heading for complexity!
Page 79
80. Example, car model
Car model with relations: NO Data available
Chance that the car will start is above 98%
Page 80
81. Car example, lights are off
Lights are off
Chance that the battery functions dropped from 99.99% to less than 50%
Chance that the car will start is below 50%
Page 81
82. Car example, lights are on
Lights are on
The battery works; there is no need to check it
Chance that the car will start now only depends on the fuel
Page 82
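A minimal runnable version of the car example: a Lights child of Battery, and a Start node with Battery and Fuel as parents. All conditional probabilities below are invented placeholders chosen to reproduce the slides' qualitative behavior, not the actual model:

```python
# Network: Battery -> Lights, (Battery, Fuel) -> Start. Numbers are illustrative.
P_batt = 0.9999                                 # P(battery ok), prior
P_fuel = 0.98                                   # P(fuel present)
P_lights_on = {True: 0.99995, False: 0.01}      # P(lights on | battery state)
P_start = {(True, True): 0.99, (True, False): 0.0,
           (False, True): 0.0, (False, False): 0.0}  # P(start | battery, fuel)

def p_batt_given_lights(on):
    """Diagnostic inference P(battery ok | lights observation) via Bayes' rule."""
    num = P_batt * (P_lights_on[True] if on else 1 - P_lights_on[True])
    den = num + (1 - P_batt) * (P_lights_on[False] if on else 1 - P_lights_on[False])
    return num / den

def p_start_given(p_batt_now):
    """P(car starts) by enumerating battery and fuel states."""
    return sum(P_start[(b, f)]
               * (p_batt_now if b else 1 - p_batt_now)
               * (P_fuel if f else 1 - P_fuel)
               for b in (True, False) for f in (True, False))

p_batt_off = p_batt_given_lights(on=False)      # battery belief collapses
p_start_off = p_start_given(p_batt_off)         # so does the chance of starting
```

With these placeholder numbers, observing "lights off" pulls the battery belief from its 99.99% prior to under 50%, and the chance of starting drops with it; observing "lights on" effectively confirms the battery, leaving Start to depend on Fuel.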
83. Prototype architecture
Database of recipes
Website where the user configures logic (recipes)
Decision engine
Pluggable actions
Developer extensions (new capabilities)
Pluggable sensors
Page 83