1. Programmable web of the future
{
  "firstName": "Veselin",
  "lastName": "Pizurica",
  "epochTime": 1381953702
}
Page 1
2. Today's talk is about the future of the web
Integration/convergence:
– APIs
– Sensor Networks/M2M
– Cloud
– Data mining
– Intelligent decision engines
Page 2
3. Introduction to AI
– Learning, Pattern recognition
– Intelligent agents
– Probabilistic reasoning and uncertainty
– Graphical models
Page 3
4. Material used
• UGent AI course: http://telin.ugent.be/~sanja/ArtificialIntelligence
• BayesiaLab white paper
• Wikipedia
• Google search
Page 4
5. Map of Analytic Modeling
[Diagram: map of analytic modeling, after Breiman (2001) and Shmueli (2010)]
Page 5
8. Intelligent agents
Agent: an entity that perceives and acts (from Latin agere, to do)
A rational agent is one that acts so as to achieve the best outcome, or, when there is uncertainty, the best expected outcome
Abstractly, an agent is a function from percept histories to actions: f : P* → A
For any given class of environments and tasks, we seek the agent (or class of agents) with the best performance
In practice, computational limitations make perfect rationality unachievable → design the best program for given machine resources
Page 8
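The percept-histories-to-actions idea can be sketched directly in code. The two-location vacuum world and its reflex rule below are illustrative assumptions (a classic textbook toy), not something specified on the slide:

```python
# Sketch: an agent as a function from percept histories to actions.
# The vacuum world (locations "A"/"B", statuses "Dirty"/"Clean") is hypothetical.

def reflex_vacuum_agent(percept_history):
    """Map the full percept history to an action (this agent only uses the latest percept)."""
    location, status = percept_history[-1]
    if status == "Dirty":
        return "Suck"
    # Otherwise move toward the other of the two locations.
    return "Right" if location == "A" else "Left"

action = reflex_vacuum_agent([("A", "Dirty")])
```

A rational-agent design would then pick, among such functions, the one maximizing the performance measure within the machine's resource limits.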
9. Rationality
• A rational agent is one that does the right thing.
• How do we know whether it is the right thing?
- By considering the consequences of the agent's behavior (i.e., the sequence of states through which the environment goes as a result of the agent's behavior)
• A sequence of states (through which the environment goes) is evaluated by a performance measure
Page 9
10. Specifying
the
task
environment
To design a rational agent, we must specify the
task environment
Consider the task of designing an automated taxi:
– Performance measure: safety, destination, profits, legality,
comfort
– Environment: streets/freeways, traffic, pedestrians, weather
– Actuators: steering, accelerator, brake, horn, speaker/display
– Sensors: video, accelerometers, gauges, engine sensors, keyboard
Page 10
19. Agent types
• Four basic types in order of increasing generality:
– simple reflex agents
– reflex agents with state
– goal-based agents
– utility-based agents
All these can be turned into learning agents
Page 19
24. Why learning?
Why do we want an agent to learn? (Why not program an improved design from the beginning?)
– Cannot anticipate all possible situations that the agent might find itself in
– Cannot anticipate all changes over time
– Programmers might not know how to program a solution themselves (e.g. how to program face recognition)
Learning modifies the agent's decision mechanisms to improve performance
Page 24
25. Pattern recognition
Unsupervised learning
– Learning patterns without explicit feedback supplied
– The system forms clusters or natural groupings of the input patterns (based on some similarity criteria). ➡ Clustering
Reinforcement learning
– Learning from a series of reinforcements – rewards and punishments
Supervised learning
– Learning a function that maps input to output based on available (observed) input-output pairs (correct answers for each instance)
Semi-supervised learning
– A few labeled samples available and a large collection of unlabeled ones
– Learn from the geometry of the unlabeled samples and use the labeled ones to improve the learning
Page 25
27. Unsupervised Learning
• No labeled training sets are provided
• The system applies specified clustering/grouping criteria to the unlabeled dataset and clusters/groups together the “most similar” objects (according to the given criteria)
Page 27
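As a minimal runnable sketch of that grouping step, here is a 1-D k-means loop over unlabeled points; the data, the initial centers, and the nearest-mean similarity criterion are all illustrative assumptions:

```python
# 1-D k-means: alternate an assignment step (group each point with its most
# similar, i.e. nearest, center) and an update step (recompute group means).

def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # Assignment: index of the nearest center.
            j = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[j].append(p)
        # Update: each center moves to the mean of its cluster (kept if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]        # no labels provided
centers, clusters = kmeans_1d(points, [0.0, 5.0])
```

The loop converges to two natural groupings (around 1.0 and around 9.1) without any labels being supplied.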
28. Pattern Recognition Process
Data acquisition and sensing
– Measurements of physical variables.
– Important issues: bandwidth, resolution, etc.
Pre-processing
– Removal of noise in data.
– Isolation of patterns of interest from the
background.
Feature extraction
– Finding a new representation in terms of
features.
Classification
– Using features and learned models to assign a
pattern to a category.
Post-processing
– Evaluation of confidence in decisions.
Page 28
29. Feature vectors
A single object is represented by several features, e.g. shape, size, color, weight:
x1 = shape (e.g. nr of sides)
x2 = size (e.g. some numeric value)
x3 = color (e.g. rgb values)
...
xd = some other (numeric) feature
X becomes a feature vector
Page 29
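A one-line sketch of how such features become a single vector; the concrete encodings below are made-up examples:

```python
# A feature vector packs one object's measured features into one ordered vector.
# The shape/size/color encodings are hypothetical.
shape_sides = 4            # x1 = shape (nr of sides)
size = 2.5                 # x2 = size (some numeric value)
color_rgb = (255, 0, 0)    # x3 = color (rgb values)

x = [shape_sides, size, *color_rgb]   # feature vector X, here in R^5
```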
34. PCA
Principal component analysis (PCA) is an orthogonal transformation that converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. It is not, however, optimized for class separability; an alternative is linear discriminant analysis, which does take this into account. PCA is also sensitive to the scaling of the variables.
Page 34
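The transformation can be sketched with NumPy: center the data, take the SVD, and project onto the right-singular vectors (the principal components). The synthetic correlated data below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0]])   # two correlated variables
X += 0.1 * rng.normal(size=(100, 2))                      # small independent noise

Xc = X - X.mean(axis=0)                    # PCA works on centered variables
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                         # principal-component values

# The components are linearly uncorrelated: off-diagonal covariance ~ 0,
# and the first component carries the most variance.
cov = np.cov(scores.T)
```

Because PCA is sensitive to variable scaling (as the slide notes), real data is usually standardized before this step.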
35. Deep Learning
• Choosing the correct feature representation of input data is a way for people to bring prior knowledge of a domain to increase an algorithm's computational performance and accuracy. To move towards general artificial intelligence, algorithms need to be less dependent on this feature engineering and better learn to identify the explanatory factors of input data on their own.
• Deep learning tries to move in this direction by capturing
a 'good' representation of input data by using
compositions of non-linear transformations.
Page 35
36. Two types of models
• Probabilistic graphical models have
nodes in each layer that are considered
as latent random variables. In this case,
you care about the probability
distribution of the input data x and the
hidden latent random variables h that
describe the input data in the joint
distribution p(x,h). These latent random
variables describe a distribution over the
observed data.
• Direct encoding (neural network) models
have nodes in each layer that are
considered as computational units. This
means each node h performs some
computation (normally nonlinear like a
sigmoidal function) given its inputs from
the previous layer.
Page 36
37. Decision trees
1. Learn rules from data
2. Apply each rule at each
node
3. Classification is at the leaves of the tree
Page 37
38. Decision Trees example
Example: decision whether to wait for a table in a restaurant depending on the following attributes:
1. Alternate (Alt): Is there a suitable alternative restaurant nearby?
2. Bar: Is there a comfortable bar area in the restaurant, where I can wait?
3. Fri/Sat (Fri): True on Fridays/Saturdays
4. Hungry (Hun): Are we hungry?
5. Patrons (Pat): How many people are in the restaurant (None, Some or Full)
6. Price: the restaurant's price range ($, $$, $$$)
7. Raining (Rain): Is it raining outside?
8. Reservation (Res): Did we make a reservation?
9. Type: the kind of restaurant (French, Italian, Thai or burger)
10. WaitEstimate (Est): the wait time estimated by the host (0-10 min, 10-30, 30-60, or >60)
Page 38
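A tiny runnable sketch of the rules-at-nodes, classes-at-leaves idea, using a hypothetical fragment of the restaurant tree (the Patrons/Hungry rules below are illustrative, not the tree actually learned from the slides' data):

```python
# Each internal node holds an attribute test; each leaf holds a classification.
tree = ("Patrons", {
    "None": "No",                 # leaf: don't wait
    "Some": "Yes",                # leaf: wait
    "Full": ("Hungry", {          # inner node: apply another rule
        True: "Yes",
        False: "No",
    }),
})

def classify(node, example):
    """Walk the tree, applying each node's rule, until a leaf is reached."""
    if isinstance(node, str):     # reached a leaf -> class label
        return node
    attr, branches = node
    return classify(branches[example[attr]], example)

decision = classify(tree, {"Patrons": "Full", "Hungry": True})
```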
39. Decision tree
How many distinct decision trees do we have with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows
= 2^(2^n)
E.g., with 6 Boolean attributes: 2^64 = 18,446,744,073,709,551,616
Page 39
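The count is easy to sanity-check in code:

```python
# With n Boolean attributes there are 2**n truth-table rows, and a Boolean
# function independently picks one output bit per row, giving 2**(2**n) functions.
n = 6
rows = 2 ** n                 # 64 truth-table rows
num_functions = 2 ** rows     # 2**64 distinct Boolean functions
```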
40. Uncertainty
Let A_t denote the action “leave for airport t minutes before flight”
Will A_t get me there on time?
• Purely logical approach leads to weak conclusions:
§ “A_90 will get me there on time if there is no accident on the way and it doesn't rain and my tires remain intact and no meteorite hits the car, etc.”
§ None of these can be inferred for sure → plan success cannot be inferred
Page 40
41. Uncertainty
• Consider diagnosis of a patient with headache. Many causes are possible, like sinus problems, eye sight, tense muscles, flu, cancer, … Suppose a logical rule that attempts to express this:
Headache ⇒ Sinusitis ∨ EyeSight ∨ StiffNeck ∨ Flu ∨ Cancer …
• The problem is that there is an almost unlimited list of possible causes. A causal rule like StiffNeck ⇒ Headache doesn't work either (a stiff neck doesn't always cause a headache)
• Trying to use logic in this type of domain fails because
§ there is too much work to list all the attributes
§ there is no complete theory or knowledge
§ not all the necessary tests can be or have been run
Page 41
42. Why probabilistic reasoning?
• Probabilistic reasoning is useful because logic often fails due to laziness and ignorance:
– Laziness: too many attributes to list
– Theoretical ignorance: no complete knowledge of the domain
– Practical ignorance: not enough observations, tests, …
• Probabilistic assertions summarize the effects of laziness and ignorance
Page 42
44. Graphical models
[Diagram: Bayesian networks shown as a subclass of graphical models]
Graphical models are related to mathematical graph theory
Page 44
45. Probabilistic graphs
• A graph is a set of objects (represented by nodes, also called vertices or points), where some pairs of the nodes are connected by links (edges).
• If the edges are directed, they are also called arrows and the graph is directed. In a weighted graph, weights are assigned to the edges. The graph is complete if all the vertices are connected to each other.
• Probabilistic graphs:
– nodes ↔ random variables (r.v.s)
– edges ↔ probabilistic dependencies between these r.v.s.
Page 45
46. Common graphical models
• Bayesian networks – directed graphical models (edges carry causal influence from a node X to the descendants of X)
• Markov random fields – non-directed graphs (a node X depends directly on the neighbors of X)
Page 46
47. Markov rule
• In a directed graph:
P(Xi | all nondescendants) = P(Xi | Parents(Xi))
• A special case: Markov chain
P(Xi | Xi−1, ..., X1) = P(Xi | Xi−1)
• Markov random field:
P(Xi | all other nodes) = P(Xi | Neighbors(Xi))
Page 47
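The Markov-chain case can be sketched numerically: the joint probability of a state sequence factorizes so that each factor conditions only on the previous state. The two-state weather chain and its transition probabilities are invented for illustration:

```python
# P(X_i | X_{i-1}, ..., X_1) = P(X_i | X_{i-1}): a hypothetical weather chain.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def path_probability(states):
    """P(x1, ..., xn) with an assumed uniform start, via the chain factorization."""
    prob = 0.5                            # P(x1), assumed uniform over two states
    for prev, cur in zip(states, states[1:]):
        prob *= P[prev][cur]              # each factor conditions only on prev
    return prob

p = path_probability(["sunny", "sunny", "rainy"])   # 0.5 * 0.8 * 0.2
```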
48. Markov Random Fields (MRFs)
• Non-directed probabilistic graphs
• Used a lot in digital image processing and computer vision
• This example illustrates application in image segmentation
Page 48
50. Bayes’ rule
Product rule: P(a ∧ b) = P(a | b) P(b)
Bayes’ rule: P(a | b) = P(b | a) P(a) / P(b)
Or in distribution form:
P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
Useful for accessing diagnostic probability from causal probability:
P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
Often we perceive as evidence the effect of some unknown cause and we want to determine that cause, e.g. the chance of disease_x given symptom_y:
P(disease_x | symptom_y) = P(symptom_y | disease_x) P(disease_x) / P(symptom_y)
Page 50
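The disease/symptom form of the rule turned into code, with invented numbers for the prior and the likelihoods:

```python
# Diagnostic probability from causal probability via Bayes' rule.
# All three numbers below are illustrative assumptions.
p_disease = 0.01                   # P(disease), prior
p_sym_given_dis = 0.9              # P(symptom | disease), causal direction
p_sym_given_no = 0.05              # P(symptom | no disease)

# Normalization: P(symptom) by summing over both causes.
p_symptom = (p_sym_given_dis * p_disease
             + p_sym_given_no * (1 - p_disease))
p_dis_given_sym = p_sym_given_dis * p_disease / p_symptom
```

Even with a strong causal link (0.9), the diagnostic probability stays modest (about 0.154) because the prior is small.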
51. Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions
Syntax:
• a set of nodes, one per variable
• a directed, acyclic graph (each link means “directly influences”)
• a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
Page 51
52. Network: directed acyclic graph
[Diagram: nodes are random variables; edges denote causal influence; X with its descendants and non-descendants]
X has causal influence on Y
• Evidence for X forms causal support for Y
• Evidence for Y forms diagnostic support for X
Page 52
53. Network separation
Let us investigate (conditional) independence in three simple networks featuring these types of nodes, and let a ⊥ b | c denote “a and b are conditionally independent given c”
P(a, b, c) = P(a) P(c | a) P(b | c)
Consider now evidence in c:
P(a, b) = Σ_c P(a) P(c | a) P(b | c) = P(a) P(b | a) ≠ P(a) P(b)
(in this network a and b are in general not independent)
P(a, b | c) = P(a, b, c) / P(c) = P(a) P(c | a) P(b | c) / P(c) = P(a | c) P(b | c)
So, we can say that the node c blocks the path between a and b.
Page 53
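The blocking claim can be checked numerically for a chain a → c → b; all the probability tables below are made-up numbers:

```python
# Chain a -> c -> b: conditioning on c makes a and b independent,
# while marginally they are not. All probabilities are illustrative.
from itertools import product

P_a = {0: 0.3, 1: 0.7}
P_c_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
P_b_given_c = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}

def joint(a, b, c):
    # The chain factorization P(a, b, c) = P(a) P(c|a) P(b|c).
    return P_a[a] * P_c_given_a[a][c] * P_b_given_c[c][b]

# Marginal dependence: P(a, b) != P(a) P(b) in general.
P_ab = {(a, b): sum(joint(a, b, c) for c in (0, 1))
        for a, b in product((0, 1), (0, 1))}
P_b = {b: P_ab[0, b] + P_ab[1, b] for b in (0, 1)}
marginally_independent = abs(P_ab[0, 0] - P_a[0] * P_b[0]) < 1e-12

# Conditional independence given c: P(a, b | c) == P(a | c) P(b | c).
c = 1
P_c = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1))
P_ab_c = {(a, b): joint(a, b, c) / P_c for a, b in product((0, 1), (0, 1))}
P_a_c = {a: P_ab_c[a, 0] + P_ab_c[a, 1] for a in (0, 1)}
P_b_c = {b: P_ab_c[0, b] + P_ab_c[1, b] for b in (0, 1)}
cond_independent = all(abs(P_ab_c[a, b] - P_a_c[a] * P_b_c[b]) < 1e-12
                       for a, b in product((0, 1), (0, 1)))
```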
54. D-separation contd.
A, B and C are non-overlapping sets
The sets A and B are d-separated by C if each node in A is d-separated from each node in B by C
Page 54
55. Example: Car diagnosis
Initial evidence: car won't start
Testable variables (green), “broken, so fix it” variables (orange)
Hidden variables (gray) ensure sparse structure, reduce parameters
Page 55
56. Belief propagation
• The belief propagation algorithm was introduced by Judea Pearl, 1982
• Exact inference in networks without loops; complexity linear in the number of nodes
• Became very popular after it was shown that the same computations occur in turbo codes and the same principles underlie the Viterbi algorithm
• Main idea: inference by local message passing among neighboring nodes
The message can loosely be interpreted as “I (node i) think that you (node j) are that much likely to be in a given state”.
Page 56
57. Message passing revisited
1. Distributed soldier counting.
2. Distributed soldier counting with the leader in line.
Page 57
58. Numenta: HTM model
An HTM network consists of regions
arranged in a hierarchy.
Jeff Hawkins: “It combines and
extends approaches used in
Bayesian networks, spatial and
temporal clustering algorithms, while
using a tree-shaped hierarchy of
nodes that is common in neural
networks.”
Read the book, it is great fun →
Page 58
59. Semantic web and IBM’s Watson
The “heart and soul” is the Unstructured Information Management Architecture [UIMA]
Page 59
60. Presentation 2nd part
• Smart web
– API economy
– IOT
• Bayesian nets
– Troubleshooting and diagnostics
– Sensor integration via plugin framework
– Intelligent decisions and actions
– Cloud deployment
– IFTTT like application using framework above
Page 60
62. API
• APIs have become the new patents
• Who holds the data holds the knowledge
• Companies don’t share their know-how, but they are willing to share their know-what (via an application programming interface, API)
• The API economy is coming, and it will be a major driver of profit for many companies
Page 62
66. Sensor Networks
• Network of specialized sensors intended
to monitor and record conditions at diverse
locations.
• Commonly monitored parameters are
temperature, humidity, pressure, wind
direction and speed, illumination intensity,
vibration intensity, sound intensity, powerline voltage, chemical concentrations,
pollutant levels and vital body functions.
Page 66
68. M2M is becoming a reality
The API economy has become a reality
Page 68
69. Programmable web of the future
Sensors gather and push data to the cloud.
API economies share data and services in the
cloud.
In the cloud, an intelligent engine aggregates and correlates data from different sources, creating new VALUE. That can be used either to:
– Provide new insights (analysis)
– Create new instructions (actions) via API
Page 69
70. Three types of AI/IOT
implementations
• “Ambient intelligence” – mesh networks; information flow and decisions stay local
• “IOT Analytics” – big-data-like use case scenarios
• IOT Analytics + API’s + cloud + decision
engine + actions
Page 70
75. Technology that can deal with huge data
sets under complexity and uncertainty?
Google/Toyota/Renault/Volvo driverless car research projects
Page 75
78. Bayesian network modeling
A data analysis technique ideally suited to messy, complex data. The focus is on structure discovery – determining an optimal graphical model which describes the inter-relationships in the underlying processes.
Structure discovery AND inter-relationships
Page 78
79. • How do you express that a car needs both battery and fuel to function? Easy.
• How do you say that if your lights are not working, it is most likely a battery fault, but it could as well be that just the lights are broken? Still, the fact that the lights are not working points to the battery fault as the most likely cause.
If you only model via composition and add behavior separately – what most of the tools do these days – you are heading for complexity!
Page 79
80. Example, car model
Car model with relations: NO Data available
Chance that the car will start is above 98%
Page 80
81. Car example, lights are off
Lights are off
Chance that the battery functions dropped from 99.99% to less than 50%
Chance that the car will start is below 50%
Page 81
82. Car example, lights are on
Lights are on
The battery works; there is no need to check it
Chance that the car will start now only depends on the fuel
Page 82
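A minimal runnable version of the car example: a Lights child of Battery, and a Start node with Battery and Fuel as parents. All conditional probabilities below are invented placeholders chosen to reproduce the slides' qualitative behavior, not the actual model:

```python
# Network: Battery -> Lights, (Battery, Fuel) -> Start. Numbers are illustrative.
P_batt = 0.9999                                 # P(battery ok), prior
P_fuel = 0.98                                   # P(fuel present)
P_lights_on = {True: 0.99995, False: 0.01}      # P(lights on | battery state)
P_start = {(True, True): 0.99, (True, False): 0.0,
           (False, True): 0.0, (False, False): 0.0}  # P(start | battery, fuel)

def p_batt_given_lights(on):
    """Diagnostic inference P(battery ok | lights observation) via Bayes' rule."""
    num = P_batt * (P_lights_on[True] if on else 1 - P_lights_on[True])
    den = num + (1 - P_batt) * (P_lights_on[False] if on else 1 - P_lights_on[False])
    return num / den

def p_start_given(p_batt_now):
    """P(car starts) by enumerating battery and fuel states."""
    return sum(P_start[(b, f)]
               * (p_batt_now if b else 1 - p_batt_now)
               * (P_fuel if f else 1 - P_fuel)
               for b in (True, False) for f in (True, False))

p_batt_off = p_batt_given_lights(on=False)      # battery belief collapses
p_start_off = p_start_given(p_batt_off)         # so does the chance of starting
```

With these placeholder numbers, observing "lights off" pulls the battery belief from its 99.99% prior to under 50%, and the chance of starting drops with it; observing "lights on" effectively confirms the battery, leaving Start to depend on Fuel.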
83. Prototype architecture
Database of recipes
Website where the user configures logic (recipes)
Decision engine
Pluggable actions
Developer extensions (new capabilities)
Pluggable sensors
Page 83