3. AI Computing
Caution: AI is NOT magic.
AI is a unique approach to programming computers.
A thinking or conscious computer is still far off on the digital horizon.
5. Intelligent Behavior
Learn from experience
Apply knowledge acquired from experience
Handle complex situations
Solve problems when important information is missing
React quickly and correctly to a new situation
Be creative and imaginative
Use heuristics
6. Major Branches of AI
Robotics & Perceptive Systems
Mechanical and computer devices that perform tedious tasks with high precision.
Game Playing
Programming computers to play games. Some of the greatest advances in AI have occurred in the field of game playing.
Natural Language Processing (NLP)
Computers understand and react to statements and commands made in a “natural” language.
7. Major Branches of AI
Expert System (ES)
Programming computers to make decisions in real-life situations.
Neural Network
A computer system that can act like, or simulate, the functioning of the human brain.
Unsupervised learning.
Supervised learning.
8. Machine Learning
Learning System
Machine learning is the study of computer algorithms that improve automatically through experience.
The computer changes how it functions or reacts to situations based on feedback.
“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”
Tom Mitchell (1998)
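To make the E/T/P definition concrete, here is a toy Python sketch (all names and data are hypothetical). The task T is classifying numbers in [0, 1) as above or below an unknown threshold, the experience E is a growing list of labelled examples, and the performance measure P is accuracy on a fixed test grid. The learner simply places its threshold midway between the largest negative and the smallest positive example seen so far.

```python
def learn_threshold(examples):
    """Experience E: (x, label) pairs. Put the threshold midway between the
    largest negative and the smallest positive example seen so far."""
    negatives = [x for x, label in examples if not label]
    positives = [x for x, label in examples if label]
    lo = max(negatives) if negatives else 0.0
    hi = min(positives) if positives else 1.0
    return (lo + hi) / 2


def accuracy(threshold, test_set):
    """Performance measure P: fraction of test points classified correctly."""
    return sum((x >= threshold) == label for x, label in test_set) / len(test_set)


TRUE_THRESHOLD = 0.37  # hidden concept behind task T (hypothetical)
test_set = [(i / 100, i / 100 >= TRUE_THRESHOLD) for i in range(100)]
stream = [0.1, 0.9, 0.2, 0.6, 0.3, 0.45, 0.35, 0.4]  # order in which examples arrive
labelled = [(x, x >= TRUE_THRESHOLD) for x in stream]

# P measured after 1, 2, 4 and 8 examples of experience
scores = [accuracy(learn_threshold(labelled[:n]), test_set) for n in (1, 2, 4, 8)]
print(scores)
```

With this particular stream the accuracy never drops as more experience arrives, which is exactly what the definition calls learning.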
9. Human VS Artificial Intelligence - Pros
Human Intelligence
Intuition, common sense, judgment, creativity, etc.
The ability to demonstrate intelligence by communicating effectively
Reasoning and critical thinking
Artificial Intelligence
Ability to simulate human behavior and cognitive processes
Capture and preserve human expertise
Fast response
10. Human VS Artificial Intelligence - Cons
Human Intelligence
Humans are fallible
They have limited knowledge
Serial information processing proceeds very slowly in the brain
Humans are unable to retain large amounts of data
Artificial Intelligence
No "common sense"
Cannot readily deal with "mixed" knowledge
May have high development costs
Raises legal and ethical concerns
11. Conventional Computing VS Artificial Intelligence
Artificial Intelligence
AI software uses the techniques of search and pattern matching.
Programmers design AI software to give the computer only the problem, not the steps necessary to solve it.
Conventional Computing
Conventional computer software follows a logical series of steps to reach a conclusion.
Computer programmers originally designed software that accomplished tasks by executing algorithms step by step.
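The contrast above can be sketched in Python. Instead of coding the solution steps, we describe only the problem (a start state, a goal test and the legal moves) and let a generic breadth-first search discover the steps. The water-jug puzzle used here (a 4-litre and a 3-litre jug, goal: exactly 2 litres in the larger jug) is a hypothetical example, not taken from the slides.

```python
from collections import deque


def search(start, is_goal, moves):
    """Breadth-first search: returns the shortest list of states from start to a goal."""
    frontier = deque([start])
    parent = {start: None}
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            path = []
            while state is not None:  # walk back through the parents
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in moves(state):
            if nxt not in parent:  # skip states already reached
                parent[nxt] = state
                frontier.append(nxt)
    return None


CAP_A, CAP_B = 4, 3  # jug capacities in litres (hypothetical puzzle)


def jug_moves(state):
    """Legal moves: fill a jug, empty a jug, or pour one jug into the other."""
    a, b = state
    a_to_b = min(a, CAP_B - b)
    b_to_a = min(b, CAP_A - a)
    return [(CAP_A, b), (a, CAP_B), (0, b), (a, 0),
            (a - a_to_b, b + a_to_b), (a + b_to_a, b - b_to_a)]


path = search((0, 0), lambda s: s[0] == 2, jug_moves)
print(path)
```

The same `search` function would solve any other puzzle given a different goal test and move generator; only the problem description changes.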
12. Knowledge Representation & Limits
The number of atomic facts that the average person knows is astronomical.
Building a complete knowledge base of commonsense knowledge requires an enormous amount of engineering.
Much of what people know is not represented as "facts" that they could express verbally.
13. Conclusion
Intelligent agents must be able to set goals and achieve them.
They need a way to visualize the future and to make choices.
Currently, no computer exhibits full artificial intelligence.
Early AI researchers developed algorithms that require enormous computational resources.
The search for more efficient problem-solving algorithms is a high priority for AI research.
14. Neural Networks
Traditional computers cannot work around the failure of even a single transistor. In biologically inspired designs, the algorithms are constantly changing, which allows the system to keep adapting and to work around failures to complete its tasks.
15. “We’re moving from engineering computing systems to something that has many of the characteristics of biological computing.”
Larry Smarr, an astrophysicist who directs the California Institute for Telecommunications and Information Technology
16. “The new approach, used in both hardware and software, is being driven by the explosion of scientific knowledge about the brain. But scientists are still far from fully understanding how brains function.”
Kwabena Boahen, a computer scientist who leads Stanford’s Brains in Silicon research program
17. “The largest class this fall at Stanford was a graduate-level machine-learning course covering both statistical and biological approaches, taught by the computer scientist Andrew Ng. More than 760 students enrolled.”
“Everyone knows there is something big happening, and they’re trying [to] find out what it is.”
Terry Sejnowski, a computational neuroscientist at the Salk Institute
19. Nervous Systems
The human brain contains about $10^{11}$ neurons.
Each neuron is connected to about $10^{4}$ others.
Neurons are slower than logic gates:
$10^{-9}$ s for semiconductor logic gates
$10^{-3}$ s for biological neurons
The energy efficiency of the brain is estimated at:
$10^{-16}$ joules / operation / s
The best energy efficiency of computers is:
$10^{-6}$ joules / operation / s
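Taking these order-of-magnitude estimates at face value, the ratios they imply are:

```latex
\begin{aligned}
10^{11}\ \text{neurons} \times 10^{4}\ \text{connections per neuron} &= 10^{15}\ \text{connections} \\
\frac{10^{-3}\ \text{s (neuron)}}{10^{-9}\ \text{s (logic gate)}} &= 10^{6} \\
\frac{10^{-6}\ \text{J/op/s (best computers)}}{10^{-16}\ \text{J/op/s (brain)}} &= 10^{10}
\end{aligned}
```

So a single neuron is roughly a million times slower than a logic gate, yet by these figures the brain is about ten billion times more energy efficient per operation.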
20. Nervous Systems
It takes on average between 100 and 200 ms to recognize a familiar face, whereas much simpler tasks can take days to process on conventional computers.
Some scientists have compared the brain to a “complex, nonlinear, parallel computer”.
21. IBM Supercomputer – Compass
I.B.M. announced last year that it had built a supercomputer simulation of the brain (Compass).
It encompassed roughly 10 billion neurons.
It ran about 1,500 times more slowly than an actual brain.
Further, it required several megawatts of power, compared with just 20 watts used by the biological brain.
“attempting to simulate a brain at the same speed would require a flow of electricity in a conventional computer that is equivalent to what is needed to power both San Francisco and New York,”
Dr. Modha said.
22. Google & DeepMind
Google has acquired DeepMind for about $400 million.
DeepMind has not yet developed any commercial products.
DeepMind's main asset appears to be its personnel.
DeepMind claims that it combines “the best techniques from machine learning and systems neuroscience to build powerful general-purpose learning algorithms.”
23. Google & AI
Google researchers were able to get a machine-learning algorithm based on neural networks to perform an identification task.
The network scanned a database of 10 million images and, in doing so, trained itself to recognize cats.
In June, Google said it had used those neural network techniques to develop a new search service to help customers find specific photos more accurately.
25. Neurons
The main purpose of neurons is to receive, analyze and pass on information in the form of signals (electrical pulses).
When a neuron sends information, we say that the neuron “fires”.
31. Knowledge and Memory
[Figure: a network with inputs x1, x2, …, xm and outputs y1, y2, …, yn]
The output behavior of a network is determined by its weights.
The weights are the memory of an NN.
Knowledge is distributed across the network.
A large number of nodes:
increases the storage “capacity”;
ensures that the knowledge is robust;
provides fault tolerance.
New information is stored by changing the weights.
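A minimal Python sketch of "storing information by changing weights": a single threshold neuron (a perceptron, swapped in here as an illustration, not taken from the slides) learns the logical AND function. Its behavior is entirely determined by the weights `w` and the bias `b`, and every wrong answer makes the perceptron learning rule adjust them.

```python
def predict(w, b, x):
    """Threshold neuron: fires (1) if the weighted input sum plus bias exceeds 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(data, epochs=20, lr=1):
    w, b = [0, 0], 0
    for _ in range(epochs):
        errors = 0
        for x, target in data:
            err = target - predict(w, b, x)
            if err:
                # Perceptron rule: nudge the weights toward the desired output
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
                errors += 1
        if errors == 0:  # every pattern is now stored correctly
            break
    return w, b


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print(w, b, [predict(w, b, x) for x, _ in AND])
```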
32. Example: Pattern Classification
[Figure: a network mapping an input pattern x = (x1, …, xm) to an output pattern y = (y1, …, yn)]
Function: x → y
The NN’s output is used to distinguish between and recognize different input patterns.
Different output patterns correspond to particular classes of input patterns.
Networks with hidden layers can be used to solve more complex problems than just linear pattern classification.
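To illustrate why hidden layers matter, here is a small hand-wired network in Python (the weights are chosen by hand for illustration, not learned). XOR is the classic pattern that no single linear threshold unit can separate; with two hidden units, one acting as OR and one as AND, the output unit computes "OR but not AND", which is XOR.

```python
def step(z):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def xor_net(x1, x2):
    h_or = step(x1 + x2 - 0.5)       # hidden unit 1: fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)      # hidden unit 2: fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # output unit: OR and not AND


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_net(x1, x2))
```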
34. Supervised Learning Goals
The goal of any supervised learning algorithm is to find a function that best maps a set of inputs to its correct output.
An example would be a simple classification task, where the input is an image of an animal (or the characteristics of this animal), and the correct output would be the name of the animal.
35. Training Neural Networks: Back-Propagation
A supervised learning method.
Requires a dataset of the desired output for many inputs, making up the training set.
Backpropagation requires that the activation function used by the artificial neurons (or "nodes") be differentiable.
36. Motivation
A multi-layered network can create internal representations and learn different features in each layer.
The first layer may be responsible for learning the orientations of lines, using the inputs from the individual pixels in the image.
The second layer may combine the features learned in the first layer and learn to identify simple shapes.
Each higher layer learns more and more abstract features that can be used to classify the image.
Each layer finds patterns in the layer below it, and it is this ability to create internal representations that are independent of outside input that gives multi-layered networks their power.
37. Backpropagation Learning Algorithm
The learning algorithm can be divided into two phases:
Phase 1: Propagation
Forward propagation of a training pattern's input through the neural network to generate the output activations.
Backward propagation of the output activations through the neural network, using the training pattern's target, to generate the deltas (error terms) of all output and hidden neurons.
Phase 2: Weight update
For each weight, multiply its output delta by its input activation to get the gradient of the weight.
Subtract a ratio (percentage) of the gradient from the weight.
This ratio influences the speed and quality of learning; it is called the learning rate. The greater the ratio, the faster the neuron trains; the lower the ratio, the more accurate the training tends to be.
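In symbols, one common formulation (assuming sigmoid units and the squared error $E = \tfrac{1}{2}\sum_k (y_k - t_k)^2$, where $y_k$ is an output activation and $t_k$ its target) is:

```latex
\begin{aligned}
\delta_k &= (y_k - t_k)\, y_k (1 - y_k) && \text{output neuron } k \\
\delta_j &= o_j (1 - o_j) \sum_k \delta_k\, w_{jk} && \text{hidden neuron } j \\
\frac{\partial E}{\partial w_{ij}} &= \delta_j\, o_i, \qquad w_{ij} \leftarrow w_{ij} - \eta\, \delta_j\, o_i
\end{aligned}
```

Here $o_i$ is the activation that feeds weight $w_{ij}$ into neuron $j$, and $\eta$ is the learning rate (the "ratio" above).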
38. Algorithm
initialize network weights (often small random values)
do
    forEach training example ex
        prediction = neural-net-output(network, ex)  // forward pass
        actual = teacher-output(ex)
        compute error (prediction - actual) at the output units
        compute the weight changes Δw for all weights between the hidden and output layers  // backward pass
        compute the weight changes Δw for all weights between the input and hidden layers  // backward pass continued
        update network weights
until all examples are classified correctly or another stopping criterion is satisfied
return the network
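A runnable Python version of the pseudocode above, under simplifying assumptions of our own: one hidden layer of sigmoid units, squared error, per-example (online) updates, a bias input fixed at 1.0 for each layer, and "loss below a tolerance" as the other stopping criterion. The OR dataset is a hypothetical toy example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, n_out, seed=0):
    """Small random weights; the last weight in each row is that neuron's bias."""
    rng = random.Random(seed)
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return w_h, w_o


def forward(net, x):
    """Forward pass: returns hidden and output activations."""
    w_h, w_o = net
    xb = list(x) + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h]
    hb = h + [1.0]
    y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_o]
    return h, y


def loss(net, data):
    """Squared error E = 1/2 * sum of (prediction - target)^2 over all examples."""
    return sum(0.5 * (yk - tk) ** 2
               for x, t in data for yk, tk in zip(forward(net, x)[1], t))


def gradients(net, x, t):
    """Backward pass: output and hidden deltas, then dE/dw = delta * input activation."""
    w_h, w_o = net
    h, y = forward(net, x)
    delta_o = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, t)]
    delta_h = [hj * (1 - hj) * sum(d * w_o[k][j] for k, d in enumerate(delta_o))
               for j, hj in enumerate(h)]
    xb, hb = list(x) + [1.0], h + [1.0]
    g_h = [[d * v for v in xb] for d in delta_h]
    g_o = [[d * v for v in hb] for d in delta_o]
    return g_h, g_o


def train(net, data, learning_rate=0.5, max_epochs=10000, tolerance=0.01):
    for _ in range(max_epochs):
        for x, t in data:
            g_h, g_o = gradients(net, x, t)
            # Weight update: subtract learning_rate * gradient from every weight
            for weights, grads in ((net[0], g_h), (net[1], g_o)):
                for row, g_row in zip(weights, grads):
                    for i, g in enumerate(g_row):
                        row[i] -= learning_rate * g
        if loss(net, data) < tolerance:  # stopping criterion
            break
    return net


OR_DATA = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]
net = init_network(2, 2, 1)
before = loss(net, OR_DATA)
train(net, OR_DATA)
print(before, loss(net, OR_DATA))
```

Computing all gradients before touching any weight keeps the backward pass consistent with the forward pass that produced the error.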
40. Neuromorphic Processors
These new processors consist of electronic components that can be connected by wires that mimic biological synapses.
They are based on large groups of neuron-like elements and are known as neuromorphic processors.
They are not “programmed.”
Instead, the connections between circuits are “weighted” according to correlations in data that the processor has already “learned.”
Those weights are then altered as data flows into the chip, causing them to change their values and to “spike.”
That generates a signal that travels to other components and, in reaction, changes the neural network.
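Neuromorphic chips differ in their details, and the slide does not describe a specific one. As a rough software analogy only, a leaky integrate-and-fire neuron (a standard textbook spiking-neuron model, swapped in here for illustration) accumulates weighted input, leaks a little of its potential at each time step, and emits a spike when its potential crosses a threshold. The weights, leak factor and threshold below are hypothetical.

```python
def lif_spike_times(inputs, weights, threshold=1.0, leak=0.9):
    """Return the time steps at which a leaky integrate-and-fire neuron spikes.

    inputs  -- one list of input values per time step
    weights -- synaptic weights, one per input line
    """
    potential = 0.0
    spikes = []
    for t, x in enumerate(inputs):
        potential = leak * potential + sum(w * xi for w, xi in zip(weights, x))
        if potential >= threshold:
            spikes.append(t)  # fire ...
            potential = 0.0   # ... and reset
    return spikes


steady_input = [[1, 1]] * 10
print(lif_spike_times(steady_input, weights=[0.3, 0.2]))  # weaker synapses: fewer spikes
print(lif_spike_times(steady_input, weights=[0.6, 0.5]))  # stronger synapses: more spikes
```

Changing only the weights changes when the neuron spikes, which is the sense in which such hardware is "weighted" rather than programmed.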
41. Conclusion
Neural network technology offers more natural interaction with the real world.
Neural networks can:
learn and adapt to changes in a problem’s environment,
establish patterns in situations where rules are not known,
deal with fuzzy or incomplete information.
However, they lack explanation facilities and usually act as a black box.
The process of training neural networks with current technologies is still slow.
43. Motion and Manipulation: Robotics
Intelligence is required for robots to handle tasks such as object manipulation and navigation, with the sub-problems of:
localization
mapping
motion
44. Robot Quick Description
Each leg has 7 DOFs (degrees of freedom):
3 DOFs – active, for the HIP
1 DOF – active, for the KNEE
2 DOFs – active, for the ANKLE
1 DOF – passive, for the FOOT
51. Fuzzy Logic
Fuzzy logic, or fuzzy set theory, was introduced by Professor Lotfi Zadeh.
Human experts do not usually think in probability values, but in terms such as often, generally, sometimes, occasionally and rarely.
At the heart of fuzzy logic lies the concept of a linguistic variable.
The values of linguistic variables are words rather than numbers.
Fuzzy logic provides a way to break through the computational bottlenecks of traditional expert systems.
Eventually, fuzzy theory, largely ignored in the West, was taken seriously in the East, by the Japanese.
52. Fuzzy Logic: Motivation
Modeling of imprecise concepts:
Age, Weight, Height, …
Modeling of imprecise dependencies:
If Temperature is low and Oil is cheap then crank up the heating system
Origin of Information:
Modeling of Expert Knowledge
Representation of information extracted from inherently imprecise data
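A linguistic variable can be sketched in Python by attaching a membership function to each of its words. Triangular functions are a common choice; the term names and breakpoints for Temperature below are hypothetical and only show how one crisp reading belongs to several terms to different degrees.

```python
def triangular(a, b, c):
    """Membership function that is 0 at a and c and rises linearly to 1 at b (a < b < c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


# Hypothetical terms of the linguistic variable Temperature (degrees C)
TEMPERATURE = {
    "cold": triangular(-10, 0, 10),
    "mild": triangular(5, 15, 25),
    "hot": triangular(20, 30, 40),
}


def fuzzify(x, terms):
    """Map a crisp value to its degree of membership in each term."""
    return {name: mu(x) for name, mu in terms.items()}


print(fuzzify(7.5, TEMPERATURE))
```

A reading of 7.5 °C is partly "cold" and partly "mild" at the same time, which is exactly the imprecision a crisp threshold cannot express.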
61. Term Definitions:
Distance := {far, medium, close, zero, neg_close}
Angle := {pos_big, pos_small, zero, neg_small, neg_big}
Power := {pos_high, pos_medium, zero, neg_medium, neg_high}
1. Fuzzification: Linguistic Variables
Membership Function Definition:
[Figure: membership functions µ (0 to 1) for Angle, from -90° to 90° (neg_big, neg_small, zero, pos_small, pos_big), and for Distance, from -10 to 30 yards (neg_close, zero, close, medium, far). Example inputs: an angle of 4° belongs to zero and pos_small, with degrees 0.8 and 0.2; a distance of 12 yards belongs to close and medium, with degrees 0.9 and 0.1.]
62. Computation of the “IF-THEN”-Rules:
#1: IF Distance = medium AND Angle = pos_small THEN Power = pos_medium
#2: IF Distance = medium AND Angle = zero THEN Power = zero
#3: IF Distance = far AND Angle = zero THEN Power = pos_medium
#4: …….
2. Fuzzy-Inference:
“IF-THEN”-Rules
Aggregation: Computing the “IF”-Part
Composition: Computing the “THEN”-Part
The rules of a fuzzy logic system are the “laws” it executes!
63. 2. Fuzzy-Inference: Composition
Result for the Linguistic Variable "Power":
pos_high with the degree 0.0
pos_medium with the degree 0.8 ( = max{ 0.8, 0.1 } )
zero with the degree 0.2
neg_medium with the degree 0.0
neg_high with the degree 0.0
Composition computes how each rule influences the output variables!
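The aggregation and composition steps can be sketched in Python using the common min/max operators: AND is taken as the minimum when computing the IF-part, and each output term keeps the maximum over the rules that conclude it. The slides do not name the operators, so this is an assumption. The fuzzified inputs are also assumed: Distance is medium to 0.9 and close to 0.1, Angle is pos_small to 0.8 and zero to 0.2, which is the reading of the fuzzification figure under which rules #1 to #3 reproduce the degrees listed here. Rule #4 is elided on the slides and is left out, so the 0.1 it contributes to pos_medium does not appear.

```python
# Assumed fuzzified inputs (degrees of membership); unlisted terms have degree 0
distance = {"medium": 0.9, "close": 0.1}
angle = {"pos_small": 0.8, "zero": 0.2}

# Rules #1 to #3 from the slide: (distance term, angle term, power term)
RULES = [
    ("medium", "pos_small", "pos_medium"),
    ("medium", "zero", "zero"),
    ("far", "zero", "pos_medium"),
]


def infer(distance, angle, rules):
    power = {}
    for d_term, a_term, p_term in rules:
        # Aggregation: degree of the IF-part, with AND taken as the minimum
        strength = min(distance.get(d_term, 0.0), angle.get(a_term, 0.0))
        # Composition: each output term keeps the maximum over the rules that set it
        power[p_term] = max(power.get(p_term, 0.0), strength)
    return power


print(infer(distance, angle, RULES))
```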
64. 3. Defuzzification
Finding a Compromise Using “Center-of-Maximum”:
[Figure: membership functions µ (0 to 1) for Power, from -30 to 30 kilowatts (neg_high, neg_medium, zero, pos_medium, pos_high); the compromise value is 6.4 kW]
“Balancing” out the result!
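Center-of-maximum takes the weighted average of the points where each output term's membership function peaks, using the degrees from the composition step as weights. The slides do not give the peak positions; the sketch below assumes zero peaks at 0 kW and pos_medium at 8 kW, values chosen because they reproduce the slide's 6.4 kW from the degrees pos_medium = 0.8 and zero = 0.2.

```python
def center_of_maximum(degrees, peaks):
    """Weighted average of each term's peak position, weighted by its degree."""
    total = sum(degrees.values())
    if total == 0:
        raise ValueError("no rule fired")
    return sum(degrees[term] * peaks[term] for term in degrees) / total


PEAKS_KW = {"zero": 0.0, "pos_medium": 8.0}  # assumed peak positions in kW
power = center_of_maximum({"pos_medium": 0.8, "zero": 0.2}, PEAKS_KW)
print(f"{power:.1f} kW")
```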
66. Improved Computational Power
Fuzzy rule-based systems perform faster than conventional expert systems.
Fuzzy systems require fewer rules.
A fuzzy expert system merges the rules, making them more powerful.
Lotfi Zadeh believes that in a few years most expert systems will use fuzzy logic to solve highly nonlinear and computationally difficult problems.
67. Summary
Although fuzzy systems allow expert knowledge to be expressed in a more natural way, they still depend on the rules extracted from the experts, and thus might be smart or dumb.
Some experts can provide very clever fuzzy rules, but some just guess and may even get them wrong.
Therefore, all rules must be tested and tuned, which can be a prolonged and tedious process.
It took Hitachi engineers several years to test and tune only 54 fuzzy rules to guide the Sendai Subway System.
68. Expert Systems
An expert system is a computer program that is designed to hold the accumulated knowledge of one or more domain experts.
ESs imitate the expert’s reasoning processes to solve specific problems.
69. Overview of Expert Systems
Can…
Explain their reasoning or suggested decisions
Display intelligent behavior
Draw conclusions from complex relationships
Provide portable knowledge
Expert system shell
A collection of software packages and tools used to develop expert systems
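The inference engine inside an expert system shell can be as simple as forward chaining over IF-THEN rules. This Python sketch (with a made-up animal-identification rule base) keeps a trace of every rule it fires, which is one simple way a system can "explain its reasoning".

```python
def forward_chain(facts, rules):
    """Fire every rule whose premises are all known, until nothing new is derived.

    rules -- list of (name, premises, conclusion)
    Returns the final set of facts and a human-readable trace of fired rules.
    """
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                trace.append(f"{name}: IF {' AND '.join(premises)} THEN {conclusion}")
                changed = True
    return known, trace


# Hypothetical knowledge base
RULES = [
    ("R1", ["has_feathers"], "is_bird"),
    ("R2", ["is_bird", "cannot_fly", "swims"], "is_penguin"),
    ("R3", ["has_hair"], "is_mammal"),
]
known, trace = forward_chain({"has_feathers", "cannot_fly", "swims"}, RULES)
print("\n".join(trace))  # the explanation: which rules led to the conclusion
```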
70. IBM & Expert Systems
It has been two years since Watson, the artificial intelligence program created by I.B.M., beat two of the world's best “Jeopardy!” players.
Watson has access to roughly 200 million pages of information, and is able to understand natural language queries and answer questions.
The computer maker had initially planned to test the system as an expert adviser to doctors; the idea was that Watson’s encyclopedic knowledge of medical conditions could aid a human expert in diagnosing illnesses.
71. IBM & Watson
In May, I.B.M. announced a general-purpose version of its software, the “I.B.M. Watson Engagement Advisor.”
The idea is to make the company’s question-answering system available in a wide range of call center, technical support and telephone sales applications.
The company says that as many as 61 percent of all telephone support calls currently fail because human support-center employees are unable to give people correct or complete information.
72. When to Use an Expert System
Capture and preserve irreplaceable human expertise
Provide expertise needed at a number of locations at the same time
Provide expertise needed in a hostile environment that is dangerous to human health
Provide expertise that is expensive or rare
Develop a solution faster than human experts can
Provide a high potential payoff or significantly reduced downside risk
73. Limitations of Expert Systems
Limited to relatively narrow problems
May have high development costs
May raise legal and ethical concerns
Cannot readily deal with “mixed” knowledge
Possibility of error
Difficult to maintain
74. Legal and Ethical Issues
Who is responsible if the advice is wrong?
The user?
The domain expert?
The knowledge engineer?
The programmer of the expert system shell?
The company selling the software?
75. Transferring Expertise
Objective of an expert system:
To transfer expertise from an expert to a computer system, and then on to other humans (nonexperts)
Activities:
Knowledge acquisition
Knowledge representation
Knowledge inferencing
Knowledge transfer to the user
Knowledge is stored in a knowledge base.
76. An Expert System Example
General Electric (GE): the top locomotive field service engineer was nearing retirement.
The traditional solution was apprenticeship, but GE wanted:
A more effective and dependable way to disseminate expertise
To prevent valuable knowledge from retiring
To minimize extensive travel or moving the locomotives
To MODEL the way a human troubleshooter works
Months of knowledge acquisition
3 years of prototyping
A novice engineer or technician can perform at an expert’s level
On a personal computer
Installed at every railroad repair shop served by GE
77. Participants in Expert Systems
Domain expert
The individual or group whose expertise and
knowledge is captured for use in an expert system
Knowledge user
The individual or group who uses and benefits from
the expert system
Knowledge engineer
Someone trained or experienced in the design,
development, implementation, and maintenance of
an expert system
78. Expert Systems Development
Determining requirements
Identifying experts
Constructing expert system components
Implementing results
Maintaining and reviewing the system
Domain
• The area of knowledge addressed by the expert system.
80. Evolution of Expert Systems Software
Expert system shell
Collection of software packages & tools to design, develop, implement, and maintain expert systems
[Figure: ease of use of expert system development tools, rising from low to high over time]
Before 1980: traditional programming languages (low ease of use)
1980s: special and 4th-generation languages
1990s: expert system shells (high ease of use)
83. Applications of Expert Systems
DESIGN ADVISOR: Gives advice to designers of processor chips.
MYCIN: Medical system for diagnosing bacterial blood infections. First used in 1979.
84. Applications of Expert Systems
DENDRAL: Used to identify the structure of chemical compounds. First used in 1965.
LITHIAN: Gives advice to archaeologists examining stone tools.
86. Expert Systems Benefits
Enhancement of Problem Solving and Decision Making
Improved Product and Decision Quality
Increased Output and Productivity
Decreased Decision Making Time
Capture Scarce Expertise
Can Work with Incomplete or Uncertain Information
Knowledge Transfer to Remote Locations
87. Problems and Limitations of Expert Systems
Domain experts are not always able to explain their logic and reasoning
ES work well only in a narrow domain of knowledge
Knowledge engineers are rare and expensive
Expert system users have natural cognitive limits
Lack of trust by end-users
ES may not be able to arrive at valid conclusions
ES may sometimes produce incorrect recommendations
Lack common sense
Cannot respond as creatively as a human expert
Cannot adapt to changing environments
88. Conclusion
Classic expert systems are especially good for closed-system applications with precise inputs and logical outputs.
They use expert knowledge in the form of rules and, if required, can interact with the user to establish a particular fact.
A major drawback is that human experts cannot always express their knowledge in terms of rules or explain the line of their reasoning.
This can prevent the expert system from accumulating the necessary knowledge, and consequently lead to its failure.
89. Summary
Expert, neural and fuzzy systems have now matured and been applied to a broad range of different problems, mainly in engineering, medicine, finance, business and management.
Each technology handles the uncertainty and ambiguity of human knowledge differently, and each has found its place in knowledge engineering. They no longer compete; rather, they complement each other.
A synergy of expert systems with fuzzy logic and neural computing improves the adaptability, robustness, fault tolerance and speed of knowledge-based systems.
Besides, computing with words makes them more “human”.
91. R Tops Data Mining Software Poll
For the past 12 years, KDnuggets has conducted an annual poll asking "What analytics/data mining software you used in the past 12 months for a real project (not just evaluation)".
In this year's poll, R was the top-ranked data mining solution, selected by 30.7% of poll respondents.
Microsoft Excel was second, at 29.8%. RapidMiner, which took the #1 spot over R in 2011 and 2010, ranked third.
As Bob Muenchen notes, four of the top five data mining solutions in this year's poll are open-source.
R was also ranked in this poll as the most popular language for implementing data mining applications, beating out SQL and Java.
92. Important Problems in Data Mining
Prediction
Finding patterns (Apriori)
Clustering
Classification
Regression
Ranking
Density Estimation
93. Prediction
For most of the following algorithms (as well as linear regression), we would in practice first generate the model using training data, and then predict values for test data.
To make predictions, we use the predict function.
Typically, the first argument is the variable in which you saved the model, and the second argument is a matrix or data frame of test data. For a linear model, newdata must be a data frame whose column names match the predictors in the model formula.
For instance, if we were to predict for the linear regression model above, and x1_test and x2_test are vectors containing test data, we can use the command
> predicted_values <- predict(lm_model, newdata = data.frame(x1 = x1_test, x2 = x2_test))
94. Finding patterns (Apriori)
Find association rules in large datasets, e.g. {Diapers} → {Beer}: use Apriori!
To run the Apriori algorithm, first install the arules package and load it.
Note that the dataset must be a binary incidence matrix; the column names should correspond to the “items” that make up the “transactions.” The following commands print out a summary of the results and a list of the generated rules.
> install.packages("arules")
> library(arules)
> dataset <- read.csv("C:/Datasets/mushroom.csv", header = TRUE)
> mushroom_rules <- apriori(as.matrix(dataset), parameter = list(supp = 0.8, conf = 0.9))
> summary(mushroom_rules)
> inspect(mushroom_rules)
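The idea behind `apriori()` can be sketched in Python. This is an illustration of the level-wise Apriori principle, not the arules implementation: candidate itemsets of size k are built only from frequent itemsets of size k - 1, any candidate with an infrequent subset is pruned, and rules are then scored by confidence = support(lhs ∪ rhs) / support(lhs). The shopping baskets are made up.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: returns {itemset: support} for all frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = [frozenset([i]) for i in sorted(set().union(*transactions))]
    level = [s for s in level if support(s) >= min_support]
    frequent = {s: support(s) for s in level}
    k = 2
    while level:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update((c, support(c)) for c in level)
        k += 1
    return frequent


def association_rules(frequent, min_conf):
    """Rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset - lhs), supp, conf))
    return rules


baskets = [{"diapers", "beer", "milk"}, {"diapers", "beer"}, {"diapers", "bread"},
           {"beer", "bread"}, {"diapers", "beer", "bread"}]
freq = frequent_itemsets(baskets, min_support=0.5)
for lhs, rhs, supp, conf in association_rules(freq, min_conf=0.7):
    print(lhs, "->", rhs, f"support={supp:.2f} confidence={conf:.2f}")
```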
95. Clustering
Grouping data into clusters that “belong” together: objects within a cluster are more similar to each other than to those in other clusters.
Algorithms: k-means, k-medians
Input: $\{x_i\}_{i=1}^{m},\ x_i \in X \subset \mathbb{R}^n$
Output: $f : X \to \{1, \dots, K\}$ ($K$ clusters)
Applications: clustering consumers for market research, clustering genes into families, image segmentation (medical imaging)
If X is the data matrix and K is the number of clusters, then the command is:
> kmeans_model <- kmeans(x = X, centers = K)
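The idea behind `kmeans()` can be sketched in Python with Lloyd's algorithm: alternately assign each point to its nearest center and move each center to the mean of its points, until the centers stop moving. Note that R's `kmeans()` defaults to the Hartigan-Wong algorithm and random initial centers; this sketch uses Lloyd's simpler variant and, as an assumption made for reproducibility, the first K points as initial centers. The data are made up.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; initial centers are simply the first k points."""
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its assigned points
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers


POINTS = [(1, 1), (8, 8), (1.5, 2), (8, 9), (0.5, 1.5), (9, 8)]
labels, centers = kmeans(POINTS, 2)
print(labels, centers)
```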
96. Classification
Identifying to which of a set of categories a new observation belongs, on the basis of a training set of data.
Input: $\{(x_i, y_i)\}_{i=1}^{m}$ (“examples,” “instances with labels,” “observations”), with $x_i \in X$ and $y_i \in \{-1, 1\}$ (“binary”)
Applications: automatic handwriting recognition, speech recognition, biometrics, document classification
Let X_train and X_test be matrices of the training and test data respectively, and labels be a binary vector of class attributes for the training examples. For k equal to K, the command (from the class package) is:
> library(class)
> knn_model <- knn(train = X_train, test = X_test, cl = as.factor(labels), k = K)
knn() returns the predicted class of each row of X_test.
Useful packages:
Decision trees: rpart, party
Random forest: randomForest, party
SVM: e1071, kernlab
Neural networks: nnet, neuralnet, RSNNS
Performance evaluation: ROCR
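For comparison, k-nearest neighbours itself is short enough to sketch in Python: each test point takes the majority label among its k closest training points (Euclidean distance). Ties in the vote are broken here by whichever label was counted first, a simplification; R's `class::knn()` breaks ties at random. The data are made up.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, test_X, k):
    """Majority vote among the k nearest training points for each test point."""
    preds = []
    for x in test_X:
        nearest = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))[:k]
        votes = Counter(train_y[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


TRAIN_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
TRAIN_Y = [-1, -1, -1, 1, 1, 1]
TEST_X = [(0.5, 0.5), (5.5, 5.5)]
print(knn_predict(TRAIN_X, TRAIN_Y, TEST_X, k=3))
```

An odd k avoids ties in two-class problems.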
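97. Regression
Input: $\{(x_i, y_i)\}_{i=1}^{m}$, with $x_i \in X$ and $y_i \in \mathbb{R}$
Output: $f : X \to \mathbb{R}$
Applications: predicting an individual’s income, house prices, stock prices or test scores
For real-valued y, a linear regression model is fitted with:
> lm_model <- lm(y ~ x1 + x2, data = as.data.frame(cbind(y, x1, x2)))
When y is binary (coded 0/1), logistic regression is fitted with glm; it estimates P(y = 1 | x), as in Density Estimation:
> glm_mod <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = as.data.frame(cbind(y, x1, x2)))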
99. Density Estimation
Predict conditional probabilities.
Input: $\{(x_i, y_i)\}_{i=1}^{m}$, with $x_i \in X$ and $y_i \in \{-1, 1\}$
Output: $f : X \to [0, 1]$, as “close” to $P(y = 1 \mid x)$ as possible
Applications: estimating the probability of failure or the probability of defaulting on a loan
100. Training and Testing for Supervised Learning
Training: training data are the input, and the model f is the output.
Testing: you want to predict y for a new x, where (x, y) comes from the same distribution as the training data.
Compute f(x) and compare it to y. How well does f(x) match y? Measure the goodness of f using a loss function, which gives $R^{\text{test}}(f)$.
$R^{\text{test}}$ is also called the true risk or the test error.
We want $R^{\text{test}}$ to be small, to indicate that f(x) would be a good predictor (“estimator”) of y.
101. Time Series Analysis with R
Time series decomposition: decomp(), decompose(), arima(), stl()
Time series forecasting: forecast
Time series clustering: TSclust
Dynamic Time Warping (DTW): dtw
102. Social Network Analysis with R
Packages: igraph, sna
Centrality measures: degree(), betweenness(), closeness(), transitivity()
Clusters: clusters(), no.clusters()
Cliques: cliques(), largest.cliques(), maximal.cliques(), clique.number()
Community detection: fastgreedy.community(), spinglass.community()
103. Scatter plot
dataset <- read.csv('fbgood.txt', header = TRUE, sep = '\t', row.names = 1)
x = dataset$friends
y = dataset$getgoods
plot(x,y)
105. 2nd order polynomial fit
plot(x, y)
polyfit2 <- lm(y ~ poly(x, 2))
lines(sort(x), fitted(polyfit2)[order(x)], col = 2, lwd = 3)
106. 3rd order polynomial fit
plot(x, y)
polyfit3 <- lm(y ~ poly(x, 3))
lines(sort(x), fitted(polyfit3)[order(x)], col = 2, lwd = 3)
107. R and Hadoop
Packages: RHadoop, RHive
RHadoop (https://github.com/RevolutionAnalytics/RHadoop/wiki) is a collection of 3 R packages:
rmr2 - perform data analysis with R via MapReduce on a Hadoop cluster
rhdfs - connect to the Hadoop Distributed File System (HDFS)
rhbase - connect to the NoSQL HBase database
You can play with it on a single PC (in standalone or pseudo-distributed mode), and code developed there will also work on a cluster of PCs (in fully distributed mode)!
Step by step to set up my first R Hadoop system:
http://www.rdatamining.com/tutorials/rhadoop
108. An Example of MapReducing with R
library(rmr2)
# map: split each line of text on whitespace and emit a (word, 1) pair per word
map <- function(k, lines) {
  words.list <- strsplit(lines, "\\s")
  words <- unlist(words.list)
  return(keyval(words, 1))
}
# reduce: add up all the counts emitted for the same word
reduce <- function(word, counts) {
  keyval(word, sum(counts))
}
wordcount <- function(input, output = NULL) {
  mapreduce(input = input, output = output, input.format = "text",
            map = map, reduce = reduce)
}
## Submit job (in.file.path and out.file.path are HDFS paths defined beforehand)
out <- wordcount(in.file.path, out.file.path)
109. Thank you for your time!
Email: hserhan@hotmail.com
THE END