A Survey Paper of Virtual Friend Chatbot
Siddiq Abu Bakkar [09-13368-1]
AMERICAN INTERNATIONAL UNIVERSITY-BANGLADESH (AIUB)
CSE DEPARTMENT
shaon_sikdar@yahoo.com ; shaon.sikdar@gmail.com
3/20/2012
Abstract:

A chatter robot (chatterbot, chatbot, or chat bot) is a computer program designed to simulate an intelligent conversation with one or more human users via auditory or textual methods, primarily for engaging in small talk. The primary aim of such simulation has been to fool the user into thinking that the program's output has been produced by a human (the Turing test). Programs playing this role are sometimes referred to as Artificial Conversational Entities, talk bots, or chatterboxes. In addition, however, chatterbots are often integrated into dialog systems for various practical purposes such as online help, personalized service, or information acquisition. Some chatterbots use sophisticated natural language processing systems, but many simply scan for keywords within the input and pull a reply with the most matching keywords, or the most similar wording pattern, from a textual database.

Virtual Friend (VF) is a computer program and an early example of primitive natural language processing. VF operates by processing the user's responses to scripts, the most famous of which was DOCTOR, a simulation of a Rogerian psychotherapist. Using almost no information about human thought or emotion, DOCTOR sometimes provided a startlingly human-like interaction. ELIZA, on which this approach is based, was written at MIT by Joseph Weizenbaum between 1964 and 1966.

Virtual Friend Response

When the user's input exceeds the very small knowledge base, VF might provide a generic response; for example, it responds to "I won't go to university today." with "Why won't you go to university, are you feeling sick?". The response to "Yahoo! I have got a 3.94 CGPA this semester." would be "Congratulations! I am very happy about your excellent result." VF is implemented using simple pattern matching techniques, but it is taken seriously by several of its users, even after it is explained to them how it works.
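The simple pattern matching that VF relies on can be sketched as follows. This is a minimal illustration in Python; the patterns and replies are invented stand-ins, not VF's actual knowledge base:

```python
import re

# Illustrative (pattern, reply) rules; a real knowledge base would hold many more.
RULES = [
    (re.compile(r"won't go to university", re.IGNORECASE),
     "Why won't you go to university, are you feeling sick?"),
    (re.compile(r"got (\d\.\d+) CGPA", re.IGNORECASE),
     "Congratulations! I am very happy about your excellent result."),
]

GENERIC_REPLY = "Tell me more about that."  # fallback when no pattern matches

def respond(user_input: str) -> str:
    """Return the reply of the first rule whose pattern matches, else a generic reply."""
    for pattern, reply in RULES:
        if pattern.search(user_input):
            return reply
    return GENERIC_REPLY

print(respond("I won't go to university today."))
print(respond("Yahoo! I have got 3.94 CGPA this semester."))
print(respond("The weather is nice."))
```

Everything beyond the matched keyword is ignored, which is why such programs produce plausible replies only while the conversation stays inside the scripted patterns.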
The program was designed to showcase the digitized voices the cards were able to produce, though the quality was far from life-like. Its AI engine was likely based on something similar to the ELIZA algorithm.

Contents:

1. Natural Language Processing [NLP]
2. Machine Learning [ML]
   I. Supervised learning algorithms
   II. Logic based algorithms
      - Decision trees
   III. Statistical learning algorithms
      - Naive Bayes classifiers
      - Bayesian Networks
3. Speech Recognition [SR]
4. Turing Test [TT]
5. Most Popular Chatbots
   a. ELIZA
   b. PARRY
   c. The Chinese Room
   d. SIRI
      i. Details of SIRI
      ii. Reception of SIRI
      iii. SIRI says some weird things
6. References

Natural Language Processing:

The history of machine translation dates back to the seventeenth century, when philosophers such as Leibniz and Descartes put forward proposals for codes which would relate words between languages. All of these proposals remained theoretical, and none resulted in the development of an actual machine.

The first patents for "translating machines" were applied for in the mid-1930s. One proposal, by Georges Artsrouni, was simply an automatic bilingual dictionary using paper tape. The other proposal, by Peter Troyanskii, a Russian, was more detailed. It included both the bilingual dictionary and a method for dealing with grammatical roles between languages, based on Esperanto.

In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence", which proposed what is now called the Turing test as a criterion of intelligence. This criterion depends on the ability of a computer program to impersonate a human in a real-time written conversation with a human judge, sufficiently well that the judge is unable to distinguish reliably, on the basis of the conversational content alone, between the program and a real human.

In 1957, Noam Chomsky's Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule based system of syntactic structures. However, the real progress of NLP was much slower, and after the ALPAC report in 1966, which found that ten years of research had failed to fulfill the expectations, funding was dramatically reduced internationally.

In 1969, Roger Schank introduced the conceptual dependency theory for natural language understanding. This
model, partially influenced by the work of Sydney Lamb, was extensively used by Schank's students at Yale University, such as Robert Wilensky, Wendy Lehnert, and Janet Kolodner.

In 1970, William A. Woods introduced the augmented transition network (ATN) to represent natural language input. Instead of phrase structure rules, ATNs used an equivalent set of finite state automata that were called recursively. ATNs and their more general format, called "generalized ATNs", continued to be used for a number of years.

Machine Learning:

There are several applications for Machine Learning (ML), the most significant of which is data mining. People are often prone to making mistakes during analyses or, possibly, when trying to establish relationships between multiple features. This makes it difficult for them to find solutions to certain problems. Machine learning can often be successfully applied to these problems, improving the efficiency of systems and the designs of machines.

Every instance in any dataset used by machine learning algorithms is represented using the same set of features. The features may be continuous, categorical or binary. If instances are given with known labels (the corresponding correct outputs), then the learning is called supervised, in contrast to unsupervised learning, where instances are unlabeled. By applying these unsupervised (clustering) algorithms, researchers hope to discover unknown, but useful, classes of items (Jain et al., 1999).

Another kind of machine learning is reinforcement learning (Barto & Sutton, 1997). The training information provided to the learning system by the environment (external trainer) is in the form of a scalar reinforcement signal that constitutes a measure of how well the system operates. The learner is not told which actions to take, but rather must discover which actions yield the best reward, by trying each action in turn.

Numerous ML applications involve tasks that can be set up as supervised. In the present paper, we have concentrated on the techniques necessary to do this. In particular, this work is concerned with classification problems in which the output of instances admits only discrete, unordered values.

We have limited our references to recent refereed journals, published books and conferences. In addition, we have added some references regarding the original work that started the particular line of research under discussion. A brief review of what ML includes can be found in (Dutton & Conroy, 1996). De Mantaras and Armengol (1998) also presented a historical survey of logic and instance based learning classifiers. The reader should be cautioned that a single article cannot be a comprehensive review of all classification learning algorithms. Instead, our goal has been to provide a representative sample of existing lines of research in each learning technique. In each of our listed areas, there are many other papers that more comprehensively detail relevant work.

Supervised learning algorithms:

Inductive machine learning is the process of learning a set of rules from instances (examples in a training set) or, more generally speaking, creating a classifier that can be used to generalize from new instances. The process of applying supervised ML to a real-world problem is described in the Figure.
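The supervised setting described above (labeled training instances, a classifier learned from them, prediction on new instances) can be sketched with a toy one-nearest-neighbour classifier; the data below is made up purely for illustration:

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

# Toy labeled training set: each instance is a feature vector plus a known label.
training_set = [
    ((1.0, 1.0), "negative"),
    ((1.2, 0.8), "negative"),
    ((4.0, 4.2), "positive"),
    ((3.8, 4.0), "positive"),
]

def classify(instance):
    """Predict the label of the closest training instance (1-nearest neighbour)."""
    _, label = min(training_set, key=lambda pair: dist(pair[0], instance))
    return label

print(classify((4.1, 3.9)))  # falls near the "positive" cluster
```

The same fit-then-predict shape underlies every supervised method discussed below; only the way the classifier generalizes from the training set differs.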
Figure: The process of supervised ML

The first step is collecting the dataset. If a requisite expert is available, then s/he could suggest which fields (attributes, features) are the most informative. If not, then the simplest method is that of "brute-force," which means measuring everything available in the hope that the right (informative, relevant) features can be isolated. However, a dataset collected by the "brute-force" method is not directly suitable for induction. It contains in most cases noise and missing feature values, and therefore requires significant pre-processing (Zhang et al., 2002).

The second step is the data preparation and data preprocessing. Depending on the circumstances, researchers have a number of methods to choose from to handle missing data (Batista & Monard, 2003). Hodge & Austin (2004) have recently introduced a survey of contemporary techniques for outlier (noise) detection. These researchers have identified the techniques' advantages and disadvantages. Instance selection is not only used to handle noise but to cope with the infeasibility of learning from very large datasets. Instance selection in these datasets is an optimization problem that attempts to maintain the mining quality while minimizing the sample size (Liu and Motoda, 2001). It reduces data and enables a data mining algorithm to function and work effectively with very large datasets. There is a variety of procedures for sampling instances from a large dataset (Reinartz, 2002). Feature subset selection is the process of identifying and removing as many irrelevant and redundant features as possible (Yu & Liu, 2004). This reduces the dimensionality of the data and enables data mining algorithms to operate faster and more effectively. The fact that many features depend on one another often unduly influences the accuracy of supervised ML classification models. This problem can be addressed by constructing new features from the basic feature set (Markovitch & Rosenstein, 2002). This technique is called feature construction/transformation. These newly generated features may lead to the creation of more concise and accurate classifiers. In addition, the discovery of meaningful features contributes to better comprehensibility of the produced class.

Logic based algorithms:

Decision trees:

Murthy (1998) provided an overview of work in decision trees and a sample of their usefulness to newcomers as well as practitioners in the field of machine learning. Thus, in this work, apart from a brief description of decision trees, we will refer to some more recent works than those in Murthy's article, as well as a few very important articles that were published earlier. Decision trees are trees that classify instances by sorting them based on feature values. Each node in a decision tree represents a feature in an instance to be classified, and each branch represents a value
that the node can assume. Instances are classified starting at the root node and sorted based on their feature values. The Figure is an example of a decision tree for the training set of the Table.

Using the decision tree depicted in the Figure as an example, the instance ⟨at1 = a1, at2 = b2, at3 = a3, at4 = b4⟩ would sort to the nodes at1, at2, and finally at3, which would classify the instance as being positive (represented by the values "Yes"). The problem of constructing optimal binary decision trees is an NP-complete problem, and thus theoreticians have searched for efficient heuristics for constructing near-optimal decision trees.

Statistical Learning Algorithms:

Conversely to ANNs, statistical approaches are characterized by having an explicit underlying probability model, which provides a probability that an instance belongs in each class, rather than simply a classification. Linear discriminant analysis (LDA) and the related Fisher's linear discriminant are simple methods used in statistics and machine learning to find the linear combination of features which best separates two or more classes of object (Friedman, 1989). LDA works when the measurements made on each observation are continuous quantities. When dealing with categorical variables, the equivalent technique is Discriminant Correspondence Analysis (Mika et al., 1999). Maximum entropy is another general technique for estimating probability distributions from data. The overriding principle in maximum entropy is that when nothing is known, the distribution should be as uniform as possible, that is, have maximal entropy. Labeled training data is used to derive a set of constraints for the model that characterize the class-specific expectations for the distribution. Csiszar (1996) provides a good tutorial introduction to maximum entropy techniques. Bayesian networks are the most well-known representative of statistical learning algorithms. A comprehensive book on Bayesian networks is Jensen's (1996). Thus, in this study, apart from our brief description of Bayesian networks, we mainly refer to more recent works.

Naive Bayes classifiers:

Naive Bayesian networks (NB) are very simple Bayesian networks which are composed of directed acyclic graphs with only one parent (representing the unobserved node) and several children (corresponding to observed nodes), with a strong assumption of independence among child nodes in the context of their parent (Good, 1950). Thus, the independence model (Naive Bayes) is based on estimating the ratio (Nilsson, 1965):

R = P(i|X) / P(j|X) = [P(i) · Πr P(Xr|i)] / [P(j) · Πr P(Xr|j)]
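The ratio R above can be computed directly from estimated class priors and per-feature likelihoods, multiplying the independent factors together. The probability estimates in this sketch are made up for illustration:

```python
from math import prod  # product of an iterable (Python 3.8+)

# Illustrative estimates for two classes i and j and one observed instance X
# with three features X1..X3.
prior = {"i": 0.6, "j": 0.4}          # P(i), P(j)
likelihood = {                         # P(Xr | class) for each observed feature value
    "i": [0.8, 0.5, 0.7],
    "j": [0.3, 0.6, 0.2],
}

def ratio(ci: str, cj: str) -> float:
    """R = P(ci)*prod(P(Xr|ci)) / (P(cj)*prod(P(Xr|cj))); predict ci when R > 1."""
    numerator = prior[ci] * prod(likelihood[ci])
    denominator = prior[cj] * prod(likelihood[cj])
    return numerator / denominator

R = ratio("i", "j")
print(R, "-> predict", "i" if R > 1 else "j")
```

Here R ≈ 11.67 > 1, so class i would be predicted, exactly the decision rule discussed below.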
Comparing these two probabilities, the larger probability indicates the class label value that is more likely to be the actual label (if R > 1: predict i, otherwise predict j). Cestnik et al. (1987) first used the Naive Bayes in the ML community. Since the Bayes classification algorithm uses a product operation to compute the probabilities P(X, i), it is especially prone to being unduly impacted by probabilities of 0. This can be avoided by using the Laplace estimator or m-estimate, by adding one to all numerators and adding the number of added ones to the denominator (Cestnik, 1990).

The assumption of independence among child nodes is clearly almost always wrong, and for this reason naive Bayes classifiers are usually less accurate than other more sophisticated learning algorithms (such as ANNs).

However, Domingos & Pazzani (1997) performed a large-scale comparison of the naive Bayes classifier with state-of-the-art algorithms for decision tree induction, instance-based learning, and rule induction on standard benchmark datasets, and found it to be sometimes superior to the other learning schemes, even on datasets with substantial feature dependencies.

The basic independent Bayes model has been modified in various ways in attempts to improve its performance. Attempts to overcome the independence assumption are mainly based on adding extra edges to include some of the dependencies between the features, for example (Friedman et al., 1997). In this case, the network has the limitation that each feature can be related to only one other feature. The semi-naive Bayesian classifier is another important attempt to avoid the independence assumption (Kononenko, 1991), in which attributes are partitioned into groups and it is assumed that xi is conditionally independent of xj if and only if they are in different groups.

The major advantage of the naive Bayes classifier is its short computational time for training. In addition, since the model has the form of a product, it can be converted into a sum through the use of logarithms, with significant consequent computational advantages. If a feature is numerical, the usual procedure is to discretize it during data pre-processing (Yang & Webb, 2003), although a researcher can use the normal distribution to calculate probabilities (Bouckaert, 2004).

Bayesian Networks:

A Bayesian Network (BN) is a graphical model for probability relationships among a set of variables (features). The Bayesian network structure S is a directed acyclic graph (DAG), and the nodes in S are in one-to-one correspondence with the features X. The arcs represent causal influences among the features, while the lack of possible arcs in S encodes conditional independencies. Moreover, a feature (node) is conditionally independent from its non-descendants given its parents (X1 is conditionally independent from X2 given X3 if P(X1|X2,X3) = P(X1|X3) for all possible values of X1, X2, X3).

Speech recognition:

In Computer Science, speech recognition is the translation of spoken words into text. It is also known as "automatic speech recognition", "ASR", "computer speech recognition", "speech to text", or just "STT".
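The Laplace estimator mentioned in the naive Bayes discussion above can be sketched in a few lines; the counts used here are invented for illustration:

```python
def laplace_estimate(count: int, total: int, n_values: int) -> float:
    """Add one to the numerator and the number of added ones to the denominator,
    so that a feature value never seen with a class never yields probability 0."""
    return (count + 1) / (total + n_values)

# A feature value seen 0 times out of 10 class instances, where the feature has
# 3 possible values: the plain estimate 0/10 would zero out the whole product R,
# but the smoothed estimate stays positive.
print(laplace_estimate(0, 10, 3))   # 1/13 instead of 0
print(laplace_estimate(7, 10, 3))   # 8/13 instead of 7/10
```

This is why a single unseen feature value no longer forces the product P(i)·ΠP(Xr|i) to zero.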
Speech Recognition is technology that can translate spoken words into text. Some SR systems use "training", where an individual speaker reads sections of text into the SR system. These systems analyze the person's specific voice and use it to fine-tune the recognition of that person's speech, resulting in more accurate transcription. Systems that do not use training are called "Speaker Independent" systems. Systems that use training are called "Speaker Dependent" systems.

Speech recognition applications include voice user interfaces such as voice dialing (e.g., "Call home"), call routing ("I would like to make a collect call"), domotic appliance control, search (e.g., find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice Input).

The term voice recognition refers to finding the identity of "who" is speaking, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on specific persons' voices, or it can be used to authenticate or verify the identity of a speaker as part of a security process. "Voice recognition" means "recognizing by voice", something humans do all the time over the phone. As soon as someone familiar says "hello", the listener can identify them by the sound of their voice alone.

Turing Test:

The Turing test is a test of a machine's ability to exhibit intelligent behavior. In Turing's original illustrative example, a human judge engages in a natural language conversation with a human and a machine designed to generate performance indistinguishable from that of a human being. All participants are separated from one another. If the judge cannot reliably tell the machine from the human, the machine is said to have passed the test. The test does not check the ability to give the correct answer; it checks how closely the answer resembles typical human answers. The conversation is limited to a text-only channel, such as a computer keyboard and screen, so that the result is not dependent on the machine's ability to render words into audio.

The test was introduced by Alan Turing in his 1950 paper Computing Machinery and Intelligence, which opens with the words: "I propose to consider the question, 'Can machines think?'" Since "thinking" is difficult to define, Turing chooses to "replace the question by another, which is closely related to it and is expressed in relatively unambiguous words." Turing's new question is: "Are there imaginable digital computers which would do well in the imitation game?" This question, Turing believed, is one that can actually be answered. In the remainder of the paper, he argued against all the major objections to the proposition that "machines can think".
ELIZA and PARRY

In 1966, Joseph Weizenbaum created a program which appeared to pass the Turing test. The program, known as ELIZA, worked by examining a user's typed comments for keywords. If a keyword is found, a rule that transforms the user's comments is applied, and the resulting sentence is returned. If a keyword is not found, ELIZA responds either with a generic riposte or by repeating one of the earlier comments. In addition, Weizenbaum developed ELIZA to replicate the behaviour of a Rogerian psychotherapist, allowing ELIZA to be "free to assume the pose of knowing almost nothing of the real world." With these techniques, Weizenbaum's program was able to fool some people into believing that they were talking to a real person, with some subjects being "very hard to convince that ELIZA is not human." Thus, ELIZA is claimed by some to be one of the programs (perhaps the first) able to pass the Turing Test, although this view is highly contentious.

Kenneth Colby created PARRY in 1972, a program described as "ELIZA with attitude". It attempted to model the behaviour of a paranoid schizophrenic, using a similar (if more advanced) approach to that employed by Weizenbaum. In order to validate the work, PARRY was tested in the early 1970s using a variation of the Turing Test. A group of experienced psychiatrists analysed a combination of real patients and computers running PARRY through teleprinters. Another group of 33 psychiatrists were shown transcripts of the conversations. The two groups were then asked to identify which of the "patients" were human and which were computer programs. The psychiatrists were able to make the correct identification only 48 per cent of the time, a figure consistent with random guessing.

In the 21st century, versions of these programs (now known as "chatterbots") continue to fool people. "CyberLover", a malware program, preys on Internet users by convincing them to "reveal information about their identities or to lead them to visit a web site that will deliver malicious content to their computers". The program has emerged as a "Valentine-risk", flirting with people "seeking relationships online in order to collect their personal data".

The Chinese Room

John Searle's 1980 paper Minds, Brains, and Programs proposed an argument against the Turing Test known as the "Chinese room" thought experiment. Searle argued that software (such as ELIZA) could pass the Turing Test simply by manipulating symbols of which it had no understanding. Without understanding, it could not be described as "thinking" in the same sense people do. Therefore, Searle concludes, the Turing Test cannot prove that a machine can think. Searle's argument has been widely criticized, but it has been endorsed as well.

Arguments such as those proposed by Searle and others working on the philosophy of mind sparked off a more intense debate about the nature of intelligence, the possibility of intelligent machines, and the value of the Turing test that continued through the 1980s and 1990s.
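ELIZA's keyword-and-transform loop described above (match a keyword, apply a transformation rule to the user's comment, otherwise fall back to a generic riposte) can be sketched as follows. The rules shown are illustrative stand-ins, not Weizenbaum's actual DOCTOR script:

```python
import re

# Illustrative transformation rules in the spirit of the DOCTOR script:
# the matched fragment of the user's comment is re-used in the reply.
TRANSFORMS = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), r"Why do you say you are \1?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), r"How long have you felt \1?"),
]

GENERIC_RIPOSTES = ["Please go on.", "Tell me more."]

def eliza_reply(comment: str, riposte_index: int = 0) -> str:
    """Apply the first matching transformation rule; otherwise return a riposte."""
    for pattern, template in TRANSFORMS:
        match = pattern.search(comment)
        if match:
            return match.expand(template)  # substitute \1 with the captured text
    return GENERIC_RIPOSTES[riposte_index % len(GENERIC_RIPOSTES)]

print(eliza_reply("I am sad"))
print(eliza_reply("The sky is blue."))
```

Echoing the user's own words back as a question is what gives the Rogerian pose its surface plausibility, with no model of meaning at all.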
Siri (Speech Interpretation and Recognition Interface):

Siri (pronounced /ˈsɪri/) is an intelligent personal assistant and knowledge navigator which works as an application for Apple's iOS. The application uses a natural language user interface to answer questions, make recommendations, and perform actions by delegating requests to a set of web services. Apple claims that the software adapts to the user's individual preferences over time and personalizes results, performing tasks such as finding recommendations for nearby restaurants or getting directions.

Siri was originally introduced as an iOS application available in the App Store by Siri Inc. Siri Inc. was acquired by Apple on April 28, 2010. Siri Inc. had announced that their software would be available for BlackBerry and for Android-powered phones, but all development efforts for non-Apple platforms were cancelled after the acquisition by Apple.

Siri is now an integral part of iOS 5, and available only on the iPhone 4S, launched on October 4, 2011. Despite this, hackers were able to adapt Siri to prior iPhones. On November 8, 2011, Apple publicly announced that it had no plans to support Siri on any of its older devices.

Siri Inc. was founded in 2007 by Dag Kittlaus (CEO), Adam Cheyer (VP Engineering), and Tom Gruber (CTO/VP Design), together with Norman Winarsky from SRI International's venture group. On October 13, 2008, Siri announced it had raised an $8.5 million Series A financing round, led by Menlo Ventures and Morgenthaler Ventures. In November 2009, Siri raised a $15.5 million Series B financing round from the same investors as in their previous round, but led by Hong Kong billionaire Li Ka-shing. Dag Kittlaus left his position as CEO of Siri at Apple after the launch of the iPhone 4S.

Reception of Siri:

Siri was met with a very positive reaction for its ease of use and practicality, as well as for its apparent "personality". Google's executive chairman and former chief, Eric Schmidt, has conceded that Siri could pose a "competitive threat" to the company's core search business. Google generates a large portion of its revenue from clickable ad links returned in the context of searches. The threat comes from the fact that Siri is a non-visual medium, therefore not affording users the opportunity to be exposed to the clickable ad links. Writing in The Guardian, journalist Charlie Brooker described Siri's tone as "servile" while also noting that it worked "annoyingly well."

However, Siri was criticized by organizations such as the American Civil Liberties Union and NARAL Pro-Choice
America after users found that it would not provide information about the location of birth control or abortion providers, sometimes directing users to anti-abortion crisis pregnancy centers instead. Apple responded that this was a glitch which would be fixed in the final version. It was suggested that abortion providers could not be found in a Siri search because they did not use "abortion" in their descriptions. At the time the controversy arose, Siri would suggest locations to buy illegal drugs, hire a prostitute, or dump a corpse, but would not find birth control or abortion services. Apple responded that this behavior is not intentional and will improve as the product moves from beta to final product.

Siri has not been well received by some English speakers with distinctive accents, including Scottish speakers and Americans from Boston or the South. Apple's Siri FAQ states that, "as more people use Siri and it's exposed to more variations of a language, its overall recognition of dialects and accents will continue to improve, and Siri will work even better."

Despite many functions still requiring the use of the touchscreen, the National Federation of the Blind describes the iPhone as "the only fully accessible handset that a blind person can buy".
Siri says some weird things

[Figure: screenshots of Siri giving unusual responses]