2. Lazy learners
Lazy learning is a learning method in which generalization of the training data is, in theory, delayed until a query is made to the system. In eager learning, by contrast, the system tries to generalize the training data before receiving queries. Lazy learners do less work when the training data is given and more work when a test tuple must be classified.
3. The classification methods discussed so far in this chapter (decision tree induction, Bayesian classification, rule-based classification, classification by backpropagation, support vector machines, and classification based on association rule mining) are all examples of eager learners.
A lazy learner simply stores the training data and begins generalization only when it sees a test tuple, classifying the tuple based on its similarity to the stored training tuples.
4. Classification involves two steps: building a model from a given set of training data, and applying the model to a given set of test data.
Eager learners such as Bayesian classification, rule-based classification, and support vector machines construct a classification model from a given set of training tuples before receiving any new tuple to classify.
5. k-Nearest-Neighbor
Classifiers
The k-nearest-neighbor method was first
described in the early 1950s.
Nearest-neighbor classifiers are based on
learning by analogy, that is, by comparing a
given test tuple with training tuples that are
similar to it.
The training tuples are described
by n attributes. Each tuple represents a point
in an n-dimensional space.
6. In this way, all of the training tuples are
stored in an n-dimensional pattern space.
When given a test tuple, a k-nearest-neighbor classifier searches the pattern space for the k training tuples that are closest to the test tuple. These k training tuples are the k “nearest neighbors” of the test tuple.
Closeness is defined in terms of a distance metric, such as Euclidean distance. The Euclidean distance between two points or tuples, say X1 = (x11, x12, ..., x1n) and X2 = (x21, x22, ..., x2n), is:
dist(X1, X2) = sqrt( (x11 - x21)^2 + (x12 - x22)^2 + ... + (x1n - x2n)^2 )
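The search described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the sample training tuples and the choice k = 3 below are made up for the example.

```python
import math
from collections import Counter

def euclidean(x1, x2):
    # dist(X1, X2) = sqrt(sum of (x1i - x2i)^2 over all n attributes)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_classify(training, test_tuple, k=3):
    # Lazy learner: no model is built up front; all work happens at query time.
    # training is simply a stored list of (point, label) pairs.
    neighbors = sorted(training, key=lambda tl: euclidean(tl[0], test_tuple))[:k]
    # Classify by majority vote among the k nearest neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For example, with training tuples clustered around (1, 1) labeled "A" and around (5, 5) labeled "B", a test tuple near (1.5, 1.5) is assigned "A".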
8. Case-Based Reasoning
Case-based reasoning is the process of solving new problems based on the solutions of similar past problems. These classifiers use a database of problem solutions to solve new problems. The case-based reasoner tries to combine the solutions of the neighboring training cases in order to propose a solution for the new case.
9. Case-based reasoning (CBR) classifiers use
a database of problem solutions to solve new
problems.
Unlike nearest-neighbor classifiers, which
store training tuples as points in Euclidean
space, CBR stores the tuples (or “cases”) for problem solving as complex symbolic descriptions.
Business applications of CBR include
problem resolution for customer service help
desks, where cases describe product-related
diagnostic problems.
10. CBR has also been applied to areas such as
engineering and law, where cases are either
technical designs or legal rulings, respectively.
Medical education is another area for CBR,
where patient case histories and treatments are
used to help diagnose and treat new patients.
The case-based reasoner may employ
background knowledge and problem-solving
strategies in order to propose a feasible
combined solution.
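The retrieval step alone can be sketched as follows. This is a hypothetical Python fragment: the help-desk case base and the deliberately simple attribute-match similarity are invented for illustration, and the adaptation and background-knowledge stages described above are omitted.

```python
def similarity(case_a, case_b):
    # Fraction of attributes on which the two symbolic descriptions agree.
    keys = set(case_a) | set(case_b)
    return sum(case_a.get(k) == case_b.get(k) for k in keys) / len(keys)

def retrieve_solution(case_base, new_problem):
    # case_base: list of (problem_description, solution) pairs.
    # Return the solution of the most similar stored case.
    best_problem, best_solution = max(
        case_base, key=lambda case: similarity(case[0], new_problem)
    )
    return best_solution
```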
11. Other classification methods
Data mining involves six common classes of tasks: anomaly detection, association rule learning, clustering, classification, regression, and summarization. Classification is a major technique in data mining and is widely used in various fields.
Classification is a technique in which we categorize data into a given number of classes.
12. Binary classification: a classification task with two possible outcomes. E.g., gender classification (male / female).
Multi-class classification: classification with more than two classes. In multi-class classification, each sample is assigned to one and only one target label. E.g., an animal can be a cat or a dog, but not both at the same time.
Multi-label classification: a classification task where each sample is mapped to a set of target labels (more than one class). E.g., a news article can be about sports, a person, and a location at the same time.
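The difference between the three settings shows up in how the target labels are stored. The label values in this small Python sketch are invented for illustration:

```python
# Binary: each sample gets one of exactly two labels.
y_binary = ["male", "female", "male"]

# Multi-class: each sample gets exactly one of several labels.
y_multiclass = ["cat", "dog", "cat"]

# Multi-label: each sample gets a set of labels, possibly several at once.
y_multilabel = [{"sports", "person", "location"}, {"location"}, {"sports"}]

def is_multilabel(y):
    # A task is multi-label when some sample carries more than one target label.
    return any(isinstance(labels, set) and len(labels) > 1 for labels in y)
```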
13. Naïve Bayes
The Naive Bayes algorithm is based on Bayes’ theorem, with the assumption of independence between every pair of features. Naive Bayes classifiers work well in many real-world situations, such as document classification and spam filtering.
This algorithm requires a small amount of
training data to estimate the necessary
parameters. Naive Bayes classifiers are
extremely fast compared to more
sophisticated methods.
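As a rough sketch of how such a classifier can be built for document classification: the Python code below assumes a bag-of-words representation and Laplace smoothing, neither of which is specified in the text, and the tiny training set in the usage example is invented.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    # docs: list of (word_list, label) pairs.
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)   # per-class word frequencies
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify_nb(model, words):
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_logp = None, float("-inf")
    for label, n_docs in class_counts.items():
        # log P(class) + sum of log P(word | class), with Laplace smoothing;
        # summing the per-word terms reflects the naive independence assumption.
        logp = math.log(n_docs / total)
        n_words = sum(word_counts[label].values())
        for w in words:
            logp += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label
```

Trained on a handful of labeled word lists (e.g., "spam" vs. "ham" messages), the model classifies a new document by picking the class with the highest log-probability.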
14. Fuzzy Set Approaches
Fuzzy set theory is also called possibility theory. It was proposed by Lotfi Zadeh in 1965 as an alternative to two-valued logic and probability theory.
The theory allows us to work at a high level of abstraction. It also provides us with a means of dealing with imprecise measurement of data.
In the fuzzy set approach, an important consideration is the treatment of data from a linguistic viewpoint. From this has developed an approach that uses linguistically quantified propositions to summarize the content of a database by providing a general characterization of the analyzed data.
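A membership function for a linguistic term shows how fuzzy sets replace two-valued logic with degrees of membership. The term "tall" and the 160 cm / 190 cm ramp endpoints below are assumptions chosen only to illustrate the idea:

```python
def tall_membership(height_cm):
    # Degree to which a height belongs to the fuzzy set "tall":
    # 0 below 160 cm, 1 above 190 cm, and a linear ramp in between
    # (the thresholds are illustrative, not from the text).
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30
```

A height of 175 cm is then "somewhat tall" with membership 0.5, rather than being forced into a tall / not-tall decision.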