2. When to consider Linear Regression?
When the outcome, or class, is numeric, and all the attributes are numeric.
The idea is to express the class as a linear combination of the attributes, with predetermined weights:
x = w0 + w1a1 + w2a2 + … + wkak
where x is the class; a1, a2, …, ak are the attribute values; and w0, w1, …, wk are the weights.
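To make the formula concrete, here is a small stand-alone Java sketch that evaluates this weighted sum for one instance. The weights and attribute values are invented for illustration; they are not output from any real model:

    public class LinearModel {
        // Hypothetical weights; W[0] is the intercept w0, W[i] is wi.
        static final double[] W = {2.0, 0.5, -1.25, 3.0};

        // Computes x = w0 + w1*a1 + ... + wk*ak for one instance.
        static double predict(double[] a) {
            double x = W[0];
            for (int i = 0; i < a.length; i++) {
                x += W[i + 1] * a[i];
            }
            return x;
        }

        public static void main(String[] args) {
            double[] attributes = {1.0, 2.0, 0.5}; // a1, a2, a3
            System.out.println("Predicted class: " + predict(attributes));
        }
    }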
5. Linear Regression in Weka
Options specific to weka.classifiers.functions.LinearRegression:
-D: Produce debugging output (default: disabled).
-S <selection method>: Set the attribute selection method to use: 1 = None, 2 = Greedy (default: 0 = M5' method).
-C: Do not try to eliminate collinear attributes.
-R <double>: Set the ridge parameter (default: 1.0e-8).
Each of these flags can also be set through the Weka API, as sketched below.
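For instance, the same flags can be passed programmatically through setOptions. The following is a minimal sketch assuming Weka 3.x on the classpath; the file name houses.arff is a placeholder for any ARFF dataset with a numeric class:

    import weka.classifiers.functions.LinearRegression;
    import weka.core.Instances;
    import weka.core.Utils;
    import weka.core.converters.ConverterUtils.DataSource;

    public class LinearRegressionOptionsDemo {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("houses.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1); // class = last attribute

            LinearRegression lr = new LinearRegression();
            // Same flags as on the command line: no attribute selection,
            // keep collinear attributes, explicit ridge value.
            lr.setOptions(Utils.splitOptions("-S 1 -C -R 1.0e-8"));
            lr.buildClassifier(data);

            System.out.println(lr); // prints the fitted weights w0, w1, ..., wk
        }
    }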
6. Linear Regression in Weka
-S <selection method>: Set the method used to select the attributes for use in the linear regression:
0 = M5' method. Builds trees whose leaves are associated with multivariate linear models; each node of the tree splits on the attribute that maximizes the expected error reduction, as measured by the Akaike information criterion (a measure of the relative goodness of fit of a statistical model).
7. Linear Regression in Weka
1 = None. Needs no explanation: no attribute selection is performed.
2 = Greedy. "For example, a greedy strategy for the traveling salesman problem (which is of a high computational complexity) is the following heuristic: 'At each stage visit an unvisited city nearest to the current city.' This heuristic need not find a best solution but terminates in a reasonable number of steps; finding an optimal solution typically requires unreasonably many steps" (from Wikipedia).
An API sketch for choosing among these three methods follows below.
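The selection method can also be chosen through the API instead of the -S flag. A minimal sketch, assuming the same placeholder dataset as before; the SELECTION_* constants and TAGS_SELECTION are taken from the Weka API documentation cited in the references:

    import weka.classifiers.functions.LinearRegression;
    import weka.core.Instances;
    import weka.core.SelectedTag;
    import weka.core.converters.ConverterUtils.DataSource;

    public class SelectionMethodDemo {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("houses.arff").getDataSet(); // placeholder file
            data.setClassIndex(data.numAttributes() - 1);

            LinearRegression lr = new LinearRegression();
            // Equivalent to -S 2; the constants map to 0 = M5', 1 = None, 2 = Greedy.
            lr.setAttributeSelectionMethod(new SelectedTag(
                LinearRegression.SELECTION_GREEDY, LinearRegression.TAGS_SELECTION));
            lr.buildClassifier(data);
            System.out.println(lr);
        }
    }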
8. Linear Regression in Weka
-C: Do not try to eliminate collinear attributes.
Possible examples of collinear (strongly correlated) attributes: performance and price, as in high-performance, expensive German cars versus low-performance, cheap American cars. The API equivalent of the flag is sketched below.
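A minimal sketch of the API equivalent of the -C flag; the setter name follows the Weka API documentation cited in the references (note its one-'l' spelling, "Colinear"):

    import weka.classifiers.functions.LinearRegression;

    public class CollinearityFlagDemo {
        public static void main(String[] args) {
            LinearRegression lr = new LinearRegression();
            // Equivalent to -C: keep collinear attributes rather than eliminating them.
            lr.setEliminateColinearAttributes(false);
            System.out.println("Eliminate collinear attributes? "
                + lr.getEliminateColinearAttributes());
        }
    }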
9. Linear Regression in Weka
-R <double>: Set the ridge parameter (default: 1.0e-8).
Its value is assigned by the analyst and determines how far Ridge Regression departs from Least Squares Regression; the goal of the ridge term is to circumvent the problem of predictor collinearity.
If the value is too small, Ridge Regression cannot fight collinearity effectively.
If it is too large, the bias of the parameters becomes too large, and so do the mean square errors of the parameters and the predictions.
It therefore has to be estimated by trial and error, usually by resorting to cross-validation, as in the sketch below.
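One plausible way to run such a cross-validated search with the Weka API; the grid of candidate ridge values and the placeholder dataset are assumptions made for illustration:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.LinearRegression;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class RidgeSearchDemo {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("houses.arff").getDataSet(); // placeholder file
            data.setClassIndex(data.numAttributes() - 1);

            double bestRidge = Double.NaN;
            double bestRmse = Double.MAX_VALUE;
            // Arbitrary grid spanning several orders of magnitude.
            for (double ridge : new double[]{1e-8, 1e-4, 1e-2, 1.0, 100.0}) {
                LinearRegression lr = new LinearRegression();
                lr.setRidge(ridge);
                Evaluation eval = new Evaluation(data);
                // 10-fold cross-validation with a fixed seed for reproducibility.
                eval.crossValidateModel(lr, data, 10, new Random(1));
                double rmse = eval.rootMeanSquaredError();
                System.out.printf("ridge=%g  RMSE=%.4f%n", ridge, rmse);
                if (rmse < bestRmse) {
                    bestRmse = rmse;
                    bestRidge = ridge;
                }
            }
            System.out.println("Best ridge value: " + bestRidge);
        }
    }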
10. References
I. Witten, E. Frank and M. Hall. Data Mining: Practical Machine Learning Tools and Techniques (Third Edition). Elsevier, MA, USA, 2011.
Weka API, class LinearRegression. Extracted on October 16, 2012 from http://weka.sourceforge.net/doc/weka/classifiers/functions/LinearRegre
D. Rodríguez, J.J. Cuadrado, M.A. Sicilia and R. Ruiz. Segmentation of Software Engineering Datasets Using the M5 Algorithm. Extracted on October 14, 2012 from http://www.cc.uah.es/drg/c/ICCS06.pdf
AI Access. Ridge Regression. Extracted on October 16, 2012 from http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_ridge.htm