Clarkson Honors Program Thesis Proposal
Altering the AdaBoost algorithm to produce a new boosting method
yielding more accurate results under the same number of repetitions.
April 5, 2000
Daniel Lawry
Professor Christino Tamon, Advisor
Topics:
Boosting is a general method for improving the accuracy of learning algorithms. Boosting's
roots lie in a theoretical framework for studying machine learning called the PAC (probably
approximately correct) learning model. Working within this model, Kearns and Valiant presented the
hypothesis that a "weak" learning algorithm, an algorithm which produces results only slightly better
than random guessing, can be boosted, increasing the weak learning algorithm's accuracy and creating a
"strong" learning algorithm. Currently, a boosting algorithm called AdaBoost produces the desired
increase in accuracy given a weak learning algorithm. AdaBoost takes the weak learning algorithm it is
given together with a training set (x1, y1), ..., (xm, ym), where each xi belongs to a domain X and each
label yi belongs to a label set Y. AdaBoost then calls the weak learning algorithm repeatedly in a series
of T rounds, maintaining a weight for each training example and updating these weights every round
using the output of the most recent run of the weak learner and the current weights. Weights on
misclassified examples increase and weights on correctly classified examples decrease, so each run of
the weak learning algorithm concentrates on the examples that remain hard to classify, yielding more
accurate results each time the training set is run through it.

It is believed that eliminating the last k runs of the weak learning algorithm, where k < t and t is the
number of times the weak learning algorithm has been used so far, will force this method to produce
more accurate results in the same number of repetitions. The elimination of the last k runs forces the
current run to draw on a smaller set of outputs from the earlier weak hypotheses. The hope is that the
algorithm will then place more emphasis on the runs that remain, forcing it to become more accurate
faster. The parameters to investigate include the appropriate value of k, which depends on the weak
learning algorithm and on the number of repetitions, T, of that algorithm. The investigation will also
involve developing this new boosting method and testing it against the AdaBoost method.
Methodology:
The new boosting method will be developed and implemented for testing purposes in the C
programming language. Likewise, the AdaBoost method will be implemented in C. A formula to
optimize the value of k using the number of repetitions, T, and the efficiency of the weak learning
method will be derived. This formula will then be tested in conjunction with the new boosting method
and with variations of the parameter k. Once the optimal value of k is determined, the two methods will
be run on the same training sets, and the resulting data will be compared to see which method yields
more accurate results.