WEKA: A Useful Tool for Air Quality Forecasting

Weka: A Useful Tool for Air Quality Forecasting. William F. Ryan, Department of Meteorology, The Pennsylvania State University, [email_address]. 2007 National Air Quality Conference, Orlando.

Weka The weka, or woodhen, is a bird native to New Zealand. Weka is also the name of a suite of machine learning software tools, written in Java and developed at the University of Waikato in New Zealand. http://www.cs.waikato.ac.nz/ml/weka

Machine Learning

Weka Can Be A Useful Tool

Weka and PM2.5 Forecasting

PM2.5 Forecasting O3 (left panel) is well-behaved statistically: its distribution is near normal, with a strong association with maximum temperature. As a result, linear techniques are useful. PM2.5 (right panel) is not well-behaved: its distribution is skewed, with no strong association with any particular weather variable. Tools included in Weka, including artificial neural networks (ANN) and classification and regression trees (CART), are capable of addressing the non-linear problems posed by PM2.5.

Weka: Information http://www.cs.waikato.ac.nz/ml/weka/

Input File Format Weka uses its own file format, *.arff (Attribute-Relation File Format). All you need to do, though, is provide a *.csv file with variable names in the first line, and Weka will convert it.

ARFF Format The ARFF format is simple anyway: an ASCII file that begins with a list of variables and their types, followed by the data, comma separated. Missing data are marked as “?”.
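The layout described above (variable list with types, then comma-separated data, with “?” for missing values) can be sketched in a few lines of Python. This is only an illustration of the file structure, not Weka's own converter: the function name `csv_to_arff`, the relation name, and the column names in the usage example are hypothetical, and the type-guessing rule (numeric if every value parses as a number, otherwise nominal) is an assumption made for the sketch.

```python
import csv
import io


def _is_float(text):
    """Return True if text parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation="air_quality"):
    """Convert CSV text (variable names in the first line) to ARFF text.

    Assumptions for this sketch: every row has the same number of fields,
    variable names contain no spaces, and a column is declared numeric only
    if all of its non-missing values parse as numbers. Empty cells are
    written as '?', the ARFF missing-value marker.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    lines = [f"@relation {relation}", ""]
    for col, name in enumerate(header):
        values = [r[col].strip() for r in data if r[col].strip() != ""]
        if all(_is_float(v) for v in values):
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attribute: list the observed category labels.
            levels = ",".join(sorted(set(values)))
            lines.append(f"@attribute {name} {{{levels}}}")

    lines += ["", "@data"]
    for r in data:
        lines.append(",".join(v.strip() if v.strip() != "" else "?" for v in r))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical predictors and forecast category, for illustration only.
    sample = "tmax,wind,pm25_class\n31.5,2.1,moderate\n,3.4,good\n"
    print(csv_to_arff(sample))
```

In practice Weka performs this conversion itself when it opens a *.csv file, so a script like this is mainly useful for seeing what the resulting file contains.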

Data Editing Data can be easily edited within Weka itself

Analyzing Data Variables can be easily scanned with basic statistics and histograms provided by Weka

Sampling and Test Data Set Options

Functions Available WEKA includes a number of different techniques that can be useful for forecast development. These include linear and logistic regression and perceptron models (neural networks).

Linear Regression Unfortunately, the “workhorse” linear regression module in Weka is limited in usefulness: it has no automatic stepwise function and poor diagnostics. Compare: SYSTAT, Minitab.

Classification and Regression Trees (CART) A variety of classification algorithms are available. The standard algorithm is J48, which is a souped-up version of C4.5, the last free release of that decision tree algorithm. The commercial version is currently C5.0.

CART Options

CART Diagnostics CART is notorious for using CPU resources, but the WEKA version runs efficiently on my standard PC. Diagnostics are better for CART than for linear regression. The example on the left is a 4-category PM2.5 CART forecast.

Artificial Neural Networks (ANN) “Linear regression by a mob.” Produces a forecast by taking the weighted sum of predictors and then layering the process.

Artificial Neural Networks - Summary Known samples (historical data) are used to “train” the network. Input data (x_i) are assigned weights (w_i) and combined in the “hidden” layer, like a set of linear regressions. These sets are then combined in additional layers, like regressions of regressions. The weighted sum of the data is transformed (“squashed”) to the range of the training data, and the error is measured. A supervised training algorithm uses the output error to adjust the network weights to minimize errors.
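The training loop summarized above can be sketched with a toy network in plain Python. This is a minimal illustration under stated assumptions, not Weka's implementation: the function names, the single hidden layer of three units, the sigmoid squashing function, the squared-error measure, and the four-sample data set in the usage example are all chosen for the sketch. Targets are assumed to be scaled to the range 0 to 1, and a constant 1.0 is included among the inputs to act as an intercept.

```python
import math
import random


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Weighted sums in the hidden layer, squashed, then combined again."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def train(samples, n_hidden=3, epochs=2000, learning_rate=0.5, seed=0):
    """Adjust weights from the output error, one known sample at a time."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    for _ in range(epochs):
        for x, target in samples:
            hidden, out = forward(x, w_hidden, w_out)
            # Output error scaled by the slope of the squashing function.
            delta_out = (out - target) * out * (1.0 - out)
            for j, h in enumerate(hidden):
                # Error passed back to hidden unit j, using the output
                # weight as it was before this step's adjustment.
                delta_h = delta_out * w_out[j] * h * (1.0 - h)
                w_out[j] -= learning_rate * delta_out * h
                for i, xi in enumerate(x):
                    w_hidden[j][i] -= learning_rate * delta_h * xi
    return w_hidden, w_out


def mean_squared_error(samples, w_hidden, w_out):
    """Average squared difference between network output and target."""
    return sum((forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in samples) / len(samples)


if __name__ == "__main__":
    # Toy data (not air quality observations): first input is the constant 1.0.
    toy = [([1.0, 0.0, 0.0], 0.0), ([1.0, 0.0, 1.0], 1.0),
           ([1.0, 1.0, 0.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]
    untrained = train(toy, epochs=0)
    trained = train(toy, epochs=2000)
    print("error before training:", mean_squared_error(toy, *untrained))
    print("error after training: ", mean_squared_error(toy, *trained))
```

Each hidden unit here is the “linear regression” of the analogy, and the output unit is the “regression of regressions”; the only non-linear ingredient is the squashing step.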

Artificial Neural Networks - Pros/Cons

Example: Neural Network Structure www.doc.ic.ac.uk/~sgc/teaching/v231/

WEKA Neural Networks WEKA provides user control of training parameters: the number of iterations or epochs (“training time”), the increment of weight adjustments in backpropagation (“learning rate”), and controls on varying changes to increments (“momentum”).
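How these three parameters interact can be seen in a one-weight sketch. The update rule below is the common textbook form of gradient descent with momentum, assumed here for illustration; it is not taken from Weka's source code. The function names, the parameter values, and the quadratic error surface in the usage example are hypothetical.

```python
def momentum_step(weight, gradient, previous_change, learning_rate=0.3, momentum=0.2):
    """One weight adjustment.

    The learning rate scales the step taken against the error gradient;
    the momentum term carries over a fraction of the previous adjustment.
    """
    change = momentum * previous_change - learning_rate * gradient
    return weight + change, change


def minimize(gradient_fn, weight=0.0, epochs=100, learning_rate=0.3, momentum=0.2):
    """Repeat the adjustment for a fixed number of epochs ("training time")."""
    change = 0.0
    for _ in range(epochs):
        weight, change = momentum_step(
            weight, gradient_fn(weight), change, learning_rate, momentum
        )
    return weight


if __name__ == "__main__":
    # Illustrative error surface (w - 3)^2, whose gradient is 2 * (w - 3).
    print(minimize(lambda w: 2.0 * (w - 3.0)))
```

A larger learning rate takes bigger steps and risks overshooting the minimum; momentum smooths successive adjustments so that the weight keeps moving in a direction that has been consistently useful.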

Conclusions


Acknowledgements
