Weka: A Useful Tool for Air Quality Forecasting. William F. Ryan, Department of Meteorology, The Pennsylvania State University. 2007 National Air Quality Conference, Orlando
Weka The weka, or woodhen, is a bird native to New Zealand. Weka is also the name of a suite of machine learning software tools, written in Java and developed at the University of Waikato in New Zealand. http://www.cs.waikato.ac.nz/ml/weka
Machine Learning
Weka Can Be A Useful Tool
Weka and PM2.5 Forecasting
PM2.5 Forecasting O3 (left panel) is well-behaved statistically. Its distribution is near normal, with a strong association with maximum temperature. As a result, linear techniques are useful. PM2.5 (right panel) is not well-behaved. Its distribution is skewed, with no strong association with any particular weather variable. Tools included in Weka, including ANN and classification and regression trees (CART), are capable of addressing the non-linear problems posed by PM2.5.
Weka: Information http://www.cs.waikato.ac.nz/ml/weka/
Input File Format Weka uses its own file format, called *.arff. All you need to do, though, is provide a *.csv file with variable names in the first line, and Weka will convert it.
ARFF Format The ARFF format is simple anyway: an ASCII file; a list of variables and their types; then the data follows, comma separated; missing data are marked as “?”.
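The file layout described above can be sketched in a few lines of Python. This is only an illustration of the ARFF structure (a relation name, a list of attributes with types, then comma-separated data with "?" for missing values). The relation name, variable names, and values are hypothetical and are not taken from the presentation.

```python
import io


def to_arff(relation, attributes, rows):
    """Render rows as ARFF text; attributes is a list of (name, type) pairs."""
    out = io.StringIO()
    out.write(f"@relation {relation}\n\n")
    # Header: one line per variable, giving its name and type.
    for name, kind in attributes:
        out.write(f"@attribute {name} {kind}\n")
    # Data section: comma-separated values, "?" marks a missing value.
    out.write("\n@data\n")
    for row in rows:
        out.write(",".join("?" if v is None else str(v) for v in row) + "\n")
    return out.getvalue()


# Hypothetical forecast variables, for illustration only.
attributes = [
    ("tmax_c", "numeric"),
    ("wind_ms", "numeric"),
    ("pm25_class", "{low,moderate,high}"),
]
rows = [(31.2, 2.1, "high"), (24.0, None, "low")]

arff_text = to_arff("pm25_forecast", attributes, rows)
print(arff_text)
```

In practice this step is not needed, since Weka converts a *.csv file itself; the sketch only shows what the converted file contains.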
Data Editing Data can be easily edited within Weka itself
Analyzing Data Variables can be easily scanned with basic statistics and histograms provided by Weka
Quick Analysis Tools
Sampling and Test Data Set Options
Functions Available WEKA includes a number of different techniques that can be useful for forecast development. These include linear and logistic regression and perceptron models (neural networks).
Linear Regression Unfortunately, the “workhorse” linear regression module in Weka is limited in usefulness: there is no automatic stepwise function, and the diagnostics are poor. Compare: SYSTAT, Minitab.
Classification and Regression Trees (CART) A variety of classification algorithms are available. The standard algorithm is J48, Weka's implementation of C4.5, the last freely available version of that decision tree algorithm. The commercial version is currently C5.0.
CART Options
CART Diagnostics CART is notorious for using CPU resources, but the WEKA version runs efficiently on my standard PC. Diagnostics are better for CART than for linear regression. Example on left is of a 4-category PM2.5 CART forecast.
CART Visualization
Artificial Neural Networks (ANN) “Linear regression by a mob.” Produces a forecast by taking the weighted sum of the predictors and then layering the process.
Artificial Neural Networks - Summary Known samples (historical data) are used to “train” the network. Input data (x_i) are assigned weights (w_i) and combined in the “hidden” layer, like a set of linear regressions. These sets are then combined in additional layers, like regressions of regressions. Each weighted sum is transformed (“squashed”) to the range of the training data, and the error is measured. A supervised training algorithm uses the output error to adjust the network weights to minimize errors.
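The weighted-sum-and-squash step summarized above can be sketched as a single forward pass through one hidden layer. This is a minimal illustration, assuming a logistic (sigmoid) squashing function and made-up weights, inputs, and target; it is not a description of Weka's internal implementation.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, output_weights):
    """One forward pass: inputs -> hidden layer -> single output."""
    # Each hidden node acts like a small regression on the inputs x_i.
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(node_weights, x)))
        for node_weights in hidden_weights
    ]
    # The output node is a "regression of regressions" on the hidden nodes.
    output = sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))
    return hidden, output


# Hypothetical inputs, weights, and target (already scaled to 0..1).
x = [0.5, -1.0]
hidden_weights = [[0.4, 0.1], [-0.3, 0.8]]
output_weights = [0.7, -0.2]
target = 1.0

hidden, output = forward(x, hidden_weights, output_weights)
error = target - output  # the training algorithm uses this to adjust weights
print(hidden, output, error)
```

Training repeats this pass over the historical samples and nudges the weights in the direction that reduces the error.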
Artificial Neural Networks: Pros/Cons
Example:  Neural Network Structure www.doc.ic.ac.uk/~sgc/teaching/v231/
WEKA Neural Networks WEKA provides user control of training parameters: the number of iterations or epochs (“training time”); the increment of weight adjustments in backpropagation (“learning rate”); and controls on varying changes to the increments (“momentum”).
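How the three parameters interact can be sketched with a single linear neuron trained by gradient descent. This is an illustrative sketch under stated assumptions: the function name, the data, the parameter values, and the use of full-batch updates are all choices made here for simplicity, and none of it is taken from Weka's own code.

```python
def train_linear_neuron(samples, epochs=500, learning_rate=0.1, momentum=0.9):
    """Fit y = weight * x + bias using gradient descent with momentum."""
    weight, bias = 0.0, 0.0
    prev_dw, prev_db = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):  # "training time"
        # Average error signal over all samples (full-batch, an assumption).
        grad_w = sum((t - (weight * x + bias)) * x for x, t in samples) / n
        grad_b = sum((t - (weight * x + bias)) for x, t in samples) / n
        # "learning rate" scales the new step; "momentum" carries over
        # a fraction of the previous step.
        dw = learning_rate * grad_w + momentum * prev_dw
        db = learning_rate * grad_b + momentum * prev_db
        weight += dw
        bias += db
        prev_dw, prev_db = dw, db
    return weight, bias


# Hypothetical noise-free data generated from y = 2x + 1.
samples = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
weight, bias = train_linear_neuron(samples)
print(weight, bias)
```

Too few epochs leave the weights short of their best values, a large learning rate can overshoot, and momentum smooths successive steps so that training is less likely to stall.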
Conclusions
URLs of Interest
Acknowledgements
