Forecasting Surface O3 in Texas Urban Areas Using Random Forest and Generalized Additive Models

Category: Air Pollution Modeling

Accepted Manuscripts
DOI: 10.4209/aaqr.2018.12.0464

Rick Pernak, Matthew Alvarado, Chantelle Lonsdale, Marikate Mountain, Jennifer Hegarty, Thomas Nehrkorn

  • Atmospheric and Environmental Research, Lexington, MA 02421, USA


  • GAM quantitative and probabilistic models were built for six Texas urban areas.
  • A random forest machine learning algorithm was applied for classification.
  • The probabilistic models show little skill.
  • Quantitative and classification models exhibit some degree of success.
  • Predictive success increased in urban areas with less extreme ozone events.


We developed and evaluated three types of statistical forecast models (quantitative, probabilistic, and classification) for the maximum daily 8-hour average (MDA8) ozone concentration, based on meteorological and ozone monitoring data for six Texas urban areas from 2009 to 2015. The quantitative and probabilistic forecast models were Generalized Additive Models (GAMs), while the classification forecast used the random forest machine learning method. For the quantitative forecast models, five of the eight predictors (day of week, day of year, water vapor density, wind speed, and previous-day ozone measurement) were significant at the α = 0.001 level for all urban areas, while the significance of the other three varied with location. The quantitative forecasts for the 2016 ozone season agreed well with the associated measurements (R² of ~0.70) but tended to under-predict on the days with the highest ozone concentrations. In contrast, the probabilistic forecast models showed little skill in determining the probability of ozone exceeding policy-relevant thresholds during the 2016 ozone season. The success rate of the random forest classification models typically exceeded 75% and would likely be higher if the training data sets contained more extreme events.
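
As a rough illustration of the forecast target named in the abstract, the sketch below computes a maximum daily 8-hour average from one day of hourly ozone values. It is a simplified version written for this page, not the authors' processing code: the function name `mda8`, the `min_hours` completeness rule, and the restriction to 8-hour windows that lie entirely within the day are all assumptions, and regulatory definitions of MDA8 differ in these details.

```python
from typing import Optional, Sequence


def mda8(hourly_ppb: Sequence[Optional[float]], min_hours: int = 6) -> Optional[float]:
    """Return a simplified maximum daily 8-hour average for one day.

    hourly_ppb: 24 hourly ozone values in ppb; None marks a missing hour.
    min_hours:  assumed minimum number of valid hours per window
                (hypothetical completeness rule, not from the paper).
    """
    if len(hourly_ppb) != 24:
        raise ValueError("expected 24 hourly values")

    best: Optional[float] = None
    # Slide an 8-hour window across the day (start hours 0 through 16),
    # so every window lies entirely inside the 24 values supplied.
    for start in range(24 - 8 + 1):
        window = [v for v in hourly_ppb[start:start + 8] if v is not None]
        if len(window) < min_hours:
            continue  # too few valid hours to form an average
        mean = sum(window) / len(window)
        if best is None or mean > best:
            best = mean
    # None means no window met the completeness rule.
    return best
```

Called on a day with a flat 30 ppb background and an 8-hour afternoon plateau of 60 ppb, `mda8` returns 60.0, because one window covers the plateau exactly. A day with no valid hours returns `None` rather than a number.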


Ozone MDA8; Ozone prediction; Generalized additive models; Random forest
