Air Pollution Forecasting Using Artificial and Wavelet Neural Networks with Meteorological Conditions

Air quality forecasting is a significant method of protecting public health because it provides early warning of harmful air pollutants. In this study, we used correlation analysis and artificial neural networks (ANNs; including wavelet ANNs [WANNs]) to identify the linear and nonlinear associations, respectively, between the air pollution index (API) and meteorological variables in Xi’an and Lanzhou. Evaluating twelve algorithms and nineteen network topologies for the ANN and WANN models, we discovered that the optimal input variables for an API forecasting model were the APIs from the 3 preceding days and sixteen selected meteorological factors. Additionally, the API could be accurately predicted based solely on the value recorded 3 days earlier. Based on the correlation coefficients between the air pollution index of the targeted day and the tested variables, the API displayed the closest relationship with the API 1 day earlier as well as stronger correlations with the average temperature, average water vapor pressure, minimum temperature, maximum temperature, API 2 days earlier, and API 3 days earlier. When Bayesian regularization was applied as a training algorithm, the WANN and ANN models accurately reproduced the APIs in both Xi’an and Lanzhou, although the WANN model (R = 0.8846 for Xi’an and R = 0.8906 for Lanzhou) performed better than the ANN (R = 0.8037 for Xi’an and R = 0.7742 for Lanzhou) during the forecasting stage. These results demonstrate that WANNs are effective in short-term API forecasting because they can recognize historic patterns and thereby identify nonlinear relationships between the input and output variables. Thus, our study may provide a theoretical basis for environmental management policies.


INTRODUCTION
Air pollution is a problem of global importance, with well-documented damage to human health and ecosystems (Nguyen et al., 2015). It also has a detrimental effect on visibility, climate, and sustainable development (Lelieveld et al., 2015). Poor air quality is one of the five major health risks in the world; for example, long-term exposure to polluted air is associated with respiratory infections, heart attack, stroke, and lung cancer (Kessler, 2014; Watson, 2014; Lelieveld and Pöschl, 2017). Air pollution also adversely affects life span and people's willingness to communicate socially (Huang et al., 2018).
Owing to large-scale industrialization and urbanization, China has suffered from acute air pollution for many years (Liu and Diamond, 2005). The annual number of haze days has also risen markedly, which has seriously hindered the sustainable development of society and caused widespread public concern (Jiang and Bai, 2018). In 2013, China suffered extremely serious haze pollution that affected 800 million people, and daily average PM2.5 concentrations at a site in Xi'an were more than twice those of Beijing, Shanghai, and Guangzhou (Huang et al., 2014).
Air pollution forecasting is also crucial for public health interventions and for air pollution control policymaking. However, air quality forecasting is quite complex (Li et al., 2017a; Park et al., 2018). Besides rapid economic growth, air pollution is also affected by unfavorable meteorological conditions (Al-Saadi et al., 2005; Gong and Ordieres-Meré, 2016; Li et al., 2017b).
Artificial neural networks (ANNs) have been used to predict ground motion (Wiszniowski, 2016) and groundwater depth (He et al., 2014), and they have been shown to be effective for complex tasks. ANN models have also been successfully applied to forecast air pollution (Li et al., 2017b). In some cases, however, the data are too complex for the modeling tools to process directly, so the input information must be preprocessed (Simons et al., 1995). Traditionally, this has been done with principal component analysis (Xia et al., 2015) or with the Fourier transform (Artursson et al., 2002). Whatever technique is used, it must meet two objectives: to retain the relevant information and to reduce the complexity of the input signal (Zhao et al., 2018). Here, the wavelet transform was employed to extract the important information from the past air pollution index (API) and meteorological factors. The use of the wavelet artificial neural network (WANN) as the predictive model is examined with emphasis on two aspects: (a) the effects of diverse network parameters and (b) the capability of the WANN model to forecast next-day air quality (Bai et al., 2016), which offers important guidance to the public.

Study Area and Data Introduction
The two study stations are Xi'an and Lanzhou, both located in China (Fig. 1). API data were obtained from the Environmental Protection Agency, and meteorological data from the Meteorological Bureau. Fig. 2 shows the API from January 2010 to December 2012. The API varies periodically at both sites: it is higher in winter and spring and lower in summer and autumn. Because PM10 is the primary air pollutant in both cities, the API may be taken to represent PM10.
The data series were divided into a training group (January 2010-December 2011), a calibration group (January-June 2012) and a testing group (July-December 2012). The generalization ability of the WANN and ANN is tested by cross-validation. Fig. 3 shows the architecture of the artificial neural network employed in the study, with one node (the next-day API) in the output layer and nineteen nodes in the input layer, namely, precipitation (P), extreme wind speed (EWS), extreme wind speed direction (EWSD), average atmospheric pressure (AAP), average wind speed (AWS), average temperature (AT), average water vapor pressure (AWVP), average relative humidity (ARH), sunshine duration (SD), minimum atmospheric pressure (MAP), minimum temperature (MINT), maximum atmospheric pressure (MAP), maximum temperature (MAXT), maximum wind speed (MWS), maximum wind speed direction (MWSD), minimum relative humidity (MRH), air pollution index [API(t)], API(t - 1), and API(t - 2).
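The lagged API inputs and the chronological split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the series is synthetic, and assigning each sample to a group by the date of its target day API(t + 1) is an assumption.

```python
from datetime import date, timedelta

def make_samples(days, api):
    """Pair the inputs [API(t), API(t-1), API(t-2)] with the target API(t+1)."""
    samples = []
    for t in range(2, len(api) - 1):
        inputs = [api[t], api[t - 1], api[t - 2]]
        samples.append((days[t + 1], inputs, api[t + 1]))
    return samples

def split_by_period(samples):
    """Chronological split using the three periods stated in the text."""
    train, calib, test = [], [], []
    for target_day, inputs, target in samples:
        if target_day <= date(2011, 12, 31):
            train.append((inputs, target))    # January 2010 - December 2011
        elif target_day <= date(2012, 6, 30):
            calib.append((inputs, target))    # January - June 2012
        else:
            test.append((inputs, target))     # July - December 2012
    return train, calib, test

# Synthetic placeholder series (not real API observations).
start = date(2010, 1, 1)
days = [start + timedelta(days=k) for k in range(1096)]  # 2010-01-01 to 2012-12-31
api = [60 + (k % 30) for k in range(1096)]
train, calib, test = split_by_period(make_samples(days, api))
print(len(train), len(calib), len(test))
```

Keeping the split chronological, rather than shuffling, means the testing group always lies after the data used for fitting, which matches the forecasting setting.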

Artificial Neural Network
Backpropagation is a general approach for training ANNs by minimizing the goal function (global error function) (Nunnari et al., 2004). The global error function (F) is computed by Formula (1), where B_i is the expected output and D_i is the output predicted by the network. The gradient descent technique is employed to adjust the weights so as to minimize F by using Formula (2), where ΔC_ji is the weight adjustment and η is the learning rate.
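In their standard textbook forms, which are assumed here to correspond to Formulas (1) and (2), the error function and the gradient descent update can be written with the symbols defined above. The factor 1/2, the summation over outputs i, and the reading of C_ji as the weight connecting node i to node j are conventional choices and assumptions of this sketch.

```latex
F = \frac{1}{2} \sum_{i} \left( B_i - D_i \right)^{2},
\qquad
\Delta C_{ji} = -\eta \, \frac{\partial F}{\partial C_{ji}}
```

The minus sign makes each weight move against the gradient, so a sufficiently small learning rate η reduces F at every step.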

Wavelet Transformation
The Mallat pyramidal algorithm is used to calculate the discrete wavelet transform (DWT) coefficients (Mallat, 1989a), and the DWT was employed to analyze the API and meteorological data. The DWT also comprises a multiresolution decomposition scheme for input signals (Mallat, 1989b). The DWT of a data sequence f(q) is defined by Formula (3), where ψ(q) indicates the base wavelet of active length q, u indicates the scale or dilation factor, and v indicates the translation in time. For a discrete signal f(q), f(q) ∈ S^2(R), the DWT is defined by multi-resolution decomposition, which can be calculated by the Mallat decomposition algorithm and the Mallat pyramidal reconstruction algorithm (Li et al., 1997). In the decomposition, t and r are the impulse responses of the high-pass filter T and the low-pass filter R, respectively; K_m^i and Y_m^i are the wavelet series and dimension of the 2^(-i) dimension, respectively; and S is the maximum probable dimension of the discrete data f[m]. In the Mallat pyramidal reconstruction formula, r̄ and t̄ are the impulse responses of R* and T*, respectively, that is, R* = R^T and T* = T^T. The major aim of using the DWT is to reduce the complexity of the input signal and the amount of correlated information among the decomposed components (the detailed components CD1 and CD2 and the approximate component CA2). The DWT can thus be used to obtain low-dimension components and a multi-dimension analysis of the signal. The correlation coefficients among CD1, CA2 and CD2 are less than 0.0037, so the DWT turned out to be the best way to achieve our goal.
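The pyramid structure described above (filter, downsample, repeat on the approximation) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the Haar filter pair is swapped in for brevity and is not the wavelet used in the study, the function names are hypothetical, and the input values are made up.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(signal):
    """One pyramid stage: low-pass and high-pass filtering, then downsampling by 2."""
    approx = [(signal[k] + signal[k + 1]) / SQRT2 for k in range(0, len(signal), 2)]
    detail = [(signal[k] - signal[k + 1]) / SQRT2 for k in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert one stage: rebuild the finer-scale signal from the two half-length series."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal

def decompose_two_levels(signal):
    """Return the coefficient sets (CA2, CD2, CD1)."""
    if len(signal) % 4 != 0:
        raise ValueError("signal length must be a multiple of 4")
    ca1, cd1 = haar_step(signal)
    ca2, cd2 = haar_step(ca1)
    return ca2, cd2, cd1

def reconstruct_two_levels(ca2, cd2, cd1):
    """Pyramidal reconstruction: coarse to fine."""
    return haar_inverse_step(haar_inverse_step(ca2, cd2), cd1)

def component_series(ca2, cd2, cd1):
    """Reconstruct each component alone to full length by zeroing the other two."""
    z2, z1 = [0.0] * len(ca2), [0.0] * len(cd1)
    trend = reconstruct_two_levels(ca2, z2, z1)
    mid = reconstruct_two_levels(z2, cd2, z1)
    fine = reconstruct_two_levels(z2, z2, cd1)
    return trend, mid, fine

series = [82.0, 90.0, 75.0, 71.0, 110.0, 124.0, 96.0, 88.0]  # made-up values
ca2, cd2, cd1 = decompose_two_levels(series)
restored = reconstruct_two_levels(ca2, cd2, cd1)
trend, mid, fine = component_series(ca2, cd2, cd1)
```

Because the transform is linear, the three full-length component series add back up to the original signal, which is what allows each of them to be fed to a network as a separate input.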

Wavelet Artificial Neural Network
We use the WANN architecture (Fig. 4) to decompose the original time series (API) into three sets of data: the detailed components CD2 and CD1 and the approximate component CA2. These data are then used as the input elements of the ANN. In Fig. 4, A_n denotes the input variables and API_(n+t) is the next-day API.
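As a sketch of how the decomposed series might be assembled into network inputs, the helper below places the three components of every variable side by side, so that nineteen variables yield fifty-seven inputs per day. The function name, the variable names, and the constant placeholder values are all hypothetical.

```python
def build_wann_inputs(components):
    """components maps a variable name to (CA2, CD2, CD1), each a full-length
    daily series. Returns the variable order and one input row per day."""
    names = list(components)
    lengths = {len(part) for name in names for part in components[name]}
    if len(lengths) != 1:
        raise ValueError("all component series must have the same length")
    n_days = lengths.pop()
    rows = [
        [part[day] for name in names for part in components[name]]
        for day in range(n_days)
    ]
    return names, rows

# Placeholder data: 19 variables, 5 days, three components each.
variables = [f"var{k}" for k in range(1, 20)]
components = {
    name: ([1.0] * 5, [0.1] * 5, [0.01] * 5)  # (CA2, CD2, CD1)
    for name in variables
}
names, rows = build_wann_inputs(components)
print(len(rows), len(rows[0]))
```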

Evaluation Criteria
Four performance criteria are employed to assess the validity of the WANNs and ANNs adopted in this research: root mean square error (RMSE), mean error (ME), percentage error of peak (EO_p), and correlation coefficient (R), following He et al. (2014). In these criteria, B_j is the measured API for the jth datum, D_j is the fitted API for the jth datum, B̄ is the mean of the measured API, D̄ is the mean of the fitted API, L is the number of measurements, D_p is the peak of the fitted API, B_p is the peak of the measured API, and EO_p is the relative error of the peak API.
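The four criteria can be sketched as below using their common definitions. The sign conventions for ME and EO_p (fitted minus measured) and the use of the series maximum as the peak are assumptions, and the sample values are made up.

```python
import math

def rmse(measured, fitted):
    """Root mean square error over L paired values."""
    n = len(measured)
    return math.sqrt(sum((b - d) ** 2 for b, d in zip(measured, fitted)) / n)

def mean_error(measured, fitted):
    """Mean error; the sign convention (fitted minus measured) is an assumption."""
    return sum(d - b for b, d in zip(measured, fitted)) / len(measured)

def peak_error_percent(measured, fitted):
    """Relative error of the peak, in percent: (D_p - B_p) / B_p * 100."""
    b_p, d_p = max(measured), max(fitted)
    return (d_p - b_p) / b_p * 100.0

def correlation(measured, fitted):
    """Pearson correlation coefficient R between measured and fitted values."""
    n = len(measured)
    b_mean, d_mean = sum(measured) / n, sum(fitted) / n
    cov = sum((b - b_mean) * (d - d_mean) for b, d in zip(measured, fitted))
    var_b = sum((b - b_mean) ** 2 for b in measured)
    var_d = sum((d - d_mean) ** 2 for d in fitted)
    return cov / math.sqrt(var_b * var_d)

measured = [50.0, 80.0, 120.0, 100.0]  # made-up API values
fitted = [55.0, 75.0, 110.0, 100.0]
print(rmse(measured, fitted), mean_error(measured, fitted),
      peak_error_percent(measured, fitted), correlation(measured, fitted))
```

Under these conventions a negative ME or EO_p means that the model underestimates on average or at the peak, respectively.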

Correlation Analysis
Correlation analysis can determine the linear associations between air pollution and meteorological variables. Its main disadvantage is that it can detect only the linear relationship between two variables. As a result, it cannot capture any nonlinear relationship that may exist between the inputs and the outputs, and important inputs that are related to the output in a nonlinear fashion may be missed.
The determination of input parameters is one of the most important steps in the design of WANN models. The correlation coefficients calculated for the variables are shown in Table 1; they are significant at the 95% level. Each variable was evaluated by computing its correlation coefficient (R) with API(t + 1). The analysis showed that API(t) was strongly related to API(t + 1) at the two stations. Furthermore, average temperature, average water vapor pressure, minimum temperature, maximum temperature, API(t - 1), and API(t - 2) correlated with API(t + 1) more strongly than the other variables at both stations. Together with API(t), these make seven significant variables. Therefore, different combinations of variables were selected as inputs for modeling the daily API (Table 2). The selection of the variables was based both on the comprehensive correlation analysis and on existing knowledge. The horizontal wind is the basic parameter that controls the horizontal dispersion and transport of air pollutants. The effects of solar radiation on the reaction rate constants and, consequently, on the destruction and formation of photochemical species are complicated. The removal of air pollutants from the atmosphere by precipitation is a very effective process that often leads to low air pollution levels. Many pollutants are highly persistent, and it is generally accepted that the likelihood of an air pollution event increases if the previous day's air pollution was higher than normal.
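The screening step described above can be sketched as a ranking of candidate variables by the magnitude of their correlation with API(t + 1). This is a hypothetical helper with made-up series; it does not include the significance test, and it assumes no candidate series is constant.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    var_x = sum((a - x_mean) ** 2 for a in x)
    var_y = sum((b - y_mean) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rank_by_correlation(candidates, api):
    """Correlate each candidate at day t with API(t+1) and sort by |R|, largest first."""
    target = api[1:]                      # API(t+1)
    scores = {name: pearson_r(series[:-1], target)  # candidate value at day t
              for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Made-up daily values for illustration only.
api = [100.0, 90.0, 95.0, 70.0, 60.0, 65.0, 40.0]
candidates = {
    "API(t)": api,
    "AT(t)": [2.0, 5.0, 4.0, 10.0, 12.0, 11.0, 18.0],
    "P(t)": [0.0, 3.0, 0.0, 0.0, 1.0, 0.0, 2.0],
}
ranking = rank_by_correlation(candidates, api)
for name, r in ranking:
    print(name, round(r, 3))
```

Ranking by the absolute value keeps strongly negative relationships, such as a pollutant index that falls as temperature rises, alongside the positive ones.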

Determination of Network Topologies and Training Algorithms
It should be emphasized that finding the most appropriate model structure is one of the main tasks of the model developer, probably because there are usually many candidate variables whose relative importance is unknown. Moreover, the relationship between the inputs and air quality is nonlinear and highly location dependent.

Model | Input variables | Topology
… | … | 12:6:1
WANNAPI6 | AWVP(t), MINT(t) | 6:6:1
WANNAPI7 | API(t) | 3:6:1
WANNAPI8 | API(t), API(t - 1), API(t - 2) | 9:6:1

The different details and dimensions of the input variables are obtained by a two-stage wavelet decomposition. After two-stage decomposition and reconstruction, each variable is divided into three portions. The approximate component CA2 indicates the general trend of the original variable, the detailed component CD2 reflects its periodic behavior, and the detailed component CD1 reflects its inhomogeneity and complexity. In other words, CD1 determines the complexity of the empirical model predictions.
The variation characteristics of sequences are the critical elements affecting the selection of wavelets (Sang, 2013). To decompose the input variables optimally, the mother wavelet is selected by considering the similarity among CD1, CD2 and CA2: the wavelet with the minimum R best satisfies our purpose of analyzing the variation characteristics of the different components of the input variables. The quantitative calculation shows that the components are independent of each other. Twenty-one wavelet functions were tested for the DWT. Table 3 shows that db4 is the best wavelet function in this study because it has the smallest R. The average temperature is taken as an example here; the other input variables give similar results.
Trial and error is applied to acquire the optimal model parameters. The number of hidden-layer nodes is increased from 1 to 19 in the models. Fig. 5 shows that, as the number of hidden-layer nodes increases, the RMSE values first decrease slightly; beyond 19-3-1 for the ANN and 57-6-1 for the WANN, the RMSE values increase and fluctuate. Therefore, the best topologies for Xi'an are identified as 19-3-1 for the ANN and 57-6-1 for the WANN. Similarly, the best topologies for Lanzhou are 19-3-1 for the ANN and 57-6-1 for the WANN. Among the improved training algorithms, trainbr (Bayesian regularization) performs best in predicting API(t + 1) in Xi'an; it automatically sets optimum values for the parameters of the objective function. Table 4 shows that the tansig-purelin transfer function combination outperforms the others in Xi'an during the training, cross-validation and testing periods, and the same is true in Lanzhou.
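The trial-and-error search over hidden-layer sizes can be sketched as a loop that trains one network per candidate size and keeps the size with the lowest validation RMSE. This is an illustration under stated assumptions: plain per-sample gradient descent is swapped in for Bayesian regularization, the hidden units use tanh and the output is linear (the counterpart of tansig-purelin), and the data, function names, and hyperparameters are hypothetical.

```python
import math
import random

def train_network(samples, n_hidden, epochs=200, rate=0.05, seed=0):
    """Train a one-hidden-layer network (tanh hidden, linear output) by
    gradient descent on the squared error; returns a predict function."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Each weight vector stores its bias as the last element.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        hidden = [math.tanh(w[-1] + sum(wi * xi for wi, xi in zip(w, x))) for w in w_hidden]
        output = w_out[-1] + sum(wo * h for wo, h in zip(w_out, hidden))
        return hidden, output

    for _ in range(epochs):
        for x, target in samples:
            hidden, output = forward(x)
            error = output - target  # derivative of 0.5 * error**2 w.r.t. the output
            for j in range(n_hidden):
                # Backpropagate through the output weight and the tanh derivative.
                grad_hidden = error * w_out[j] * (1.0 - hidden[j] ** 2)
                w_out[j] -= rate * error * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= rate * grad_hidden * x[i]
                w_hidden[j][-1] -= rate * grad_hidden
            w_out[-1] -= rate * error

    return lambda x: forward(x)[1]

def rmse(predict, samples):
    return math.sqrt(sum((predict(x) - y) ** 2 for x, y in samples) / len(samples))

def select_hidden_nodes(train, validation, candidates):
    """Train one network per candidate size; keep the lowest validation RMSE."""
    scores = {n: rmse(train_network(train, n), validation) for n in candidates}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic one-input problem (y = x**2), used only to exercise the search.
points = [k / 10.0 for k in range(-10, 11)]
data = [([x], x * x) for x in points]
train, validation = data[0::2], data[1::2]
best, scores = select_hidden_nodes(train, validation, range(1, 5))
print(best, scores)
```

Scoring each candidate on data held out from training is what lets the search detect the point at which adding nodes stops helping.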

Comparative Analysis of the Models
The training-period results for the ANN and WANN models are shown in Table 5. For ANNAPI1 and WANNAPI1 in Xi'an, the RMSE values are 22.7233 and 11.5683, respectively; the R values are 0.642 and 0.9783, respectively; the ME values are -0.1114 and 0.7682, respectively; and the EO_p values are -0.5337% and -0.016%, respectively. The WANNs are thus superior to the ANN during the training period. Similar results for the RMSE, ME, R, and EO_p are found in Lanzhou.
The values of the evaluation criteria for the nine models at the two stations during the forecast period are shown in Table 6, which summarizes the test results for every network configuration. The ANNs and WANNs have a flexible mathematical structure and can map highly nonlinear relations. Most WANN models perform well in Xi'an and Lanzhou, although the WANN models perform noticeably better in Xi'an than in Lanzhou. WANNAPI1 outperforms ANNAPI1 at both stations, and its advantage is even more pronounced for Lanzhou, where it provides a more accurate API forecast than ANNAPI1. The EO_p values in Table 6 show the models' performance in simulating extreme events. At both stations during the forecast period, WANNAPI1 had the smallest RMSE, the largest R, and the smallest EO_p of all the WANN models. The lower RMSE values indicate that the WANNAPI1 model produced smaller discrepancies between the forecasted API(t + 1) and the observed API(t + 1). Fig. 6 shows the observed API(t + 1) versus the predicted API(t + 1) in Xi'an and Lanzhou. Both the ANN and WANN models were able to replicate the average API but were limited in capturing the minimum and maximum peaks.
Figs. 7 and 8 indicate that the ANN and WANN models predicted the API with acceptable accuracy in Xi'an and Lanzhou, although the WANN models were clearly superior to the ANN models. The WANN models yielded good agreement between the observed API(t + 1) and the predicted API(t + 1), with WANNAPI1 performing better than WANNAPI2. WANNAPI8, which uses the API lagged by 1-3 days, also performed better than WANNAPI7, which uses only the 1-day lag; that is, including the three previous days' API in the input data set gives more precise results. It should be noted, however, that the WANN methods have limitations inherent to their structures.
The agreement between the observed API(t + 1) and the predicted API(t + 1) is also very good at both stations with the WANNAPI4 model. The main meteorological factors related to air pollution in Xi'an and Lanzhou are average temperature (t), average water vapor pressure (t), minimum temperature (t) and maximum temperature (t), possibly because their correlation coefficients with air pollution are larger than those of the other variables.

Comparison with Other Models
Fig. 6. Boxplots of the observed API(t + 1) (Observed-API) and the predicted API(t + 1) (such as P-ANNAPI1) in Xi'an and Lanzhou.
Fig. 7. Comparison between the observed API and the predicted API in Xi'an.
Many studies have sought to identify and understand the relationships between air quality and meteorological conditions. The ANN, with its abilities of self-adaptation and nonlinear mapping, has proven advantageous and is widely applied in air quality forecasting. For PM2.5 in Beijing, an ANN gave a better RMSE (24.06 µg m⁻³) than a multivariate statistical analysis method (RMSE = 26.69 µg m⁻³) (Ni et al., 2017). In a linear regression analysis at six subway stations, R² ranged from 0.18 to 0.63, whereas a neural network model with present-time variables reached a higher R² of 0.54-0.81 (Park et al., 2018). The absolute errors of ANNs in predicting PM10, SO2, and CO were 35%, 43%, and 28%, respectively (Kurt et al., 2008). For an ANN model of API forecasting, the correlation coefficients were 0.6993, 0.6056, and 0.6300 for SO2, PM10, and NO2, respectively (Jiang et al., 2004).
The WANN performs better than the ANN in forecasting SO2, PM10, and NO2 in Chongqing, with lower RMSE values of 4.447 µg m⁻³, 8.233 µg m⁻³, and 2.785 µg m⁻³, respectively (Bai et al., 2016). The best next-day forecast of PM2.5 in Dingling was obtained with a hybrid model combining an ANN, wavelet transformation, and air mass trajectories, with an RMSE of 15.65 µg m⁻³; the wavelet transform was also found to improve the PM2.5 forecasting accuracy (Feng et al., 2015).
A support vector machine was validated for simulating and forecasting PM2.5 with data from Wuhan, and the results showed that the method can yield precise outcomes (He et al., 2018). Neural networks gave better predictions than a linear model, with a maximum prediction error of 32% at 21 hours ahead (Perez and Menares, 2018). Long short-term memory (LSTM) networks can effectively forecast air pollution and achieve the best results (Karimian et al., 2019). The prediction of the 2016 ozone season using generalized additive models is in good agreement with the relevant measurements (R² = 0.70) (Pernak et al., 2019). The average consistency index between PM2.5 prediction and observation for the four seasons in the Yangtze Delta is between 74% and 77% when machine learning and WRF are used (Jia et al., 2019). The best estimation of PM2.5 (R² = 0.84) is obtained with an artificial neural network (Bai et al., 2020). Compared with WRF, the correlation coefficients of the machine learning model are higher by 50-100%, which can provide better PM2.5 prediction (Ma et al., 2020).
These studies have improved the artificial neural network and achieved better prediction results, but the prediction accuracy still needs to be enhanced. Therefore, learning from historical data and extracting their characteristics play an important role in guaranteeing prediction accuracy.
We have reported the prediction results at both stations. In Xi'an, the WANN prediction is successful: the model correctly predicts the location and shape of the main peaks but slightly overestimates the general background API, because the main purpose is to simulate the peaks correctly. In Lanzhou, the agreement is similarly good: the WANN reproduces the general characteristics of the observed API and accurately forecasts the location and magnitude of the main peaks.

CONCLUSIONS
This study presents an optimum system for nonlinear modeling of the daily API using ANNs and WANNs. The input variables (meteorological elements and APIs) for the models were defined via correlation analysis, and the discrete wavelet transform was employed to decompose the time series of the meteorological conditions into different scales, whereby a single mixed signal was decomposed into multiple distinct components. These data were then incorporated into the models to simulate the next day's API. Our results indicated that both the ANN and the WANN models predicted the daily API with acceptable accuracy, but the performance of the latter, which integrated the nonlinear mapping of the ANN as well as the multi-scale analysis of the DWT, was clearly superior.
For future WANNs, we will focus on four aspects. Firstly, the models will address meteorological elements and forecast the API in other locations. Secondly, additional elements will be considered, for instance, the longitude, latitude, land use, topography (simulated with the digital elevation model [DEM]), and population density. Sensitivity analysis will also be conducted in order to select parameters that are more closely related to the API, thus improving the predictive accuracy. Thirdly, the models will be used to predict other complex time series that possess nonlinear and unstable characteristics, such as those for the air quality index (AQI), fine particulate matter (PM2.5), PM10, nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO), and ozone (O3). Finally, a deep reinforcement learning algorithm based on multi-agent cooperation will be employed for air pollution forecasting, thus providing further insights into the multiscale spatiotemporal prediction of pollutant concentrations.