Evolving Spiking Neural Network Model for PM 2.5 Hourly Concentration Prediction Based on Seasonal Differences: A Case Study on Data from Beijing and Shanghai

In recent years, the dangers that air pollutants pose to human health and the environment have received widespread attention. Although accurately predicting the air quality is essential to managing pollution and developing control policies, traditional forecasting models have not been able to simulate the seasonal and diurnal variation in air pollutant concentrations. Furthermore, inadequate processing of the available spatio-temporal data has precluded the capture of predictive historical patterns. Therefore, we have developed a staging evolving spiking neural network (eSNN) model named Staging-eSNN that first employs a time series clustering algorithm to distinguish the seasonal from the diurnal variation in the PM 2.5 concentration. We then predict the concentrations in Beijing and Shanghai 1, 3, 6, 12 and 24 hours in advance. Various evaluation indicators show that the Staging-eSNN model achieves higher performance than the support vector regression


INTRODUCTION
Since the reform and opening up, China has made steady progress in the process of urbanization. The urbanization rate has increased from 17.92% before the reform and opening up to 58.52% in 2017. Beijing and Shanghai, as two key cities to support China's sustained economic growth, are also experiencing frequent effects of smog while developing rapidly. Smog is usually characterized by high PM2.5 concentration. According to Liu et al. (2019), there is a statistically significant correlation between short-term exposure to PM10, PM2.5 and cardiovascular, respiratory mortality. In addition, PM2.5 has a negative impact on socioeconomic and climate change (Li et al., 2016).
To reduce smog and improve air quality, researchers have proposed many methods to analyze and predict the concentration of air pollutants such as PM2.5. Among the three commonly used methods, one is the deterministic method which is based on aerodynamic theory and physicochemical processes. It establishes a numerical model of air pollution concentration diffusion, further predicts the dynamic changes of atmospheric pollutant concentration through highspeed calculation and simulation. The commonly used models are the the Community Multiscale Air Quality (CMAQ) model (Chen et al., 2014) and the WRF-Chem model (Saide et al., 2011). However, in these models, various types of parameters need to be determined by experience, eSNN), CEeSNN and Staging-eSNN model. The conclusion part summarizes the work and points to some directions for further research.

Study Regions
Beijing is located in 115°42ʹ-117°24ʹE and 39°24ʹ-41°36ʹN which is in the northern part of the North China Plain and adjacent to Bohai Bay. It is China's economic decision center and political center. Beijing has a typical warm temperate semi-humid continental monsoon climate. Shanghai is located in 120°52ʹ-122°12ʹE, and 30°40ʹ-31°53ʹN, which is at the mouth of the Yangtze River. It is an important city in the Yangtze River Delta region of China. Due to the influence of the East Asian monsoon, Shanghai is a subtropical monsoon climate. Fig. 1(a) shows the geographic locations of the two cities.

Data Collection
This paper uses the hourly historical data of PM2.5, NO2, and SO2 air pollutants from January 1, 2017, to August 21, 2019 (for convenience of description, hereinafter referred to as "2017-2019"). These data were monitored by the national air quality monitoring stations in Beijing (36) and Shanghai (10), which can be downloaded through the China Urban Air Quality Real-Time Publishing Platform (http://106.37.208.233:20035/). The meteorological data is from NOAA's National Centers for Environmental Information (NCEI; https://www.ncdc.noaa.gov/), which provides half-hour data for Shanghai Pudong, Hongqiao airports and Beijing Capital International Airport (Figs. 1(b) and 1(c) show the locations of air quality monitoring stations and airports in Beijing and Shanghai respectively). This paper selects wind speed (WS; m s -1 ), wind direction (WD; °), and temperature (TEMP; °C) from 2017 to 2019. For the three meteorological variables, if one half-hour data is missing, the other half-hour data is filled. The missing data for the entire hour is considered to be missing. The hourly data is finally obtained by averaging the half-hour data. For missing values in air pollutant and meteorological hourly data, the same treatment method is adopted, that is, if there are more than 12 missing values of any variable on a certain day, the day will be eliminated, and the missing values in the remaining days will be filled with the mean of the day. The final dataset of Beijing is obtained after averaging the pollution data of all air quality monitoring stations and the meteorological data of all airports, which contains hourly historical data of PM2.5, NO2, SO2, TEMP, WS, and WD from 2017 to 2019 (Shanghai final dataset is obtained in the same way). Table 1 shows descriptive statistics of air pollutants and meteorological parameters in Beijing and Shanghai, and Fig. 2 shows their distribution. Fig. 3 shows the trends of PM2.5 concentration in Beijing and Shanghai at different time scales from 2017 to 2019. The shading indicates a 95% confidence interval in the mean. From Fig. 3(a), it can be found that the diurnal variation of PM2.5 concentration in the two cities shows a bimodal change and the peak shape of Beijing is weaker. The specific performance is that the PM2.5 concentration on different days in a week will peak at around 09:00 and 20:00, which is the characteristic of the morning and evening peak due to the traffic emission in two cites. Fig. 3(b) shows the same data averaged over each hour, which further confirms the bimodal characteristics of diurnal variations of PM2.5 concentration in the two cities. Another characteristic of PM2.5 is reflected in the monthly trend and the concentration value in different months. It can be seen from Fig. 3(c) that the downward and upward trends of PM2.5 concentration are shown before and after August respectively. From the difference of the concentration values, the U-shaped structure of PM2.5 concentration reflects the seasonal difference, that is, the concentration during the cold seasons (spring and winter) will be greater than the warm seasons (summer and autumn).

Diurnal and Seasonal Variations of PM2.5 Concentration
Based on the above analysis, we believe that there are seasonal differences in the diurnal variations of PM2.5 concentration. The characteristics is called the diurnal variation seasonality of PM2.5 concentration, which can be used to divide the data into different periods (seasonality) for PM2.5 prediction. However, due to the differences in urban locations, pollution sources, natural climate and other factors, the seasonality of PM2.5 divided by the four seasons only reflects the  difference in the climate. Therefore, a time series clustering technology is used to re-evaluate and re-define the seasonality in different cities, which to the extent possible control the impact of multiple factors (see the subsection "Clustering results"). Eventually, the diurnal variation of PM2.5 in the same period (called "hot", "warm" or "cold" period) only reflects one variation pattern, which is driven by a potentially stable structure.

THEORY AND MODEL IMPLEMENTATION
A unique characteristic of spiking neural networks is that they learn temporal or spatio-temporal patterns in their connectionist structures that can be used to predict future temporal events.
The first two parts of this section introduce SNN and the details of implementing eSNN in a known SNN architecture, NeuCube. The last part introduces our proposed Staging-eSNN model, which combines a time series clustering algorithm and eSNN models.

Spiking Neural Networks
In spiking neural networks, spatial and temporal information can be encoded as locations of synapses and neurons, and time of their spiking activity respectively (Kasabov, 2014;Kasabov, 2019). Temporal information is processed and transmitted by synaptic neurons to form memories. During the transmission process, synaptic connections are modified to more accurately reflect the correlation between different series (Tavanaei et al., 2019).
One class of SNN, the evolving SNN, can learn incrementally new data and evolve new output neurons to capture different patterns from input data in an adaptive way (Kasabov, 2019). And that is why eSNN are selected as the basic model in this paper.
In recently published CEeSNN model, Maciąg et al. (2019) clustered all training samples for the eSNN model training, which lead to the fact that although each eSNN model receives pollutant series with similar variations, the variation's causes of these series are different. For example, although the pollutant's concentration variations on a day in January and a day in August are similar, they are affected obviously by different seasonality, which are reflected in the difference of pollution sources, natural climate and other factors. This means that the data processed by the CEeSNN model may be driven by multiple different patterns, and therefore it can be improved.

Implementing eSNN Model in NeuCube
NeuCube is a modular development system that provides a framework to build an eSNN model for data mining and prediction, especially spatio-temporal data (Kasabov, 2014). For the sake of brevity, the model built in the NeuCube hereafter is called "eSNN." NeuCube contains four modules, as shown in Fig. 4.
Input module: This paper uses step-forward encoding to encode spatio-temporal data into binary temporal events (Kasabov, 2019). 1). 3D Cube module: In this paper, 1006 leaky integrate-and-fire (LIF) neurons are initialized in the cube and 6 input neurons (corresponding to 6 feature variables) among them are mapped into the corresponding spatial positions. Since there is no available neuron position information for feature variables in predicting PM2.5 problems, vectorization principle is used to measure the temporal similarity between variables to map variables with similar patterns to closer neurons in the cube (Tu et al., 2017). The neurons in the cube are connected by inhibitory or excitatory synapses and the initial connection weight is set according to the small world principle (Kasabov et al., 2016). After the initial cube is established, the encoded spatio-temporal data is input into the cube, and the neurons learn the spike sequences in an unsupervised mode through spike-timing-dependent plasticity (STDP) learning rule to capture spatio-temporal patterns in the neuronal connections. During the learning process, synaptic connection weights are modified to form the final SNN cube. In this module, the learned connectivity patterns in the SNN cube can be interpreted as deep knowledge representing deep spatio-temporal patterns in the data (Kasabov 2019). 2). Output module: The output module is a classifier/predictor trained in a supervised mode.
The learned connectivity patterns can be interpreted for rule extraction (Kasabov, 2019). 3). Parameter optimization module: The prediction performance of the eSNN model can be easily improved by changing a large number of parameters. Different encoding parameters will significantly change the information density of spike sequences. Different "mod" and "drift" parameters will lead to different classification accuracy in the eSNN (Kasabov et al., 2016). We use the grid search method to execute the parameter optimization module.

The Proposed Staging-eSNN Model
The input of the Staging-eSNN model is spatio-temporal data composed of 24-hour air pollutants and meteorological data in a day. The target value is PM2.5 concentration whose time point is 1, 3, 6, 12 or 24 hours after the input spatio-temporal sample. Fig. 5 shows the data structure. Compared with the sequence length of 12 in the CEeSNN model, the way of setting the sequence length to 24 provides a complete diurnal variation cycle of PM2.5 concentration for the Staging-eSNN model. Furthermore, shorter samples contain lower information density, making  it difficult for encoded binary events to activate neurons, thereby weakening the connectivity of neurons in the eSNN model. Fig. 6 shows the framework of the Staging-eSNN model. The training process is divided into the following two steps which reveals the technical details of the proposed Staging-eSNN model. 1). The time series clustering algorithm is applied first to distinguish the seasonal differences in diurnal variations of PM2.5 concentration (see the subsection "Clustering results"). Specifically, for the time series of PM2.5 diurnal trends in 12 months, dynamic time warping (DTW) is applied to calculate the distance between them. A smaller DTW distance means that the two series are more aligned and similar, and have similar seasonal variations. On this basis, the clustering algorithm is used to cluster the monthly diurnal variation series of PM2.5 concentration to distinguish the changing structure in different periods. During the process, partitioning around medoids (PAM) is used as an update function of the centroid to reduce the possibility of failure to converge (Petitjean et al., 2011). Eventually, in each cluster class, PM2.5 diurnal variations have the similar seasonality, which means that they share a common variation pattern. 2). A multi-model eSNN system is created that consists of several eSNN models, one for each seasonal period defined in Step 1. Based on the clustering results, samples from different periods are entered as input samples into the corresponding eSNN models implemented in NeuCube, and the prediction results of all eSNN models are finally integrated.

Clustering Results
To observe the characteristic of the diurnal variation seasonality, we calculate the diurnal variation concentration of PM2.5 in each month from 2017 to 2019 to obtain 12 time series in Beijing and Shanghai respectively. Then a time series clustering algorithm (PAM-DTW) is further used to classify the seasonality. The number of candidate clusters is 2, 3, and 4. In this paper, silhouette index (Sil), Calinski-Harabasz (CH), and Davies-Bouldin index (DB) are selected as the basis for evaluating the clustering effect. A good clustering result corresponds to low DB value, high CH and Sil value (Arbelaitz et al., 2013). Table 2 shows the evaluation results of clustering, indicating that Beijing's seasonality can be divided into 2 categories, and Shanghai can be divided into 3 categories.
Based on the clustering results of PM2.5 concentration in Beijing and Shanghai shown in Figs. 7(a) and 7(b), we divide the seasonality of PM2.5 concentration diurnal variations in Beijing into the cold and warm period, of which the cold period (January-April, November) is mainly concentrated in spring and winter, while the warm period (May-October, December) is concentrated in summer and autumn, the shape is roughly inverted-V-shaped. There are obvious hierarchical structures between the clusters and similar trends within the clusters in Shanghai. We refer to the three periods of PM2.5 concentration variations in Shanghai as the "cold," "warm,"    and "hot" period. The cold period (January-April, November-December) is mainly concentrated in spring and winter. The warm period (May-June) is the turn of spring and summer. The hot period (July-October) is concentrated in summer and autumn. In addition, the descriptive statistics of PM2.5 concentration in different periods are reported in Table 3.

Model Prediction Performance Evaluation
Based on the clustering results, we first randomly selected 150 and 100 samples from each period in Beijing and Shanghai, respectively. The samples from both cities contained 7200 hours of air pollutants and meteorological data. Then, the eSNN model was implemented by NeuCube software to predict PM2.5 concentration in the next 1, 3, 6, 12 and 24 hours. Two-fold crossvalidation was applied to model training and prediction. Finally, the predict results of each period are summarized in the two cities.

Definition of Predictive Performance Evaluation Indicators
The following indicators are used to evaluate the prediction results of models: the bias, root mean square error (RMSE), root mean square percentage error (RMSPE), index of agreement (IOA), Pearson correlation coefficient (r), fraction of predictions within a factor of two (FAC2). See Table 4 for definition and description. For the Shanghai region, the bias of the Staging-eSNN model is higher than 0 in the most predicted moments, which means that the predicted values are more likely to be overestimated than the target values. RMSE is lower than 15 µg m -3 in the hot period while it is above 18 µg m -3 in the cold period. However, due to the differences in samples from different periods, RMSE cannot effectively compare model performance in different periods. RMSPE measures the magnitude of the predicted deviation from the target mean, which does not show an obvious difference between the three periods. FAC2 reveals that more than 84% of the Staging-eSNN  (Hyndman and Koehler, 2006).  Measure the extent to which the predicted value deviates from the target value within a certain range. The ideal model is 1. model predictions are within a factor of two of the target PM2.5 concentration regardless of any predicted moment in any period, and there is no significant difference in each period. The Pearson correlation coefficients of the cold and hot periods are almost above 0.8 in the first three predicted moments, while r in the warm period is lower than 0.76, indicating that the Staging-eSNN model predicts the warm period even worse. IOA shows results similar to r.

The Staging-eSNN model predictive performance evaluation
For Beijing, except for 3-hour-ahead prediction in the warm period, the bias is almost less than 0 in both periods, indicating that the Staging-eSNN model underestimates the target value, which is different from the prediction result in Shanghai. RMSPE and FAC2 indicate that the prediction result of the warm period is better than the cold period, while IOA and r are more optimistic about the performance of the Staging-eSNN model in the cold period.
The scatter plot of Fig. 9 shows the relationship between the target value and the predicted value of the Staging-eSNN model at five predicted moments in different periods. The diagonal solid line indicates that the predicted value and the target value are equal, and the dotted line indicates the critical condition of FAC2. For the Shanghai area, more than 85% of the scattered points fall within the range of the dotted line and are distributed near the diagonal. For high target values, the scatter is more likely to deviate from the diagonal, which means that the Staging-eSNN model has a poor prediction of high PM2.5 concentration. The prediction results in the Beijing area verify this conclusion. From Fig. 9(a), it can be found that the r and FAC2 indicators in the Beijing area perform worse than the Shanghai area, which is the result of the Beijing area having more high target values.
It should be noted that six indicators used reflect different aspects of prediction capabilities (Table 4); the performance of the Staging-eSNN model in different periods depends on what we focus on. But in general, the prediction results of both regions show that the Staging-eSNN model performance will deteriorate as the predicted moment moves backward. The conclusion can be explained by the proportion of outliers. Based on the definition of outliers (Wang et al., 2017a), we compared the PM2.5 series in the sample to determine whether the target value is an outlier. The final results are listed in Table 5. It can be found that at any period in the two cities, as the predicted moment moves backward, the proportion of outliers in the target value will increase.

Predictive Performance Evaluation of Different Models
We compared the predictive performance of SVR, RF, Plain-eSNN, CEeSNN and Staging-eSNN models using the six indicators mentioned above. The Plain-eSNN model was implemented entirely based on the NeuCube. In order to prevent the impact of sample differences, the same samples were used for the three models. And for SVR and RFs, two-fold cross-validation was applied to model training and prediction using the sampled 1000-hour data. The evaluation results of the models for the prediction of the next 1, 3, 6, 12 and 24 hours are shown in Fig. 10.
When comparing the three SNN models, we found that no matter in Beijing or Shanghai, Fig. 9. The scatter plot of predicted and target values of the Staging-eSNN model at five predicted moments in different periods.  almost all the indicators show that the Staging-eSNN model has the best prediction performance. Specifically, taking Shanghai as an example, RMSE of the Staging-eSNN model is 9.14, 10.46, 10.85, 12.99 µg m -3 at the first four predicted moments, which is lower than CEeSNN's 12.49,14.04,13.81,16.49 µg m -3 and Plain-eSNN's 12. 38, 15.12, 16.29, 20.99 µg m -3 . The FAC2 and IOA are above 0.8 regardless of any predicted moment in the Staging-eSNN model, which are higher than the other two SNN models. In the performance of the r indicator, the correlation coefficient of the prediction result of the Plain-eSNN model is between 0.50-0.75, which is lower than r (0.60-0.74) in the CEeSNN model and is lower than r (0.67-0.87) in the Staging-eSNN model. For the prediction result of Beijing, except for the sign of the bias which is opposite to that of Shanghai in the most predicted moments, the other five evaluation indicators have similar trends to those of Shanghai. These facts show that the Staging-eSNN model improves the prediction accuracy of SNN models and has the best prediction performance in the same series of SNN models. On the other hand, the CEeSNN model also has a clear improvement in prediction accuracy compared to the Plain-SNN model.
When compared with the SVR and RF model, we found that Staging-eSNN model's prediction results are slightly worse than these two models in the first (Beijing) or the first two (Shanghai) predicted moments, while in the prediction at other moments, the Staging-SNN model almost all showed the best prediction performance compared to the other four models.

CONCLUSIONS
To better predict the PM2.5 concentration, we developed an eSNN model, Staging-eSNN, that applies a time series clustering algorithm (PAM-DTW) to differentiate between seasonal and diurnal variation in the levels of this pollutant. Based on our forecasts for the concentrations in Beijing and Shanghai 1, 3, 6, 12 and 24 hours in advance, we drew the following conclusions. 1). When using the bias, RMSE, RMSPE, FAC2, IOA, and r-value as evaluation indicators, the Staging-eSNN model exhibited better performance than classical NeuCube and the recently proposed CEeSNNs, which indicates that Staging-eSNN offers good predictive accuracy and stability compared to other eSNN models. 2). When predicting the PM2.5 concentration 1 or 3 hours in advance, the Staging-eSNN model performed slightly worse than the SVR and RF models. 3). Staging-eSNN incorporated seasonal factors to enhance the accuracy of its PM2.5 forecasting.
This capability merits further investigation and can be added to future models for predicting the concentrations of other pollutants. Although this study achieved several promising results, potential improvements include better mapping of the input variables to the 3D SNN architecture and further optimization of the algorithm's parameters. We also plan to use residual modeling to increase Staging-eSNN's predictive accuracy.