Lian-Hua Zhang1,2,3, Ze-Hong Deng1, Wen-Bo Wang2,3

1 School of Literature, Law and Economics, Wuhan University of Science and Technology, Wuhan 430065, China
2 Hubei Province Key Laboratory of Systems Science in Metallurgical Process (Wuhan University of Science and Technology), Wuhan 430081, China
3 College of Science, Wuhan University of Science and Technology, Wuhan 430065, China


Received: June 22, 2020
Revised: January 28, 2021
Accepted: January 29, 2021

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.




Cite this article:

Zhang, L.H., Deng, Z.H., Wang, W.B. (2021). PM2.5 Concentration Prediction Based on Markov Blanke Feature Selection and Hybrid Kernel Support Vector Regression Optimized by Particle Swarm Optimization. Aerosol Air Qual. Res. 21, 200144. https://doi.org/10.4209/aaqr.200144


HIGHLIGHTS

  • The approximate Markov blanket based nMRMR algorithm is used.
  • A hybrid kernel (HK) was created.
  • A support vector regression model (nMRMR-PSO-HK-SVR) was established and applied.
 

ABSTRACT


This study used air quality and meteorological data and extracted the optimal feature subset with the approximate Markov blanket-based normal maximum relevance minimum redundancy (nMRMR) algorithm; this subset served as the input of the prediction model. In addition, a hybrid kernel (HK) was created to improve upon the traditional support vector regression (SVR) model. Particle swarm optimization (PSO) was used to find the optimal parameters of the HK-SVR, which were then used to establish the nMRMR-PSO-HK-SVR model for PM2.5 concentration prediction. Air quality and weather data for Wuhan and Tianjin from 2016 to 2019 were used to test the proposed method. The experimental results show that the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE) and Theil's inequality coefficient (TIC) of the nMRMR-PSO-HK-SVR model are lower than those of the SVR, PSO-SVR, nMRMR-SVR and PSO-HK-SVR models. Moreover, the proposed model tracked moments of sudden PM2.5 concentration change more precisely. Thus, the nMRMR-PSO-HK-SVR model generalizes better and predicts PM2.5 concentration more precisely.


Keywords: PM2.5, Maximum relevance minimum redundancy (MRMR), Hybrid kernel, Support vector regression, Prediction model


1 INTRODUCTION


Rapid economic development worldwide has caused increasingly severe air pollution. A major pollutant, PM2.5, remains in the air for a long time and can be transported over great distances because of its small size. This results in lowered visibility and severe deterioration of air quality and the atmospheric environment, and the copious toxic substances that PM2.5 carries pose a health risk (Gu et al., 2006). Estimation of the PM2.5 concentration is therefore critical for early warnings of severe pollution events. To date, PM2.5 concentration has generally been estimated using validation or statistical models (Patricio et al., 2020). Validation models are primarily constructed using historical weather information and chemical initial and boundary conditions to infer the complex process of pollutant formation (Wang et al., 2017a). The estimation precision of these models therefore depends on the accuracy of complex historical records, which are usually difficult to obtain. Statistical models describe the relationship between PM2.5 concentration and the factors influencing it; therefore, the models have high prediction precision (Yin et al., 2018). With the development of regression learning, artificial neural network and support vector regression (SVR) models have been successfully applied to estimate PM2.5 concentration. Perez and Gramsch (2016) verified that when historical pollutant concentration data and weather data are available, a feedforward neural network model can effectively predict the hourly concentration of PM2.5. Sun and Sun (2017) combined principal component analysis (PCA) with least-squares SVR to predict the daily PM2.5 concentration; their experimental results revealed that the prediction precision was high.
Although artificial neural network models can be employed in PM2.5 concentration estimation, problems tend to occur, such as convergence to local optima and overfitting (Niu et al., 2016). SVR models are based on statistical learning theory (Zhang, 2019) and adopt structural risk minimization as their principle, which guards against overfitting. Hence, such models exhibit favorable generalizability (Yan et al., 2020). As a major type of air pollutant, PM2.5 has complex origins and forms through a complicated process under the influence of numerous factors (Ni et al., 2017; Song et al., 2018). It exhibits high complexity and nonlinearity (Wang et al., 2017b). Most studies on PM2.5 concentration estimation have been based on PM2.5 time series (Qin et al., 2016; Wang et al., 2020), which strongly affects the prediction precision. In the present study, six air quality indices (PM2.5, PM10, SO2, NO2, CO, and O3 concentrations) and five weather factors (temperature, relative humidity, precipitation, wind speed, and air pressure) were employed to predict the PM2.5 concentration on the next day. However, redundant data had to be eliminated, and this required dimension reduction of the aforementioned 11 factors. Some scholars (Wang et al., 2017c; Jayakumar and Sangeetha, 2020) have used correlation coefficients or PCA for feature selection. Qiao et al. (2017) combined PCA with a fuzzy neural network to predict the concentration of PM2.5 and obtained better prediction results. Singh and Gupta (2012) used stepwise linear regression to select the original features in the prediction of urban air quality, and compared linear and nonlinear prediction models experimentally. Kim et al. (2010) used the partial least squares (PLS) method to select the variables with the greatest impact on the output when predicting PM2.5 and PM10 in a subway station; a comparison with the predictions obtained by taking all measured variables as inputs demonstrated the necessity of selecting characteristic variables.

The common methods above capture only the linear relationships between variables and ignore the nonlinear ones: correlation coefficients are appropriate only when all features are independent, and PCA can handle only linear problems, whereas the 11 features considered in this study exhibit strong correlations and nonlinearity (Wang et al., 2015). To address these shortcomings, this paper proposes a feature selection method based on the approximate Markov blanket-based normal maximum relevance minimum redundancy (nMRMR) algorithm (Zhang and Wang, 2018; Cai et al., 2019). Mutual information is used to rank the features by maximum relevance and minimum redundancy, the approximate Markov blanket is used to remove features that contribute little, and the optimal feature subset is then selected.

The support vector machine (SVM) is one of the most robust and accurate methods among data mining algorithms (Vapnik, 1998); it mainly includes support vector classification and support vector regression (SVR). Sun et al. (2016) presented a hybrid model based on PCA and least squares support vector regression (LSSVR) optimized by the cuckoo search algorithm to predict PM2.5 concentrations. The prediction precision of an SVR model depends on the type of kernel function (Cura, 2020). The radial basis function kernel (RBF kernel) has a high degree of local fitting, whereas the polynomial kernel (Poly) has strong generalizability (Dhamecha et al., 2019). Poly and the RBF kernel can be linearly combined to form a hybrid kernel (HK), which retains the advantages of the two original functions, enhances the generalizability of the SVR model, and has been applied satisfactorily in numerous fields (Fei, 2016; Zhong and Carr, 2016).

In the hybrid kernel SVR algorithm, the choice of the SVR parameters (penalty factor and kernel function parameters) and of the kernel combination parameter has an important impact on prediction accuracy, but at present there is no established method for finding the optimal values of these parameters. In traditional SVR, the parameters are chosen by repeated experiments, so the choice is largely arbitrary. Cross-validation reduces this arbitrariness to some extent, but it takes a lot of time. Particle swarm optimization (PSO), an optimization method developed in recent years, has been widely used in function optimization, pattern recognition and other fields because it is easy to implement and has a sound swarm-intelligence background. To select the optimal parameters of the hybrid kernel SVR model, this paper combines PSO with SVR and uses the global search ability of PSO to search for the parameters of HK-SVR.

In summary, the present study employed historical air quality and weather data. The nMRMR algorithm was used to first select features, and the optimal feature subset was chosen and input to the SVR model. A particle swarm optimization (PSO)-based HK was constructed to improve the conventional SVR model. Eventually, an nMRMR-PSO-HK-SVR model was established for estimating PM2.5 concentration.

 
2 MATERIALS AND METHODS


 
2.1 Research Site

Fig. 1 shows the geographical location of Wuhan and Tianjin. Wuhan is located in the eastern part of the Jianghan Plain in the middle and lower reaches of the Yangtze River (30°N, 114°E). Wuhan is the largest and most populous city in Hubei Province, with a total area of 8494.41 km2 and a population of 11.08 million. Wuhan's terrain is mainly hilly, which is not conducive to the dispersion of PM2.5. Although Wuhan is currently making efforts to improve its environment, the air quality does not meet the national secondary standard, especially in winter, when the concentration of PM2.5 rises sharply. In recent years, PM2.5 pollution has become a thorny issue in Wuhan, and it is of great significance to study pollution control in Wuhan, a mega-city in a period of economic growth.

Fig. 1. Geographical location of Wuhan City and Tianjin City.

Tianjin, the largest coastal city in northern China, is located on the west coast of Bohai Bay (39°N, 117°E) and has become China's new growth pole and a center of advanced industrial and financial activities. Tianjin has a warm temperate sub-humid monsoon climate, with a pronounced monsoon and four distinct seasons. In the past decades, with rapid urbanization, Tianjin has become a mega-city with a population of over 10 million. At the same time, owing to industrialization and the growing number of motor vehicles, hazy weather has become common. To protect public health and the atmospheric environment, there is an urgent need to simulate and predict the concentration of PM2.5 in Tianjin.

 
2.2 Research Data

In this paper, air quality data and meteorological data were collected for the period from January 1, 2016 to June 30, 2019. The air quality data were obtained from https://www.aqistudy.cn/historydata/ and include the daily average concentrations of PM2.5, PM10, sulfur dioxide (SO2), nitrogen dioxide (NO2) and carbon monoxide (CO), and the daily 8-hour ozone (O3) concentration. The "8-hour ozone" value is a sliding average, calculated as the average concentration over the 8 consecutive hours with the highest ozone level between 8:00 and 24:00. The meteorological data were collected from http://www.wunderground.com/history/ and include wind speed, precipitation, atmospheric pressure, relative humidity and temperature. For both Wuhan and Tianjin, the 1065 daily records from January 1, 2016 to November 30, 2018 were selected as the training sample set, and the 31 daily records from December 1, 2018 to December 31, 2018 were selected as the test sample set for short-term prediction. For Wuhan, the 181 daily records from January 1, 2019 to June 30, 2019 were selected as the test sample set for long-term prediction.

For the small number of missing values (1 in 2016, 3 in 2017, and 3 in 2018), we fill each gap with the mean value of the two adjacent days. Before training, the data are normalized. The normalization formula is as follows:

$$y = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

In the formula, $y$ denotes the normalized data; $x$ denotes the data before normalization; $x_{\min}$ denotes the data minimum; $x_{\max}$ denotes the data maximum. Four evaluation metrics, given in the section on the prediction model and evaluation indices, are used to evaluate the prediction accuracy of the model for PM2.5, and all algorithmic models and experiments in this paper are implemented in Matlab 2018b.
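The two preprocessing steps described above can be sketched in a few lines of Python. This is an illustration only, not the authors' Matlab code: the function names are hypothetical, missing values are assumed to be marked as `None` and to be isolated interior gaps, and reusing the training-set minimum and maximum for the test set is a common convention assumed here rather than something stated in the text.

```python
def fill_missing(series):
    """Replace each None with the mean of the two adjacent days.

    Assumes gaps are isolated and never at the ends of the series.
    """
    filled = list(series)
    for i, value in enumerate(filled):
        if value is None:
            if i == 0 or i == len(filled) - 1 or filled[i + 1] is None:
                raise ValueError("gap at index %d has no two valid neighbours" % i)
            filled[i] = (filled[i - 1] + filled[i + 1]) / 2.0
    return filled


def min_max_fit(values):
    """Return (x_min, x_max) of a data series."""
    return min(values), max(values)


def min_max_apply(values, x_min, x_max):
    """Map x to y = (x - x_min) / (x_max - x_min)."""
    return [(x - x_min) / (x_max - x_min) for x in values]


# Hypothetical daily PM2.5 values with one missing day.
pm25 = fill_missing([40.0, None, 60.0, 80.0])
x_min, x_max = min_max_fit(pm25)
normalized = min_max_apply(pm25, x_min, x_max)
```

Keeping the fit and apply steps separate makes it possible to scale a test sample with the statistics of the training set.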

 
2.3 Description of the Problem

The goal of PM2.5 concentration prediction is to predict PM2.5 concentration for a fixed period of time in the future (for example, the next 24, 48 or 72 hours) using the observed values for a fixed period of time in the past (for example, 24 hours). The aim of this manuscript is to make a short-term prediction of PM2.5 concentrations for the next 24 h (next day) using past observations.

For a given moment t, assume that the observed data over the past L days (L × 24 h) are

$$S_T = \{X_{t-L}, \ldots, X_{t-1}, X_t\}$$

Each observation $X_{t-i}$ ($0 \le i \le L$) in the sequence is a d-dimensional vector consisting of pollutant concentrations and meteorological observations. After the prediction model is constructed, the PM2.5 concentration on days $\{t + 1, t + 2, \ldots, t + K\}$ can be predicted with the data set $S_T$ as input.

In this manuscript, we use the nMRMR algorithm to extract the optimal features from the observed data $S_T$, and use PSO-optimized hybrid kernel support vector regression as the prediction model to make a short-term prediction of the PM2.5 concentration in the next 24 hours (the next day).
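As a rough illustration of this problem setup, the sketch below turns a list of daily observation vectors into (input, next-day target) pairs. The function name and the `n_obs` parameter are hypothetical; `n_obs` counts the observations in the input window, so the window of the text, which runs from day t − L to day t, corresponds to `n_obs = L + 1`.

```python
def make_next_day_pairs(records, target_index, n_obs=1):
    """Build supervised pairs for next-day prediction.

    records      : list of daily observation vectors (one list per day)
    target_index : position of PM2.5 inside each daily vector
    n_obs        : number of consecutive days used as input
    """
    inputs, targets = [], []
    for t in range(n_obs - 1, len(records) - 1):
        window = records[t - n_obs + 1 : t + 1]
        # Flatten the window into a single input vector.
        inputs.append([value for day in window for value in day])
        # The target is the PM2.5 value observed on day t + 1.
        targets.append(records[t + 1][target_index])
    return inputs, targets


# Hypothetical records: [PM2.5, temperature] for three consecutive days.
days = [[35.0, 12.0], [50.0, 10.0], [42.0, 11.0]]
X, y = make_next_day_pairs(days, target_index=0)
```

With `n_obs=1`, each day's observation is paired with the PM2.5 value of the following day, which matches the next-day prediction task studied here.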

 
2.4 Research Methods


2.4.1 Approximate Markov blanket-based nMRMR algorithm

Because the factors influencing PM2.5 concentration are strongly correlated, data redundancy and, in particular, nonlinear relationships must be considered. Therefore, the approximate Markov blanket-based nMRMR algorithm was employed in this study for feature selection. The core of this algorithm is to maximize the relevance between each selected feature and the target feature while minimizing the redundancy among the selected features. Relevance and redundancy are both expressed through mutual information (Ju and He, 2018). The mutual information between variables x and y is defined as

$$I(x; y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \, dx \, dy$$

where $p(x)$ and $p(y)$ are the probability density functions of x and y, respectively, and $p(x, y)$ is the joint probability density function of x and y. The measurement indicators of maximum relevance and minimum redundancy are defined respectively as

$$\max D(S, p), \quad D = \frac{1}{n} \sum_{x_i \in S} I(x_i; p)$$

$$\min R(S), \quad R = \frac{1}{n^2} \sum_{x_i, x_j \in S} I(x_i; x_j)$$

where S is the feature subset; n is the number of features in S; $I(x_i; p)$ is the mutual information between a feature $x_i$, drawn from the 11 features (i.e., six air quality features and five weather features), and the target feature p, which is the PM2.5 data for the next day; and $I(x_i; x_j)$ is the mutual information between two of the 11 features.
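In practice the densities are unknown and the mutual information has to be estimated from samples. The sketch below uses a simple equal-width histogram estimate; the binning scheme, the number of bins and all function names are assumptions made for illustration, and the text does not say which estimator was actually used.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant series: avoid division by zero
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=4):
    """Plug-in estimate of I(x; y) in nats from a joint histogram."""
    xd, yd = discretize(x, bins), discretize(y, bins)
    n = len(xd)
    px, py, pxy = Counter(xd), Counter(yd), Counter(zip(xd, yd))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def relevance(features, target, bins=4):
    """D: mean mutual information between each feature in S and the target."""
    return sum(mutual_information(f, target, bins) for f in features) / len(features)


def redundancy(features, bins=4):
    """R: mean mutual information over all ordered feature pairs in S."""
    n = len(features)
    return sum(mutual_information(a, b, bins) for a in features for b in features) / n ** 2
```

A feature compared with itself yields its own entropy, and two independent features yield a value near zero, which is the behaviour the relevance and redundancy indicators rely on.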

The criteria for feature selection are excellent classification performance and the smallest possible number of dimensions. These entail the maximum relevance within the feature set and categories as well as the minimum redundancy among the features. After comprehensive consideration of the aforementioned two measurement indicators, the following criterion for maximum relevance and minimum redundancy is obtained:

 

Markov blanket (Yu et al., 2019): Let F be the feature set, C the target (label) and fi a feature within F. A feature subset S ⊂ F with fi ∉ S is a Markov blanket of fi if fi ⊥ {F – S – {fi}, C} | S, where ⊥ indicates independence and | S means conditioning on S. Given S, fi is independent of F – S – {fi} and C, which suggests that when such an S exists, fi makes no contribution to the label and should therefore be deleted. In addition, the smallest S that satisfies the aforementioned condition is the Markov blanket of feature fi.

According to the definition of the Markov blanket, the criteria for determining the approximate Markov blanket can be obtained (Hua et al., 2020); that is, for features fi and fj (i ≠ j), the criteria for feature fi to be an approximate Markov blanket of feature fj are

 

The specific steps of the approximate Markov blanket-based nMRMR algorithm used in this study are as follows:

Step 1: Initialize F, the set of factors influencing PM2.5 concentration, and establish an empty set S.

Step 2: Calculate the mutual information between each feature in F and the PM2.5 concentration of the next day. Arrange the features in F in descending order by the size of their mutual information.

Step 3: Move the first feature in F (the one with the largest mutual information) into S, and delete it from F.

Step 4: Arrange the features in accordance with the principle of maximum relevance and minimum redundancy. Deposit the arranged features into S.

Step 5: Delete irrelevant and redundant features in S according to the criteria of the approximate Markov blanket.

Step 6: Export the optimal set of factors affecting PM2.5 concentration, which is S.
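The six steps above can be sketched as follows, starting from precomputed mutual information values. Two points are assumptions rather than details taken from the text: the ranking in Step 4 uses the difference between relevance and mean redundancy (one common mRMR criterion), and the pruning in Step 5 uses an approximate Markov blanket test in the style of the FCBF algorithm, where a kept feature j covers feature i if it is at least as relevant as i and shares at least as much information with i as i shares with the target. All names are hypothetical.

```python
def mrmr_order(relevance, redundancy):
    """Steps 2-4: rank features by relevance minus mean redundancy.

    relevance[i]     = I(x_i; p), mutual information with the target
    redundancy[i][j] = I(x_i; x_j), mutual information between features
    """
    remaining = sorted(range(len(relevance)), key=lambda i: -relevance[i])
    selected = [remaining.pop(0)]  # Step 3: most relevant feature first
    while remaining:
        def score(i):
            mean_red = sum(redundancy[i][j] for j in selected) / len(selected)
            return relevance[i] - mean_red
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
    return selected


def prune_markov_blanket(order, relevance, redundancy):
    """Step 5: drop features covered by an already kept feature."""
    kept = []
    for i in order:
        covered = any(
            relevance[j] >= relevance[i] and redundancy[j][i] >= relevance[i]
            for j in kept
        )
        if not covered:
            kept.append(i)
    return kept


# Hypothetical values: features 0 and 1 are both relevant but nearly duplicates.
rel = [0.9, 0.8, 0.3]
red = [[1.0, 0.85, 0.1],
       [0.85, 1.0, 0.1],
       [0.1, 0.1, 1.0]]
order = mrmr_order(rel, red)
subset = prune_markov_blanket(order, rel, red)
```

In this toy case feature 1 is removed even though it is highly relevant, because feature 0 already carries almost the same information.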

 
2.4.2 PSO hybrid Kernel SVR (PSO-HK-SVR)

Based on statistical learning theory, particularly the principle of structural risk minimization, SVR can satisfactorily handle high-dimensional and overfitting problems. The core idea is to use a nonlinear mapping to project the input data into a high-dimensional feature space, thereby transforming the nonlinear problem into a linear one. The regression function in that space can be formulated as follows:

$$f(x) = w \cdot \varphi(x) + b$$

where w is the weight vector, $\varphi(x)$ is the nonlinear mapping and b is the offset constant. Through the use of a kernel function, explicit dot product operations in the high-dimensional space can be avoided, which yields the optimal hyperplane fitting function:

$$f(x) = \sum_{i=1}^{m} (a_i - a_i^*) K(x, x_i) + b$$

where $a_i$ and $a_i^*$ are Lagrange multipliers, m is the number of training samples and $K(x, x_i)$ is the kernel function.

When performing PM2.5 prediction using an SVR, kernel function selection has a decisive influence on the prediction result. Of the numerous types of kernel functions created for SVR applications, the following three types are the most commonly used:

(1) Poly kernel

$$K(x, x_i) = [(x \cdot x_i) + 1]^q$$

(2) Gaussian RBF kernel

$$K(x, x_i) = \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)$$

(3) Sigmoid kernel

$$K(x, x_i) = \tanh(v(x \cdot x_i) + c)$$

The Poly kernel is a global kernel function: data points that are far apart still have a significant influence on its value, so it has strong generalizability but weak learning capacity. By contrast, the Gaussian RBF kernel is a local kernel function that is influenced only by data points that are relatively close; it has strong learning capacity but weak generalizability (Huanrui, 2016; Liu et al., 2016).
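The global versus local behaviour of these kernels is easy to check numerically. The following sketch implements the three kernels in plain Python; the Gaussian form with 2σ² in the denominator and the default parameter values are assumptions chosen for illustration.

```python
import math


def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def poly_kernel(x, z, q=2):
    """Polynomial kernel [(x . z) + 1]^q."""
    return (dot(x, z) + 1.0) ** q


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(x, z, v=0.1, c=0.0):
    """Sigmoid kernel tanh(v (x . z) + c)."""
    return math.tanh(v * dot(x, z) + c)


# A nearby and a distant point relative to x0.
x0, near, far = [1.0, 0.0], [1.1, 0.0], [5.0, 0.0]
rbf_near, rbf_far = rbf_kernel(x0, near), rbf_kernel(x0, far)
poly_near, poly_far = poly_kernel(x0, near), poly_kernel(x0, far)
```

The RBF value for the distant point is close to zero, whereas the polynomial value keeps growing with the inner product, which is the contrast between local and global kernels drawn above.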

To create a kernel that responds to nearby data points as well as distant ones in prediction, Poly and the Gaussian RBF kernel can be weighted and combined to formulate a new linear HK (Huanrui, 2016):

$$K_{HK} = r K_{poly} + (1 - r) K_{rbf} \qquad (8)$$

where $K_{rbf}$ is the RBF kernel; $K_{poly}$ is Poly; q is the polynomial order; σ is the bandwidth parameter of the RBF kernel; r is the weight coefficient for the HK, and r ∈ [0, 1]. When r = 1, the HK functions as Poly, whereas when r = 0, the HK is the RBF kernel. An analysis of Eq. (8) reveals that three variables in the equation, namely q, σ and r, must be optimized. Therefore, obtaining the solution to the HK entails optimization of three variables, which can be expressed as x = [q, σ, r]. Smart algorithms such as the genetic algorithm, PSO, and the artificial bee colony algorithm are among the most effective methods of solving such problems. In this study, PSO, the most easily implemented smart algorithm, was used to optimize the three parameters of HK-SVR. The core of the PSO algorithm is the update of the velocity and position of each particle (Jiao and Liu, 2009):

$$v_i^{k+1} = \lambda v_i^k + c_1 u_1 (p_i^k - x_i^k) + c_2 u_2 (g^k - x_i^k)$$

$$x_i^{k+1} = x_i^k + v_i^{k+1}$$

where $v_i^k$ and $x_i^k$ are the velocity and position of particle i at iteration k; $p_i^k$ is the best position found so far by particle i and $g^k$ is the best position found so far by the swarm; λ is the inertia weight, the value of which generally decreases from 0.9 to 0.4; $c_1$ and $c_2$ are the learning factors, the values of which are typically 2; and $u_1$ and $u_2$ are two random numbers in [0, 1]. The PSO algorithm was employed to optimize x = [q, σ, r], the parameter combination in HK-SVR. The parameter combination with the optimal fitness was used in the HK-SVR model. The fitness was obtained by

 

where, yi is the actual value; yi* is the predicted value, and m is the total number of trained samples. The optimized parameters were then substituted into Eq. (8) to derive the final equation of the HK-SVR model.
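A minimal sketch of this search is given below, under several stated assumptions. The hybrid kernel follows the weighting described in the text (r = 1 gives Poly, r = 0 gives the RBF kernel), with a Gaussian form using 2σ² assumed. The PSO uses the standard velocity and position update with a linearly decreasing inertia weight, c1 = c2 = 2, and an added velocity clamp that is not mentioned in the text. Because training an SVR is outside the scope of a short example, the fitness here is a toy quadratic with a known minimum; in the study it would be the prediction error of HK-SVR for the candidate [q, σ, r]. The search bounds and all names are hypothetical.

```python
import math
import random


def hybrid_kernel(x, z, q, sigma, r):
    """r * Poly + (1 - r) * RBF; r = 1 gives Poly, r = 0 gives RBF."""
    dot = sum(a * b for a, b in zip(x, z))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    poly = (dot + 1.0) ** q
    rbf = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return r * poly + (1.0 - r) * rbf


def pso(fitness, bounds, n_particles=25, n_iter=200, c1=2.0, c2=2.0, seed=0):
    """Minimize fitness over a box; returns (best position, best value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    span = [hi - lo for lo, hi in bounds]
    pos = [[lo + rng.random() * (hi - lo) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for k in range(n_iter):
        lam = 0.9 - 0.5 * k / max(n_iter - 1, 1)  # inertia weight: 0.9 -> 0.4
        for i in range(n_particles):
            for d in range(dim):
                u1, u2 = rng.random(), rng.random()
                v = (lam * vel[i][d]
                     + c1 * u1 * (pbest[i][d] - pos[i][d])
                     + c2 * u2 * (gbest[d] - pos[i][d]))
                vmax = 0.2 * span[d]  # velocity clamp (added safeguard)
                vel[i][d] = max(-vmax, min(vmax, v))
                lo, hi = bounds[d]
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical search box for [q, sigma, r] and a toy fitness with a known minimum.
BOUNDS = [(1.0, 5.0), (0.1, 10.0), (0.0, 1.0)]
TARGET = [2.0, 1.5, 0.3]


def toy_fitness(x):
    """Scaled squared distance to TARGET; stands in for the HK-SVR error."""
    return sum(((xi - ti) / (hi - lo)) ** 2
               for xi, ti, (lo, hi) in zip(x, TARGET, BOUNDS))


best, best_val = pso(toy_fitness, BOUNDS)
```

Since the polynomial order q is an integer in practice, a real search would round the first coordinate before building the kernel; the continuous treatment here only keeps the example short.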


2.4.3 Construction of nMRMR method-based PSO-HK-SVR model

Because PM2.5 concentration is affected by numerous factors and strong correlations exist among these factors, this study first used the approximate Markov blanket-based nMRMR algorithm to select features. The optimal feature subset was then employed as the SVR model input. Because PM2.5 concentration variation is difficult to describe using a single kernel function, the HK was employed to modify the conventional SVR model; furthermore, the PSO algorithm was adopted to optimize the parameters of the HK. Finally, the nMRMR-PSO-HK-SVR model was constructed in MATLAB. Fig. 2 illustrates the process.

Fig. 2. Flow diagram of nMRMR-PSO-HK-SVR model.

(1) Collect the air quality and weather data of Wuhan for 2016–2019 and produce the time series of next-day PM2.5 concentration to be used as the model output.

(2) Use the approximate Markov blanket-based nMRMR algorithm to select the optimal feature subset.

(3) Create a training set based on the optimal feature subset selected in Step (2). Normalize the training set using Eq. (10) to eliminate interference caused by unit differences among the features.

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (10)$$

where x and x' are the values before and after normalization, respectively, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the original data series, respectively.

(4) Use the HK to improve the conventional SVR model.

(5) Train the SVR using the normalized training set; use the PSO algorithm to obtain the corresponding optimized parameters and a prediction model.

(6) Input the testing sample to the prediction model to obtain the PM2.5 concentration of the next day.

 
2.5 Prediction Model and Evaluation Indices

To verify the validity of the proposed model, four error evaluation indices were selected: mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE) and Theil's inequality coefficient (TIC) (Chen and Li, 2019), which are calculated as follows:

$$MAE = \frac{1}{m} \sum_{i=1}^{m} |y_i - y_i^*|$$

$$MAPE = \frac{1}{m} \sum_{i=1}^{m} \left| \frac{y_i - y_i^*}{y_i} \right| \times 100\%$$

$$RMSE = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - y_i^*)^2}$$

$$TIC = \frac{\sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - y_i^*)^2}}{\sqrt{\frac{1}{m} \sum_{i=1}^{m} y_i^2} + \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i^*)^2}}$$

where $y_i$ and $y_i^*$ represent the measured and predicted values of PM2.5 concentration, and m is the number of samples in the test set. MAE, RMSE and MAPE quantify the error of the prediction results; smaller values indicate higher prediction accuracy. TIC is used to compare the prediction ability of different models; the smaller the TIC, the better the prediction ability of the model.
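For reference, the four indices can be computed with the short Python functions below. They follow the usual textbook definitions, which are assumed here to match those used in the study; MAPE is returned in percent, and the TIC form with the sum of two root mean squares in the denominator is the common Theil U1 variant. The sample values are hypothetical.

```python
import math


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mape(y, y_hat):
    """Mean absolute percentage error, in percent (y must be nonzero)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def tic(y, y_hat):
    """Theil's inequality coefficient; 0 means a perfect forecast."""
    m = len(y)
    denom = (math.sqrt(sum(a ** 2 for a in y) / m)
             + math.sqrt(sum(b ** 2 for b in y_hat) / m))
    return rmse(y, y_hat) / denom


# Hypothetical measured and predicted PM2.5 values.
measured = [100.0, 200.0]
predicted = [110.0, 190.0]
scores = {
    "MAE": mae(measured, predicted),
    "MAPE": mape(measured, predicted),
    "RMSE": rmse(measured, predicted),
    "TIC": tic(measured, predicted),
}
```

TIC is bounded between 0 and 1 for this variant, which makes it convenient for comparing models on series with different magnitudes.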

To assess the validity of the nMRMR-PSO-HK-SVR method proposed in this paper, we also use a back propagation (BP) neural network, support vector regression (SVR), nMRMR-SVR and PSO-HK-SVR to predict the concentration of PM2.5. The results of the five prediction methods are then compared and analyzed. In the BP neural network prediction model, two hidden layers are set up; the first hidden layer has 30 neurons and the second has 20 neurons; the learning rate is 0.001; the maximum number of training iterations is 1,000; and the training error goal is 0.0001. In the SVR and nMRMR-SVR models, the radial basis function is chosen as the kernel function, and the kernel parameters are selected by ten-fold cross-validation. In the PSO-HK-SVR and nMRMR-PSO-HK-SVR models, the radial basis and polynomial kernel functions are combined to form the hybrid kernel function, the parameters of which are determined by the PSO algorithm. The parameters of the PSO algorithm are set as follows: the population size is 25, the number of iterations is 200, and the learning factors are c1 = c2 = 2.

 
3 RESULTS AND ANALYSIS


 
3.1 Analysis of Wuhan’s and Tianjin’s Pollution

Fig. 3 shows that the PM2.5 concentration follows a U-shaped pattern over each year with obvious seasonal characteristics: it is low in summer and rises sharply in winter. Table 1 shows that pollution in Wuhan and Tianjin has eased after the control measures of previous years, and the annual average value is about 45.71 µg m⁻³. However, the PM2.5 concentration still often reaches 150 µg m⁻³ in winter.

Fig. 3. PM2.5 concentration of Wuhan City and Tianjin City.

Table 1. Statistical results of air quality data and meteorological data in Wuhan.  


3.2 nMRMR-method Based Feature Selections

In this study, the 2016–2019 data for the 11 features in Wuhan were used, together with the time series of next-day PM2.5 concentration, which is expressed as PM2.5p. Table 2 presents the pairwise mutual information values of the 12 features. A greater value indicates stronger correlation. Several features were indeed strongly correlated; therefore, feature selection could not be conducted as if the features were mutually independent. As the output of the prediction model, PM2.5p was most strongly correlated with air pressure, temperature, and the concentrations of PM10, CO, and O3. When the between-feature redundancy was also considered, the optimal subset selected using the nMRMR algorithm was air pressure, temperature, PM10 concentration, O3 concentration and precipitation. These five features were used as the input of the prediction model. Despite being strongly correlated with PM2.5p, CO was not included in the optimal input subset because it was also strongly correlated with PM10 and O3. This shows that the nMRMR algorithm takes data redundancy into account when selecting the optimal input subset.

Table 2. Mutual information values between the 12 feature data (Air quality data and meteorological data).

 
3.3 Short-term Prediction of PM2.5 in Wuhan City

To analyze the validity of the nMRMR-PSO-HK-SVR model proposed in this paper, the 1065 daily records from 2016/01/01 to 2018/11/30 are used as the training sample set, and the 31 daily records from 2018/12/01 to 2018/12/31 are used as the test sample set for the short-term prediction experiment. The prediction results of the proposed method are compared with those of the BP, SVR, nMRMR-SVR and PSO-HK-SVR methods. The prediction performance of all models is evaluated using four error indices: MAE, MAPE, RMSE and TIC.

The prediction results of the five models are shown in Fig. 4, and the four error indices (MAE, MAPE, RMSE, TIC) for all models are shown in Table 3, where the minimum value of each index is marked in bold. Table 3 shows that all four error indices of the nMRMR-PSO-HK-SVR model are the smallest among the five methods. This indicates that the hybrid nMRMR-PSO-HK-SVR model proposed in this paper has the best prediction performance. To compare the prediction errors of the different models more visually, Fig. 5 presents histograms of MAE, MAPE, RMSE, and TIC for the different methods.

Fig. 4. Comparison of short-term forecast results of five methods with measured values (Wuhan).

Table 3. Comparison of short-term prediction accuracy of five methods (Wuhan).

Fig. 5. Comparison chart of error evaluation indexes in short-term prediction of the five methods (Wuhan).

To further analyze the effect of the nMRMR feature selection method and the PSO-HK technique on prediction accuracy, three types of comparison tests are conducted in this paper. Comparison Test I analyzes the effect of the nMRMR method on the prediction accuracy of PM2.5 by comparing the BP, SVR and nMRMR-SVR models. Comparison Test II analyzes the effect of the particle swarm optimization (PSO) algorithm and the hybrid kernel function on SVR prediction by comparing the SVR and PSO-HK-SVR models. Comparison Test III analyzes the effect of the combined nMRMR-PSO-HK approach on the prediction accuracy of SVR by comparing the PM2.5 prediction results of the nMRMR-SVR, PSO-HK-SVR and nMRMR-PSO-HK-SVR models. The results of the three types of comparison tests are shown in Table 4, from which the following conclusions can be drawn.

Table 4. Comparison results of Comparison Test I, II and III in short-term prediction (Wuhan).

 
3.3.1 Comparison results between SVR and BP, nMRMR-SVR and SVR

Table 3 shows that both the SVR and BP models can predict the PM2.5 concentration to a certain extent, but the SVR model is somewhat more accurate than the BP model. The final decision function of SVR is determined by only a few support vectors, and the computational complexity depends on the number of support vectors rather than the dimension of the sample space, which in a sense avoids the "curse of dimensionality". Because only a few support vectors determine the final result, the method captures the key samples and discards a large number of redundant samples, which makes the algorithm more robust. The experimental results also show that the SVR method outperforms the BP network method in terms of stability and generalization.

A comparison of the nMRMR-SVR model with the SVR model shows that, after feature selection, the nMRMR-SVR model is superior to the traditional SVR model in terms of the MAE, MAPE, RMSE, and TIC evaluation indices. Table 4 shows that, compared with the SVR model, the nMRMR-SVR model reduced the values of MAE, RMSE, MAPE, and TIC by 14%, 11%, 15%, and 38%, respectively.
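Percentage reductions of this kind are relative to the error of the baseline model. A minimal sketch of the arithmetic is shown below; the function name is hypothetical and the two error values are invented for illustration, not taken from Table 3.

```python
def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical example: an MAE of 20.0 for a baseline and 17.2 after feature selection.
reduction = relative_reduction(20.0, 17.2)
```

A positive result means the second model has the lower error; a negative result would mean the error increased.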

 
3.3.2 Comparison results of PSO-HK-SVR and SVR

From Table 4, it can be seen that the optimization of SVR by using the hybrid kernel function (HK) and particle swarm algorithm (PSO) significantly improves the prediction accuracy of SVR for PM2.5. Compared with SVR model, the values of MAE, RMSE, MAPE, and TIC of PSO-HK-SVR model are reduced by 15%, 6%, 32%, and 32%, 10%.

Therefore, combining kernel functions and optimizing their parameters with PSO improves the predictive ability of SVR. Furthermore, SVR cannot reduce the prediction error as effectively when it relies on a single radial basis function. The radial basis kernel function, as a local kernel function, does not have strong generalization ability and cannot accurately track sudden changes in PM2.5 concentration, which limits the prediction accuracy of the model to a certain extent and makes it difficult for the traditional SVR model to warn of sudden air pollution events. The use of a hybrid kernel function therefore contributes significantly to the improvement in prediction accuracy.

 
3.3.3 Comparison results of nMRMR-PSO-HK-SVR and nMRMR-SVR, PSO-HK-SVR

Table 3 shows that the nMRMR-PSO-HK-SVR model has a higher prediction accuracy than the nMRMR-SVR and PSO-HK-SVR models, and that its predictions are closer to the measured values. Its four error performance indicators are also the smallest. Table 4 shows that, compared with the nMRMR-SVR model, the MAE, MAPE, RMSE and TIC of the nMRMR-PSO-HK-SVR model are reduced by 47%, 34%, 64% and 39%, respectively; compared with the PSO-HK-SVR model, they are reduced by 20%, 24%, 51% and 24%, respectively.

In the nMRMR-PSO-HK-SVR model, the input vector is still the optimal subset selected by the nMRMR algorithm. To overcome the shortcomings of the nMRMR-SVR model and enhance the generalization ability of the model, the hybrid kernel function (HK) is constructed by Eq. (8) and used as the kernel of the model. Compared with the radial kernel function (Chen and Pai, 2015), the hybrid kernel function in this paper is better suited to describing the complex changes of PM2.5.
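As a rough illustration of what a hybrid kernel can look like, the sketch below forms a weighted sum of a local (radial basis) kernel and a global (polynomial) kernel. This convex-combination form, the choice of a polynomial kernel as the global component, and all parameter names and default values are assumptions made for illustration; Eq. (8) should be consulted for the form actually used in the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Local kernel: exp(-gamma * ||x - y||^2) for every pair of rows."""
    sq_dist = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def poly_kernel(X, Y, degree, coef0):
    """Global kernel: (x . y + coef0)^degree for every pair of rows."""
    return (X @ Y.T + coef0) ** degree

def hybrid_kernel(X, Y, weight=0.7, gamma=0.5, degree=2, coef0=1.0):
    """Weighted sum of a local and a global kernel (assumed form)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    local_part = rbf_kernel(X, Y, gamma)
    global_part = poly_kernel(X, Y, degree, coef0)
    return weight * local_part + (1.0 - weight) * global_part

# Hypothetical feature matrix: 6 samples with 3 selected features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K = hybrid_kernel(X, X)
print(K.shape)
```

With a weight between 0 and 1, the sum of two valid kernels is again a valid kernel, so a function of this shape could be passed as a callable kernel to an SVR implementation that accepts one, such as scikit-learn's `SVR`.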

The weight coefficients of the hybrid kernel function are determined by the PSO search algorithm (Zhou et al., 2017). As Fig. 4 shows, the model predictions are very close to the measured values, and the accuracy remains high even at positions with large fluctuations, which further indicates that the optimized hybrid kernel function enhances the generalization ability of the model.
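The parameter search can be sketched with a basic global-best PSO. This is a generic textbook variant, not necessarily the one used by the authors, and the objective below is a stand-in: in an actual experiment it would be a validation error of the hybrid-kernel SVR as a function of its parameters. The function names, the inertia and acceleration settings, and the parameter bounds are all hypothetical.

```python
import numpy as np

def pso_minimize(objective, lower, upper, n_particles=20, n_iter=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO over the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]

    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Pull each particle toward its own best and the swarm's best position.
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)  # keep particles inside the box
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

# Stand-in objective (hypothetical): replaces the validation error of an SVR
# parameterised by (kernel weight, RBF width).
def toy_validation_error(params):
    weight, gamma = params
    return (weight - 0.3) ** 2 + (gamma - 1.2) ** 2

best, best_val = pso_minimize(toy_validation_error,
                              lower=[0.0, 0.01], upper=[1.0, 5.0])
print(best, best_val)
```

Clipping positions to the box keeps the kernel weight inside [0, 1], which is one simple way to make sure the searched parameters stay valid.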

 
3.4 Short-term Prediction of PM2.5 in Tianjin City

To analyze the validity and applicability of the proposed nMRMR-PSO-HK-SVR method more systematically and comprehensively, we use the Tianjin PM2.5 data for a short-term prediction experiment. A total of 1065 daily samples from 2016/01/01 to 2018/11/30 are selected as the training set, and a total of 31 daily samples from 2018/12/01 to 2018/12/31 are selected as the test set. The PM2.5 concentration predictions of the different models are shown in Fig. 6, and the MAE, MAPE, RMSE and TIC prediction performance indexes calculated for each model are shown in Fig. 7, Fig. 8, Table 5 and Table 6.

Fig. 6. Comparison of short-term forecast results of five methods with measured values (Tianjin).

Fig. 7. Comparison chart of error evaluation indexes in short-term prediction of the five methods (Tianjin).

Fig. 8. The PM2.5 long-term prediction results of five models (Wuhan): (a) SVR, (b) PSO-SVR, (c) nMRMR-SVR, (d) PSO-HK-SVR, (e) nMRMR-PSO-HK-SVR.

Table 5. Comparison of short-term prediction accuracy of five methods (Tianjin).

Table 6. Comparison results of Comparison Test I, II and III in short-term prediction (Tianjin).

The experimental results show that the models behave on the Tianjin PM2.5 data much as they do on the Wuhan data. Compared with the BP, SVR, nMRMR-SVR and PSO-HK-SVR models, the hybrid nMRMR-PSO-HK-SVR model proposed in this paper has the highest prediction accuracy. Compared with the BP prediction model, the MAE, MAPE, RMSE, and TIC of the nMRMR-PSO-HK-SVR model are reduced by 53.582%, 46.811%, 78.445%, and 80.054%, respectively. Compared with the SVR model, they are reduced by 60%, 42%, 70%, and 72%, respectively.

Compared with the nMRMR-SVR model, the MAE, MAPE, RMSE, and TIC of the nMRMR-PSO-HK-SVR model are reduced by 49%, 37%, 65%, and 47%, respectively. Compared with the PSO-HK-SVR model, they are reduced by 34%, 30%, 53%, and 39%, respectively. The Tianjin results further confirm that the proposed model is well suited to predicting PM2.5 concentration and adapts well to a different city. They also demonstrate that the hybrid model (nMRMR-PSO-HK-SVR) predicts better than the model that uses only input feature selection (nMRMR-SVR) and the model that uses only kernel function optimization (PSO-HK-SVR). The results also show that both input feature selection (nMRMR) and the PSO-optimized hybrid kernel function (PSO-HK) improve the prediction ability of SVR.

 
3.5 Long-term Prediction of PM2.5 in Wuhan City

To evaluate the performance of the model more comprehensively, a total of 1096 daily samples from 2016/01/01 to 2018/12/31 in Wuhan are selected as the training set, and a total of 181 daily samples from 2019/01/01 to 2019/06/30 are selected as the test set for the long-term PM2.5 prediction experiment.
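The sample counts quoted for the two experiments are consistent with one record per calendar day and no gaps, which is an assumption here rather than something the text states. A short check with the standard library, using a hypothetical helper `daily_range`:

```python
from datetime import date, timedelta

def daily_range(start, end):
    """Inclusive list of consecutive calendar days from start to end."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]

# Wuhan long-term experiment.
wuhan_train = daily_range(date(2016, 1, 1), date(2018, 12, 31))
wuhan_test = daily_range(date(2019, 1, 1), date(2019, 6, 30))

# Tianjin short-term experiment.
tianjin_train = daily_range(date(2016, 1, 1), date(2018, 11, 30))
tianjin_test = daily_range(date(2018, 12, 1), date(2018, 12, 31))

print(len(wuhan_train), len(wuhan_test))      # 1096 181
print(len(tianjin_train), len(tianjin_test))  # 1065 31
```

Splitting by date rather than at random keeps every test day strictly after the training period, which matches a forecasting setting.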

Fig. 8 compares the predicted and measured values for the long-term predictions of the different models. The predicted values of the nMRMR-PSO-HK-SVR model follow almost the same trend as the measured values, and the model also tracks the locations of the PM2.5 concentration peaks well. The long-term prediction results for Wuhan show that the nMRMR-PSO-HK-SVR model proposed in this paper can meet long-term prediction needs and can be used to predict PM2.5 concentration both in weather with good air quality and in heavily polluted weather. Table 7 gives the values of the error performance indicators for the five models in the long-term prediction. As Table 7 shows, the four error performance metrics of the nMRMR-PSO-HK-SVR model are again the smallest. Its MAE, MAPE, RMSE, and TIC are lower by 6, 11%, 10, and 0.13 than those of the SVR model; by 6, 11%, 10, and 0.13 than those of the nMRMR-SVR model; by 4, 8%, 6, and 0.05 than those of the PSO-SVR model; and by 1, 5%, 4, and 0.04 than those of the PSO-HK-SVR model.

Table 7. Comparison of long-term prediction accuracy of five methods (Wuhan).

The scatter plots of the predicted and measured values of the five methods on the test set are shown in Fig. 9. In Fig. 9, the horizontal axis shows the measured values, the vertical axis shows the predicted values, the trend line is the regression line between the predicted and measured values, and R² is the coefficient of determination; the closer R² is to 1, the stronger the linear relationship between the predicted and measured values. The scatter plots show that the R² values of the SVR, nMRMR-SVR, PSO-SVR, PSO-HK-SVR and nMRMR-PSO-HK-SVR models are 0.74, 0.88, 0.89, 0.92, and 0.97, respectively. The nMRMR-PSO-HK-SVR model has the largest R², which is close to 1, indicating that its predictions are also close to the measured values in the long-term prediction of PM2.5.
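The trend line and R² of such a scatter plot can be computed as in the following sketch, which fits an ordinary least-squares line of predicted against measured values. The function name and the sample values are hypothetical and are not taken from Fig. 9.

```python
import numpy as np

def scatter_fit(measured, predicted):
    """Least-squares line predicted = slope * measured + intercept, and its R^2."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = slope * x + intercept
    ss_res = np.sum((y - fitted) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)    # total sum of squares
    return slope, intercept, 1.0 - ss_res / ss_tot

# Illustrative values only.
measured = [35.0, 50.0, 75.0, 110.0, 150.0]
predicted = [38.0, 47.0, 80.0, 104.0, 155.0]

slope, intercept, r2 = scatter_fit(measured, predicted)
print(round(slope, 3), round(intercept, 3), round(r2, 3))
```

For a straight-line fit with an intercept, this R² equals the squared Pearson correlation between measured and predicted values, so it measures how linear the relationship is rather than how close the points lie to the 1:1 line.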

Fig. 9. Long-term predicted scatter plot of PM2.5 in the Wuhan region. (a) BP model, (b) SVR model, (c) nMRMR-SVR model, (d) PSO-HK-SVR model, (e) nMRMR-PSO-HK-SVR model.

Based on the above experimental results, combining nMRMR with PSO-HK-SVR achieves accurate prediction of PM2.5 concentrations over longer periods and under different air quality conditions, indicating that the nMRMR-PSO-HK-SVR model is feasible and reliable in application.

 
4 CONCLUSIONS


Fine particulate matter (PM2.5) is an important air pollution indicator, and its prediction is of great significance for environmental protection. Considering the efficiency, practicability and accuracy of the prediction, this manuscript first extracts the optimal feature subset from the air quality and meteorological data using the approximate Markov blanket-based normal maximum relevance minimum redundancy (nMRMR) algorithm. The optimal feature subset is then used as the input of a hybrid kernel support vector regression model that predicts the PM2.5 concentration in the next 24 h (the next day). The optimal parameters of the hybrid kernel function are determined adaptively by the particle swarm optimization algorithm. The main conclusions are as follows.

(1) Air quality elements and weather elements are both strongly correlated with PM2.5 concentration. However, data redundancy between features exists because some features are strongly intercorrelated. This hinders the precision of the SVR model in predicting PM2.5 concentration.

(2) The approximate Markov blanket-based nMRMR algorithm considers the redundancy among features as well as the relevance of each feature to the PM2.5 concentration. The optimal feature subset selected by the algorithm retains most of the information in the data even though its dimensionality is reduced.

(3) When using the SVR model for PM2.5 concentration prediction, the prediction precision is strongly affected by the kernel function employed. Because of the high complexity of PM2.5 concentration variation, a single kernel function has difficulty describing all the varying features. By comparison, a PSO-based HK is more appropriate for describing complex variation.

(4) The prediction experiments on PM2.5 concentrations in Wuhan in 2018 and 2019 show that the nMRMR-PSO-HK-SVR model has higher prediction accuracy. Compared with the traditional SVR model, the MAE, MAPE, RMSE and TIC of the short-term PM2.5 prediction decreased by 5, 11%, 10, and 0.11, respectively, and those of the long-term prediction decreased by 6, 11%, 10, and 0.13, respectively.
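The relevance and redundancy trade-off mentioned in conclusion (2) can be illustrated with a plain greedy max-relevance min-redundancy selection. This sketch is not the authors' nMRMR algorithm: it omits the approximate Markov blanket step and uses absolute Pearson correlation as a simple stand-in for an information-theoretic measure. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def greedy_mrmr(X, y, n_select):
    """Greedy max-relevance min-redundancy feature selection.

    Relevance and redundancy are both measured with absolute Pearson
    correlation, which is a simplification made for this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    n_select = min(n_select, n_features)
    corr = np.abs(np.corrcoef(np.column_stack([X, y]), rowvar=False))
    relevance = corr[:n_features, -1]            # feature vs. target
    redundancy = corr[:n_features, :n_features]  # feature vs. feature

    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        remaining = [j for j in range(n_features) if j not in selected]
        # Score = relevance minus mean redundancy with the chosen features.
        scores = [relevance[j] - redundancy[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected

# Synthetic example: feature 1 is a near-copy of feature 0, feature 2 is new.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
X = np.column_stack([a, a + 0.01 * rng.normal(size=500), b])
y = a + 0.8 * b

print(greedy_mrmr(X, y, n_select=2))
```

In this synthetic case the near-duplicate feature is skipped in favor of the independent one, which is the kind of redundancy removal that conclusion (1) motivates.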

When air quality and weather data are known, the proposed model can effectively predict the PM2.5 concentration for the next day (24 h). After the structure of the prediction data is adjusted, the model can also be used to predict the PM2.5 concentration for the next 2 days (48 h) or 3 days (72 h). However, the accuracy of such longer-horizon forecasts is affected by the accumulation of forecasting errors. How to make more accurate long-term predictions of PM2.5 concentration is what we will continue to study in the future.


REFERENCES


  1. Cai, Z., Ma, H., Zhang, L. (2019). Feature selection for airborne LiDAR data filtering: A mutual information method with Parzon window optimization. GISci. Remote Sens. 57, 323–337. https://doi.org/10.1080/15481603.2019.1695406

  2. Chen, J.F., Li, Y. (2019). Forecasting of PM2.5 concentration based on multimodal support vector regression. Environ. Eng. 37, 122–126. (in Chinese)

  3. Chen, L., Pai, T.Y. (2015). Comparisons of GM (1, 1), and BPNN for predicting hourly particulate matter in Dali area of Taichung City, Taiwan. Atmos. Pollut. Res. 6, 572–580. https://doi.org/10.5094/apr.2015.064

  4. Cura, T. (2020). Use of support vector machines with a parallel local search algorithm for data classification and feature selection. Expert Syst. Appl. 145, 113133. https://doi.org/10.1016/j.eswa.2019.113133

  5. Dhamecha, T.I., Noore, A., Singh, R., Vatsa, M. (2019). Between-subclass piece-wise linear solutions in large scale kernel SVM learning. Pattern Recognit. 95, 173–190. https://doi.org/10.1016/j.patcog.2019.04.012

  6. Fei, S. (2016). A hybrid model of EMD and multiple-kernel RVR algorithm for wind speed prediction. Int. J. Electr. Power Energy Syst. 78, 910–915. https://doi.org/10.1016/j.ijepes.2015.11.116

  7. Gu, F., Hu, M., Wang, Y., Li, M., Guo, Q., Wu, Z. (2016). Characteristics of PM2.5 pollution in winter and spring of Beijing during 2009-2010. China Environ. Sci. 36, 2578–2584. (in Chinese with English Abstract)

  8. Hua, Z., Zhou, J., Hua, Y., Zhang, W. (2020). Strong approximate Markov blanket and its application on filter-based feature selection. Appl. Soft Comput. 87, 105957. https://doi.org/10.1016/j.asoc.2019.105957

  9. Huanrui, H. (2016). New mixed kernel functions of SVM used in pattern recognition. Cybern. Inf. Technol. 16, 5–14. https://doi.org/10.1515/cait-20160047

  10. Jayakumar, C., Sangeetha, J. (2020). Kernellized support vector regressive machine based variational mode decomposition for time frequency analysis of Mirnov coil. Microprocess. Microsyst. 75, 103036. https://doi.org/10.1016/j.micpro.2020.103036

  11. Jiao, W., Liu, G.B. (2009). An Improved Particle Swarm Optimization Algorithm with Immunity. 2009 Second International Conference on Intelligent Computation Technology and Automation, pp. 241–244. https://doi.org/10.1109/ICICTA.2009.66

  12. Ju, Z., He, J.J. (2018). Prediction of lysine glutarylation sites by maximum relevance minimum redundancy feature selection. Anal. Biochem. 550, 1–7. https://doi.org/10.1016/j.ab.2018.04.005

  13. Kim, M.H., Kim, Y.S., Lim, J., Kim, J.T., Sung, S.W., Yoo, C. (2010). Data-driven prediction model of indoor air quality in an underground space. Korean J. Chem. Eng. 27, 1675–1680. https://doi.org/10.1007/s11814-010-0313-5

  14. Liu, P., Choo, K.K.R., Wang, L., Huang, F. (2016). SVM or deep learning? A comparative study on remote sensing image classification. Soft Comput. 21, 7053–7065. https://doi.org/10.1007/s00500-016-2247-2

  15. Ni, X.Y., Huang, H., Du, W.P. (2017). Relevance analysis and short-term prediction of PM2.5 concentrations in Beijing based on multi-source data. Atmos. Environ. 150, 146–161. https://doi.org/10.1016/j.atmosenv.2016.11.054

  16. Niu, M., Wang, Y., Sun, S., Li, Y. (2016). A novel hybrid decomposition-and-ensemble model based on CEEMD and GWO for short-term PM2.5 concentration forecasting. Atmos. Environ. 134, 168–180. https://doi.org/10.1016/j.atmosenv.2016.03.056

  17. Patricio, P., Camilo, M., Camilo, R. (2020). PM2.5 forecasting in Coyhaique, the most polluted city in the Americas. Urban Clim. 32, 100608. https://doi.org/10.1016/j.uclim.2020.100608

  18. Perez, P., Gramsch, E. (2016). Forecasting hourly PM2.5 in Santiago de Chile with emphasis on night episodes. Atmos. Environ. 124, 22–27. https://doi.org/10.1016/j.atmosenv.2015.11.016

  19. Qiao, J., Cai, J., Han, H. (2017). Predicting PM2.5 concentrations at a regional background station using second order self-organizing fuzzy neural network. Atmosphere 8, 10. https://doi.org/10.3390/atmos8010010

  20. Qin, X.W., Liu, Y.Y., Wang, X.M., Dong, X.G., Zhang, Y., Zhou, H.M. (2016). PM2.5 prediction of Beijing city based on ensemble empirical mode decomposition and support vector regression. J. Jilin Univ. (Earth Science Edition) 46, 563–568.

  21. Singh, K.P., Gupta, S., Kumar, A., Shukla, S.P. (2012). Linear and nonlinear modeling approaches for urban air quality prediction. Sci. Total Environ. 426, 244–255. https://doi.org/10.1016/j.scitotenv.2012.03.076

  22. Song, G., Guo, X., Yang, X., Liu, S. (2018). ARIMA-SVM combination prediction of PM2.5 concentration in Shenyang. China Environ. Sci. 38, 4031–4039. (in Chinese with English Abstract)

  23. Sun, W., Sun, J. (2016). Daily PM2.5 concentration prediction based on principal component analysis and LSSVM optimized by cuckoo search algorithm. J. Environ. Manage. 188, 144–152. https://doi.org/10.1016/j.jenvman.2016.12.011

  24. Vapnik, V. (1998). Statistical Learning Theory. Wiley, New York.

  25. Wang, H., Ling, Z., Yu, K., Wu, X. (2020). Towards efficient and effective discovery of Markov blankets for feature selection. Inf. Sci. 509, 227–242. https://doi.org/10.1016/j.ins.2019.09.010

  26. Wang, L.M., Wu, X.H., Zhao, T.L., Cheng, G.S., Zhang, X.Z., Tang, L.L., Jia, M.W., Chen, Y.S. (2017a). A scheme for rolling statistical forecasting of PM2.5 concentrations based on distance correlation coefficient and support vector regression. Acta Sci. Circumst. 37, 1268–1276. (in Chinese with English Abstract)

  27. Wang, P., Zhang, H., Qin, Z., Zhang, G. (2017b). A novel hybrid-Garch model based on ARIMA and SVM for PM2.5 concentrations forecasting. Atmos. Pollut. Res. 8, 850–860. https://doi.org/10.1016/j.apr.2017.01.003

  28. Wang, P., Zhang, H., Qin, Z.Z., Yao, Q.C., Geng, H. (2017c). PM10 concentration forecasting model based on Wavelet-SVM. Environ. Sci. 38, 3153–3161. https://doi.org/10.13227/j.hjkx.201612194

  29. Wang, Z., Li, Y., Chen, T., Zhang, D., Sun, F., Pan, L. (2015). Spatial-temporal characteristics of PM2.5 in Beijing in 2013. Acta Geogr. Sin. 70, 110–120. (in Chinese)

  30. Yan, H., Zhang, J., Rahman, S.S., Zhou, N., Suo, Y. (2020). Predicting permeability changes with injecting CO2 in coal seams during CO2 geological sequestration: A comparative study among six SVM-based hybrid models. Sci. Total Environ. 705, 135941. https://doi.org/10.1016/j.scitotenv.2019.135941

  31. Yin, J.G., Peng, F., Xie, L.K., Xu, Y., Liu, H., Gong, Q.Q., Wang, K. (2018). The study on the prediction of the PM2.5 concentration based on model of the least squares support vector regression under wavelet decomposition and adaptive multiple layer residuals correction. Acta Sci. Circumst. 38, 2090–2098. (in Chinese with English Abstract)

  32. Yu, K., Liu, L., Li, J. (2020). Learning markov blankets from multiple interventional data sets. IEEE Trans. Neural Networks Learn. Syst. 31, 2005–2019. https://doi.org/10.1109/TNNLS.2019.2927636

  33. Zhang, K. (2019). Forecasting regional economic growth using support vector machine model. Ecol. Econ. 15, 186–192.

  34. Zhang, L., Wang, Z. (2018). Multi-label feature selection algorithm based on joint mutual information of max-relevance and min-redundancy. J. Comm. 39, 111–122. (in Chinese with English Abstract) https://doi.org/10.11959/j.issn.1000-436x.2018082

  35. Zhong, Z., Carr, T.R. (2016). Application of mixed kernels function (MKF) based support vector regression model (SVR) for CO2-Reservoir oil minimum miscibility pressure prediction. Fuel 184, 590–603. https://doi.org/10.1016/j.fuel.2016.07.030

  36. Zhou, G.Q., Gao, W., Gu, Y.X., Qu, Y. (2017). Impact of precipitation on Shanghai PM2.5 forecast using WRF-Chem. Acta Sci. Circumst. 37, 4476–4482. (in Chinese with English Abstract)

