PM2.5 Concentration Prediction Based on Markov Blanket Feature Selection and Hybrid Kernel Support Vector Regression Optimized by Particle Swarm Optimization

This study employed air quality and meteorological data as research materials and extracted the optimal feature subset using the approximate Markov blanket-based normal maximum relevance minimum redundancy (nMRMR) algorithm to serve as the input data of the prediction model. In addition, a hybrid kernel (HK) was created to improve the traditional support vector regression (SVR) model, and particle swarm optimization (PSO) was used to calculate the optimal parameters of the hybrid kernel SVR, which were then used to establish the nMRMR-PSO-HK-SVR model for PM2.5 concentration prediction. Air quality and weather data for Wuhan and Tianjin from 2016 to 2019 were employed to test the proposed method. The experimental results show that the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE) and Theil's inequality coefficient (TIC) of the nMRMR-PSO-HK-SVR model are lower than those of the SVR, PSO-SVR, nMRMR-SVR and PSO-HK-SVR models. Moreover, the proposed model more precisely tracks moments of sudden change in PM2.5 concentration. Thus, the nMRMR-PSO-HK-SVR model has more satisfactory generalizability and can predict PM2.5 concentration more precisely.


INTRODUCTION
Rapid development of economies worldwide has caused increasingly severe air pollution. A major pollutant, PM2.5 remains in the air for a long time and can be transported over great distances because of its small size. This results in lowered visibility and severe deterioration of air quality and the atmospheric environment because of the copious toxic substances PM2.5 carries, thus posing a health risk (Gu et al., 2006). Estimation of PM2.5 concentration therefore has critical value for early warnings of severe pollution events. To date, PM2.5 concentration has generally been estimated using validation or statistical models (Patricio et al., 2020). Validation models are primarily constructed using historical weather information and chemical initial and boundary conditions to infer the complex process of pollutant formation (Wang et al., 2017a). Therefore, the estimation precision of these models depends on the accuracy of complex historical records, which are usually difficult to obtain. With the development of regression learning, artificial neural network and support vector regression (SVR) models have been successfully applied to estimate PM2.5 concentration. Perez and Gramsch (2016) verified that when historical pollutant concentration data and weather data are available, a feedforward neural network model can effectively predict the hourly concentration of PM2.5. Sun and Sun (2017) combined principal component analysis (PCA) with least-squares SVR to predict the daily PM2.5 concentration; their experimental results revealed high prediction precision. Statistical models describe the relationship between PM2.5 concentration and the factors influencing it; therefore, such models have high prediction precision (Yin et al., 2018).
Although artificial neural network models can be employed in PM2.5 concentration estimation, problems tend to occur, such as convergence to local optima and overfitting (Niu et al., 2016). SVR models are based on statistical learning theory (Zhang, 2019); structural risk minimization is adopted as a principle, so the problem of overfitting does not arise. Hence, such models exhibit favorable generalizability (Yan et al., 2020). As a major type of air pollutant, PM2.5 has complex origins and forms through a complicated process under the influence of numerous factors (Ni et al., 2017; Song et al., 2018). It exhibits high complexity and nonlinearity (Wang et al., 2017b). Most studies on PM2.5 concentration estimation have been based on PM2.5 time series (Qin et al., 2016; Wang et al., 2020), which strongly affect the prediction precision. In the present study, six air quality indices (PM2.5, PM10, SO2, NO2, CO, and O3 concentrations) and five weather factors (temperature, relative humidity, precipitation, wind speed, and air pressure) were employed to predict the PM2.5 concentration on the next day. However, redundant data had to be eliminated, and this required dimension reduction of the aforementioned 11 factors. Some scholars (Wang et al., 2017c; Jayakumar and Sangeetha, 2020) have used correlation coefficients or PCA for feature selection. Qiao et al. (2017) combined principal component analysis (PCA) with a fuzzy neural network to predict the concentration of PM2.5 and obtained good prediction results. Singh and Gupta (2012) used a stepwise linear regression method to select the original features when predicting urban air quality, and used linear and nonlinear prediction models for experimental comparison. Kim et al. (2010) used the partial least squares (PLS) method to select the variables with the greatest influence on the output to predict PM2.5 and PM10 in a subway station; comparison with the predictions obtained by taking all measured variables as inputs demonstrated the necessity of selecting characteristic variables.
The common methods above capture only the linear relationships between PM2.5 characteristic variables and neglect the nonlinear ones. Moreover, correlation coefficients are appropriate only when all features are independent, PCA can only handle linear problems, and the 11 features considered in this study exhibit strong correlations and nonlinearity (Wang et al., 2015). Therefore, to address the shortcomings of the commonly used PM2.5 feature selection methods, this paper proposes a feature selection method based on the approximate Markov blanket-based normal maximum relevance minimum redundancy (nMRMR) algorithm (Zhang and Wang, 2018; Cai et al., 2019). The maximum-relevance minimum-redundancy mutual information based on the approximate Markov blanket is used to rank the features and discard those with low relevance, and the optimal feature subset is selected. Support vector regression is one of the most robust and accurate data mining methods (Vapnik, 1998); support vector machines comprise SVM classification and support vector regression. Sun et al. (2016) presented a hybrid model based on PCA and least-squares support vector regression (LSSVR) optimized by the cuckoo search algorithm to predict PM2.5 concentrations. The prediction precision of an SVR model depends on the type of kernel function (Cura, 2020). The radial basis function (RBF) kernel has a high degree of local fitting, whereas the polynomial kernel (Poly) has strong generalizability (Dhamecha et al., 2019). Poly and the RBF kernel can be linearly combined to form a hybrid kernel (HK), which retains the advantages of the two original functions, enhances the generalizability of the SVR model, and has been applied satisfactorily in numerous fields (Fei, 2016; Zhong and Carr, 2016).
In the hybrid kernel SVR algorithm, the choice of the SVR parameters (penalty factor and kernel function parameters) and the kernel combination parameters has an important impact on prediction accuracy, but at present there is no analytical method for obtaining optimal values of the kernel parameters, penalty factor and combination parameters. In traditional SVR, parameters are selected by repeated experiments, which introduces considerable human randomness. Cross validation overcomes this randomness to some extent but requires a great deal of time. Particle swarm optimization (PSO), an optimization method developed in recent years, has been widely used in function optimization, pattern recognition and other fields because it is easy to implement and has a sound theoretical background. To achieve the optimal selection of parameters in the hybrid kernel SVR model, this paper combines PSO with SVR and uses the global search ability of PSO to find the parameters of HK-SVR. In summary, the present study employed historical air quality and weather data. The nMRMR algorithm was used to first select features, and the optimal feature subset was chosen and input to the SVR model. A PSO-optimized HK was constructed to improve the conventional SVR model. Eventually, an nMRMR-PSO-HK-SVR model was established for estimating PM2.5 concentration.

Fig. 1 briefly shows the geographical locations of Wuhan and Tianjin. Wuhan is located in the eastern part of the Jianghan Plain in the middle and lower reaches of the Yangtze River (30°N, 114°E). Wuhan is the largest and most populous city in Hubei Province, with a total area of 8494.41 km2 and a population of 11.08 million.
Wuhan's terrain is mainly hilly, which is not conducive to the dispersion of PM2.5. Although Wuhan is currently making efforts to improve its environment, the air quality does not meet the national secondary standard, especially in winter, when the concentration of PM2.5 rises sharply. In recent years, PM2.5 pollution has become a thorny issue in Wuhan, and studying pollution control in Wuhan, a mega-city in a period of economic growth, is of great significance.

Research Site
Tianjin is the largest coastal city in northern China, located on the west coast of Bohai Bay (39°N, 117°E); it has become China's new growth pole and a center of advanced industrial and financial activities. Tianjin has a warm temperate sub-humid monsoon climate, with a significant monsoon influence and four distinct seasons. In the past decades, with rapid urbanization, Tianjin has become a mega-city with a population of over 10 million. At the same time, owing to industrialization and the increase in motor vehicles, hazy weather is a common occurrence. To protect public health and the atmospheric environment, there is an urgent need to simulate and predict the concentration of PM2.5 in Tianjin.

Research Data
In this paper, data collected from January 1, 2016 to June 30, 2019 were used, including air quality data and meteorological data. The air quality data were obtained from https://www.aqistudy.cn/historydata/ and include the daily average concentrations of PM2.5, PM10, sulfur dioxide (SO2), nitrogen dioxide (NO2) and carbon monoxide (CO), as well as the daily maximum 8-hour average ozone (O3). The "8 hours of ozone" value is a sliding average, calculated as the average concentration of the 8 consecutive hours with the highest ozone level between 8:00 and 24:00. The meteorological data were collected from http://www.wunderground.com/history/, including wind speed, precipitation, atmospheric pressure, relative humidity and temperature. A total of 1065 records from January 1, 2016 to November 30, 2018 in Wuhan and Tianjin were selected as the training sample set, and 31 records from December 1, 2018 to December 31, 2018 in the two regions were selected as the test sample set for short-term prediction. A total of 181 records from January 1, 2019 to June 30, 2019 in the Wuhan region were selected as the test sample set for long-term forecasting.

Fig. 1. Geographical location of Wuhan City and Tianjin City.
For the small number of missing records (1 in 2016, 3 in 2017, and 3 in 2018), we filled in each missing value with the mean of the two adjacent days. Before training, we first normalized the data using

y = (x − xmin) / (xmax − xmin)

where y denotes the normalized data; x denotes the pre-normalized data; xmin denotes the data minimum; and xmax denotes the data maximum. In Section 1.4, four evaluation metrics are given to evaluate the prediction accuracy of the model for PM2.5, and all algorithmic models and experiments in this paper were implemented in Matlab 2018b.
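As an illustrative sketch (the paper's implementation is in MATLAB), the adjacent-day imputation and min-max normalization described above can be written as follows; function names and the sample values are illustrative only:

```python
def fill_missing(series):
    """Replace an isolated missing value (None) with the mean of the
    two adjacent days, as done for the 7 missing records in 2016-2018."""
    out = list(series)
    for i, v in enumerate(out):
        if v is None:
            out[i] = (out[i - 1] + out[i + 1]) / 2
    return out

def min_max_normalize(x):
    """Min-max normalization y = (x - xmin) / (xmax - xmin), mapping to [0, 1]."""
    xmin, xmax = min(x), max(x)
    return [(v - xmin) / (xmax - xmin) for v in x]

# Hypothetical daily PM2.5 values (ug m-3) with one missing day
pm25 = fill_missing([35.0, None, 60.0, 150.0, 45.0])
normalized = min_max_normalize(pm25)
```

Each feature column would be normalized independently before being fed to the model, and the same xmin and xmax would be reused to rescale predictions back to concentrations.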

Description of the Problem
The goal of PM2.5 concentration prediction is to predict the PM2.5 concentration for a fixed period of time in the future (for example, the next 24, 48 or 72 hours) using the observed values for a fixed period of time in the past (for example, 24 hours). The aim of this manuscript is to make a short-term prediction of PM2.5 concentrations for the next 24 h (next day) using past observations. For a given moment t, assume that the observed data over the past L days (L × 24 h) form the sequence ST = {st−L+1, st−L+2, …, st}, where each st is a d-dimensional vector consisting of pollutant concentrations and meteorological observations. After the prediction model is constructed, the PM2.5 concentration on the next {t + 1, t + 2, …, t + K} days can be predicted with the data set ST as input.
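The sliding-window construction above can be sketched as follows; the function name and the convention that PM2.5 is the first component of each daily observation vector are assumptions for illustration:

```python
def build_samples(observations, L=1, K=1):
    """Build (input, target) pairs from a daily observation sequence.
    Each input stacks the past L days of d-dimensional observations;
    the target is the PM2.5 value K days ahead (PM2.5 assumed to be
    the first component of each daily vector)."""
    samples = []
    for t in range(L - 1, len(observations) - K):
        window = observations[t - L + 1 : t + 1]   # past L days
        x = [v for day in window for v in day]     # flatten to one vector
        y = observations[t + K][0]                 # PM2.5, K days ahead
        samples.append((x, y))
    return samples
```

With L = 1 and K = 1 this reproduces the paper's setting: today's 11 observed features predict tomorrow's PM2.5 concentration.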
In this manuscript, we use the approximate Markov blanket-based nMRMR algorithm to extract the optimal features from the observed data ST, and use hybrid kernel support vector regression optimized by particle swarm optimization as the prediction model to make a short-term prediction of PM2.5 concentration for the next 24 hours (the next day).

Approximate Markov blanket-based nMRMR algorithm
Because the factors influencing PM2.5 concentration are strongly correlated, data redundancy and, in particular, nonlinear relationships must be considered. Therefore, the approximate Markov blanket-based nMRMR algorithm was employed in this study for feature selection. The core of this algorithm is to maximize the correlation between each feature and the target feature while minimizing the correlations among the features themselves. Correlation is expressed here through mutual information (Ju and He, 2018). The mutual information between variables x and y is defined as

I(x, y) = ∬ p(x, y) log [p(x, y) / (p(x)p(y))] dx dy

where p(x) and p(y) are the probability density functions of x and y, respectively, and p(x, y) is the joint probability density function of x and y. The measurement indicators of maximum relevance and minimum redundancy are defined, respectively, as

D = (1/n) Σxi∈S I(xi, p)

R = (1/n²) Σxi,xj∈S I(xi, xj)

where S is the feature subset; n is the number of features; I(xi, p) is the mutual information between each of the 11 features (i.e., six air quality features and five weather features) and the PM2.5 data for the next day; p is the target feature; and I(xi, xj) is the mutual information between features.
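For discrete (e.g. binned) data, the mutual information and the relevance/redundancy indicators D and R can be estimated empirically from counts; this is a minimal sketch under that discretization assumption, not the authors' implementation:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(x; y) for discrete (binned) series."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) / (p(x) p(y)) = c * n / (count_x * count_y)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def relevance(features, target):
    """Maximum-relevance term D = (1/n) * sum_i I(x_i; p)."""
    return sum(mutual_information(f, target) for f in features) / len(features)

def redundancy(features):
    """Minimum-redundancy term R = (1/n^2) * sum_{i,j} I(x_i; x_j)."""
    n = len(features)
    return sum(mutual_information(fi, fj)
               for fi in features for fj in features) / n ** 2
```

A perfectly dependent pair gives I = log 2 (in nats) for binary data, while an independent pair gives I = 0, matching the density-based definition above.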
The criteria for feature selection are excellent classification performance and the smallest possible number of dimensions. These entail maximum relevance between the feature set and the target as well as minimum redundancy among the features. After comprehensive consideration of the two measurement indicators above, the following criterion for maximum relevance and minimum redundancy is obtained:

max Φ(D, R), Φ = D − R

Markov blanket (Yu et al., 2019): Let F be the feature set and fi a feature within F. For a feature subset S with S ⊂ F and fi ∉ S, the Markov blanket condition of fi is fi ⊥ {F − S − {fi}, C} | S, where ⊥ denotes independence and | S denotes conditioning on S. Given S, fi is independent of F − S − {fi} and C, suggesting that when S is present, fi makes no contribution to the label C and should therefore be deleted. The smallest S that satisfies these conditions is the Markov blanket of feature fi.
According to the definition of the Markov blanket, criteria for determining an approximate Markov blanket can be obtained (Hua et al., 2020); that is, for features fi and fj (i ≠ j), feature fj is an approximate Markov blanket of feature fi if

I(fj, p) ≥ I(fi, p) and I(fi, fj) ≥ I(fi, p)

The specific steps of the approximate Markov blanket-based nMRMR algorithm used in this study are as follows:

Step 1: Initialize F, the set of factors influencing PM2.5 concentration, and establish an empty set S.
Step 2: Calculate the mutual information between each feature in F and the PM2.5 concentration of the next day. Arrange the features in F in descending order by the size of their mutual information.
Step 3: Deposit the first feature of F into S, and delete it from F.
Step 4: Arrange the features in accordance with the principle of maximum relevance and minimum redundancy. Deposit the arranged features into S.
Step 5: Delete irrelevant and redundant features in S according to the criteria of the approximate Markov blanket.
Step 6: Export the optimal set of factors affecting PM2.5 concentration, which is S.
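Under the assumption that all pairwise mutual information values have been precomputed (e.g. with the estimator sketched earlier), Steps 1–6 can be sketched as follows; the function and variable names are illustrative, not the authors' code:

```python
def nmrmr_select(features, mi_with_target, mi_between):
    """Approximate-Markov-blanket nMRMR selection (Steps 1-6).
    features       : list of feature names
    mi_with_target : dict  name -> I(feature; next-day PM2.5)
    mi_between     : dict (name_i, name_j) -> I(feature_i; feature_j)
    """
    # Steps 1-2: sort candidates by relevance to the target, descending
    F = sorted(features, key=lambda f: mi_with_target[f], reverse=True)
    # Step 3: the most relevant feature seeds S
    S = [F.pop(0)]
    # Step 4: greedily rank remaining features by relevance minus redundancy
    while F:
        def score(f):
            red = sum(mi_between[(f, s)] for s in S) / len(S)
            return mi_with_target[f] - red
        best = max(F, key=score)
        F.remove(best)
        S.append(best)
    # Step 5: drop fi if some retained fj is its approximate Markov blanket,
    # i.e. I(fj; p) >= I(fi; p) and I(fi; fj) >= I(fi; p)
    pruned = []
    for fi in S:
        covered = any(mi_with_target[fj] >= mi_with_target[fi]
                      and mi_between[(fi, fj)] >= mi_with_target[fi]
                      for fj in pruned)
        if not covered:
            pruned.append(fi)
    return pruned  # Step 6: optimal feature subset
```

In the toy test below, feature 'b' is nearly a copy of the top feature 'a' (high mutual information with it), so the Markov blanket pruning removes it, exactly the behavior described for CO in the Results section.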

PSO hybrid Kernel SVR (PSO-HK-SVR)
Based on statistical principles, particularly that of structural risk minimization, SVR can satisfactorily handle high-dimensional problems and avoid overfitting. The core idea is to use a kernel function to map the input data into a high-dimensional feature space, thereby transforming the nonlinear problem into a linear one. Through the kernel function, explicit dot product operations in the high-dimensional space can be avoided, and the objective function can be formulated as

f(x) = w·φ(x) + b

where w is the weight vector and b is the offset constant. Substituting the Lagrange multipliers into the kernel function yields the optimal hyperplane fitting function

f(x) = Σi (ai − ai*) K(x, xi) + b

where ai and ai* are Lagrange multipliers and K(x, xi) is the kernel function. When performing PM2.5 prediction with SVR, the choice of kernel function has a decisive influence on the prediction result. Of the numerous kernel functions created for SVR applications, the polynomial (Poly) kernel and the Gaussian radial basis function (RBF) kernel are the most commonly used. The Poly kernel is a global kernel function that exerts a significant influence on points that are far apart; it has extremely strong generalizability but weak learning capacity. By contrast, the Gaussian RBF kernel is a local kernel function that only influences points that are relatively close; it has strong learning capacity but weak generalizability (Huanrui, 2016; Liu et al., 2016).
To create a kernel that influences nearby points as well as far-apart points, the Poly and Gaussian RBF kernels can be weighted and combined to formulate a new linear HK (Huanrui, 2016):

K(x, xi) = r Kpoly(x, xi) + (1 − r) Krbf(x, xi)    (8)

where Krbf is the RBF kernel with bandwidth parameter σ; Kpoly is the Poly kernel with polynomial order q; and r ∈ [0, 1] is the weight coefficient of the HK. When r = 1, the HK reduces to the Poly kernel, whereas when r = 0, the HK reduces to the RBF kernel. An analysis of Eq. (8) reveals that three variables, namely q, σ and r, must be optimized. Therefore, solving the HK entails the optimization of three variables, which can be expressed as x = [q, σ, r]. Intelligent algorithms such as the genetic algorithm, PSO, and the artificial bee colony algorithm are among the most effective methods for solving such problems. In this study, PSO, the most easily implemented of these, was used to optimize the three parameters of HK-SVR. The core of the PSO algorithm is the update of particle velocities and positions (Jiao and Liu, 2009):

vid(k+1) = λ vid(k) + c1 u1 (pid − xid(k)) + c2 u2 (pgd − xid(k))

xid(k+1) = xid(k) + vid(k+1)

where λ is the inertia weight, the value of which generally decreases from 0.9 to 0.4; c1 and c2 are the learning factors, typically set to 2; and u1 and u2 are random numbers in [0, 1]. The PSO algorithm was employed to optimize the parameter combination x = [q, σ, r] in HK-SVR. The parameter combination with the optimal fitness was used in the HK-SVR model. The fitness was obtained by

fitness = (1/m) Σi (yi − yi*)²

where yi is the actual value, yi* is the predicted value, and m is the total number of training samples. The optimized parameters were then substituted into Eq. (8) to derive the final equation of the HK-SVR model.
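A minimal sketch of the hybrid kernel of Eq. (8) and the PSO update loop follows. The polynomial form (x·z + 1)^q, the parameter bounds, and the function names are assumptions; a real fitness function would wrap the HK-SVR training error rather than the toy objective used in the usage note:

```python
import math
import random

def hybrid_kernel(x, z, q, sigma, r):
    """K = r * K_poly + (1 - r) * K_rbf, with r in [0, 1] (cf. Eq. (8))."""
    dot = sum(a * b for a, b in zip(x, z))
    k_poly = (dot + 1) ** q                       # polynomial kernel (assumed form)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    k_rbf = math.exp(-sq_dist / (2 * sigma ** 2))  # Gaussian RBF kernel
    return r * k_poly + (1 - r) * k_rbf

def pso(fitness, bounds, n_particles=10, iters=60, c1=2.0, c2=2.0):
    """Minimal PSO minimizer; inertia weight decays linearly from 0.9 to 0.4."""
    dim = len(bounds)
    pos = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for it in range(iters):
        w = 0.9 - 0.5 * it / max(iters - 1, 1)   # inertia 0.9 -> 0.4
        for i in range(n_particles):
            for d in range(dim):
                u1, u2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * u1 * (pbest[i][d] - pos[i][d])
                             + c2 * u2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f
```

In the paper's setting, `fitness` would train HK-SVR with candidate [q, σ, r] and return the mean squared training error; here a simple sphere function can stand in to show the optimizer converging.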

Construction of nMRMR method-based PSO-HK-SVR model
Because PM2.5 concentration is affected by numerous factors and strong correlations exist among these factors, this study first used the approximate Markov blanket-based nMRMR algorithm to select features. The optimal feature subset was then employed as the SVR model input. Because PM2.5 concentration variation is difficult to describe using a single kernel function, the HK was employed to modify the conventional SVR model; furthermore, the PSO algorithm was adopted to optimize the parameters of the HK. Finally, the nMRMR-PSO-HK-SVR model was constructed in MATLAB. Fig. 2 illustrates the process.
(1) Collect the 2016–2019 air quality statistics and weather information of Wuhan, and produce the time series of next-day PM2.5 concentration to be used as the model output.
(2) Use the approximate Markov blanket-based nMRMR algorithm to select the optimal feature subset.
(3) Create a training set based on the optimal feature subset selected in Step (2). Normalize the training set using Eq. (10) to eliminate interference caused by unit differences among the features:

x' = (x − xmin) / (xmax − xmin)    (10)

where x and x' are the values before and after normalization, respectively, and xmin and xmax are the minimum and maximum values of the original data series, respectively.
(4) Use the HK to improve the conventional SVR model.
(5) Train the SVR using the normalized training set; use the PSO algorithm to obtain the corresponding optimal parameters and a prediction model.
(6) Input the testing sample into the prediction model to obtain the PM2.5 concentration of the next day.

Prediction Model and Evaluation Indices
To verify the validity of the proposed model, the following error evaluation indices were selected: mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE) and Theil's inequality coefficient (TIC) (Chen and Li, 2019), calculated as follows:

MAE = (1/m) Σi |yi − yi*|

MAPE = (1/m) Σi |(yi − yi*) / yi| × 100%

RMSE = sqrt[(1/m) Σi (yi − yi*)²]

TIC = sqrt[(1/m) Σi (yi − yi*)²] / {sqrt[(1/m) Σi yi²] + sqrt[(1/m) Σi (yi*)²]}

where yi and yi* represent the measured and predicted values of PM2.5 concentration, and m is the number of samples in the test set. Among these indices, MAE, RMSE and MAPE quantify the error of the prediction results: the smaller their values, the higher the prediction accuracy. TIC evaluates the prediction ability of different models: the smaller the TIC, the better the model's prediction ability.
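The four indices can be computed directly from their definitions; a minimal sketch:

```python
import math

def evaluate(y_true, y_pred):
    """Return MAE, MAPE (%), RMSE, and Theil's inequality coefficient (TIC)."""
    m = len(y_true)
    mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / m
    mape = 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / m
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / m)
    # TIC normalizes RMSE by the root mean squares of both series, so 0 <= TIC <= 1
    tic = rmse / (math.sqrt(sum(a * a for a in y_true) / m)
                  + math.sqrt(sum(b * b for b in y_pred) / m))
    return mae, mape, rmse, tic
```

Note that MAPE requires all measured values yi to be nonzero, which holds for PM2.5 concentrations.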
To compare the validity of the nMRMR-PSO-HK-SVR method proposed in this paper, we also use back propagation (BP) neural networks, support vector regression (SVR), normal maximum relevance minimum redundancy SVR (nMRMR-SVR) and particle swarm optimization hybrid kernel SVR (PSO-HK-SVR) to predict the concentration of PM2.5; the results of the five prediction methods are then compared and analyzed. In the BP neural network prediction model, two hidden layers are set up; the first hidden layer has 30 neuron nodes and the second has 20; the iteration step size is 0.001; the number of training epochs is 1,000; and the training target is 0.0001. In the SVR and nMRMR-SVR models, the radial basis function is chosen as the kernel function, and the kernel parameters are selected by ten-fold cross-validation. In the PSO-HK-SVR and nMRMR-PSO-HK-SVR models, the radial basis and polynomial kernel functions are combined to form hybrid kernel functions, the parameters of which are determined by the particle swarm optimization algorithm. The parameters of the particle swarm algorithm are set as follows: the population size is 25, the number of iterations is 200, and the learning factors are c1 = c2 = 2.

Fig. 3 shows that the concentration of PM2.5 is U-shaped over the year and has obvious seasonal characteristics, with low concentrations in summer and a sharp increase in winter. It can be seen from Table 1 that the pollution situation in Wuhan and Tianjin has been relieved after the treatment of previous years, with an annual average value of about 45.71 µg m−3, but the concentration of PM2.5 still often reaches 150 µg m−3 in winter.

nMRMR-method Based Feature Selections
In this study, the 2016–2019 data for the 11 features in Wuhan were used to produce the time series of next-day PM2.5 concentration, expressed as PM2.5p. Table 2 presents the pairwise mutual information values of the 12 features; a greater value indicates a stronger correlation. A few features were indeed strongly correlated; therefore, feature selection could not be conducted as if the features were mutually independent. When employed as the output of the prediction model, PM2.5p was most strongly correlated with air pressure, temperature, and the concentrations of PM10, CO, and O3. When the between-feature redundancy was also considered, the optimal subset selected using the nMRMR algorithm comprised air pressure, temperature, PM10 concentration, O3 concentration and precipitation. These five features were used as the input of the prediction model. Despite being strongly correlated with PM2.5p, CO was not included in the optimal input subset because CO was extremely strongly correlated with PM10 and O3. This shows that the nMRMR algorithm indeed takes redundancy in the data into consideration when selecting the optimal input subset.

Note: PM2.5p represents the concentration of PM2.5 on the next day; the other features represent the observed values on the same day. WD, RF, P, RH and T represent wind speed, rainfall, pressure, relative humidity and temperature, respectively.

Short-term Prediction of PM2.5 in Wuhan City
To analyze the validity of the nMRMR-PSO-HK-SVR model proposed in this paper, a total of 1065 records from 2016/01/01 to 2018/11/30 were selected as the training sample set, and 31 records from 2018/12/01 to 2018/12/31 were selected as the test sample set for the short-term PM2.5 prediction experiment. Meanwhile, the prediction results of the proposed method were compared with those of the BP, SVR, nMRMR-SVR, and PSO-HK-SVR methods. The prediction performance of all models was evaluated using four error indices: MAE, MAPE, RMSE and TIC.
The prediction results of the five models are shown in Fig. 4, and the four error performance indicators (MAE, MAPE, RMSE, TIC) for all models are shown in Table 3, with the minimum value of each indicator marked in bold. From Table 3, it can be seen that all four error indicators of the nMRMR-PSO-HK-SVR model are the smallest among the five methods, indicating that the hybrid nMRMR-PSO-HK-SVR model proposed in this paper has the best prediction performance. To compare the prediction errors of the different models more visually, Fig. 5 presents histograms of MAE, MAPE, RMSE, and TIC for the different methods.
To further analyze the effect of the nMRMR feature selection method and the PSO-HK technique on prediction accuracy, the following three types of comparison tests are conducted in this paper. Comparison Test I analyzes the effect of the nMRMR method on PM2.5 prediction accuracy by comparing the BP, SVR and nMRMR-SVR models. Comparison Test II analyzes the effect of the particle swarm optimization (PSO) algorithm and the hybrid kernel function on SVR prediction by comparing the prediction results of the SVR and PSO-HK-SVR models. Comparison Test III analyzes the effect of the hybrid nMRMR-PSO-HK approach on SVR prediction accuracy by contrasting the PM2.5 prediction results of the nMRMR-SVR, PSO-HK-SVR, and nMRMR-PSO-HK-SVR models. The results of the three types of comparison experiments are shown in Table 4, from which the following conclusions can be drawn.

Comparison results between SVR and BP, nMRMR-SVR and SVR
From Table 3, it can be seen that both the SVR and BP models are able to predict PM2.5 concentration to a certain extent, but the prediction accuracy of the SVR model is somewhat improved over that of the BP model. The final decision function of SVR is determined by only a few support vectors, and the computational complexity depends on the number of support vectors rather than the dimension of the sample space, which in a sense avoids the "curse of dimensionality". Since only a few support vectors are needed to determine the final result, the method captures the key samples and "eliminates" a large number of redundant samples, which makes the algorithm more robust. The experimental results also show that the SVR method outperforms the BP network method in terms of stability and generalization. Comparing the nMRMR-SVR model with the SVR model shows that the nMRMR-SVR model with feature selection is superior to the traditional SVR model in terms of the MAE, MAPE, RMSE, and TIC evaluation indices. From Table 4, it can be seen that, compared with the SVR model, the nMRMR-SVR model reduced the values of MAE, RMSE, MAPE, and TIC by 14%, 11%, 15%, and 38%, respectively.

Comparison results of PSO-HK-SVR and SVR
From Table 4, it can be seen that optimizing SVR with the hybrid kernel function (HK) and the particle swarm algorithm (PSO) significantly improves the prediction accuracy of SVR for PM2.5. Compared with the SVR model, the MAE, RMSE, MAPE, and TIC of the PSO-HK-SVR model are reduced by 15%, 6%, 32%, and 10%, respectively.
Therefore, optimizing the kernel parameters with the hybrid kernel function and PSO can improve the predictive ability of SVR. Furthermore, it can be seen that SVR relying on a single radial basis function alone cannot reduce the prediction error effectively. This is because the radial basis kernel, as a local kernel function, does not have strong generalization ability and cannot accurately track sudden changes in PM2.5 concentration, which limits the prediction accuracy of the model and makes it difficult for the traditional SVR model to warn of sudden air pollution events. Therefore, the use of the hybrid kernel function significantly improves the prediction accuracy.

Comparison results of nMRMR-PSO-HK-SVR with nMRMR-SVR and PSO-HK-SVR
From Table 3, it can be seen that the nMRMR-PSO-HK-SVR model has higher prediction accuracy than the nMRMR-SVR and PSO-HK-SVR models, and its prediction results are closer to the measured values; all four of its error performance indicators are the smallest. From Table 4, it can be seen that, compared with the nMRMR-SVR model, the MAE, MAPE, RMSE and TIC of the nMRMR-PSO-HK-SVR model are reduced by 47%, 34%, 64% and 39%; compared with the PSO-HK-SVR model, they are reduced by 20%, 24%, 51% and 24%.
In the nMRMR-PSO-HK-SVR model, the input vector is still the optimal subset selected by the nMRMR algorithm. However, to overcome the shortcomings of the nMRMR-SVR model and enhance the generalization ability of the model, the hybrid kernel function (HK) constructed by Eq. (8) is used as the kernel function. Compared with the radial basis kernel function (Chen and Pai, 2015), the hybrid kernel function in this paper is better suited to describing the complex changes of PM2.5.
The weight coefficients of the hybrid kernel function are determined by the PSO search algorithm (Zhou et al., 2017). The prediction results in Fig. 4 show that the model predictions are very close to the measured values, and the prediction accuracy remains high at positions with large fluctuations, further indicating that the optimized hybrid kernel function can enhance the generalization ability of the model.

Short-term Prediction of PM2.5 in Tianjin City
To further analyze the validity and applicability of the proposed nMRMR-PSO-HK-SVR method systematically and comprehensively, we use the Tianjin PM2.5 data for a short-term prediction experiment. A total of 1065 records from 2016/01/01 to 2018/11/30 are selected as the training sample set, and 31 records from 2018/12/01 to 2018/12/31 are selected as the test sample set for the short-term prediction of PM2.5 in Tianjin. The prediction results of PM2.5 concentration for the different models are shown in Fig. 6, and the MAE, MAPE, RMSE and TIC of each prediction model are shown in Fig. 7, Fig. 8, Table 5 and Table 6.
The experimental results show that, for the prediction of PM2.5 in Tianjin, the different models obtain results similar to those for Wuhan. Compared with the nMRMR-SVR model, the MAE, MAPE, RMSE, and TIC of the nMRMR-PSO-HK-SVR model are reduced by 49%, 37%, 65%, and 47%, respectively; compared with the PSO-HK-SVR model, they are reduced by 34%, 30%, 53%, and 39%, respectively. The Tianjin experiments further confirm that the proposed model is well suited to PM2.5 concentration prediction and has strong adaptive capability. They also demonstrate that the hybrid model (nMRMR-PSO-HK-SVR) has better prediction performance than the model with input feature selection alone (nMRMR-SVR) and the model with kernel function optimization alone (PSO-HK-SVR), and that input feature selection (nMRMR) and the PSO-optimized hybrid kernel (PSO-HK) can indeed improve the prediction ability of SVR.

Long-term Prediction of PM2.5 in Wuhan City
To evaluate the performance of the model more comprehensively, a total of 1096 data points from 2016/01/01 to 2018/12/31 in Wuhan were selected as the training sample set, and 181 data points from 2019/01/01 to 2019/06/30 were selected as the test sample set for the long-term prediction experiment of PM2.5. Fig. 8 compares the predicted and measured values for the long-term predictions of the different models. The trends of the predicted and measured values of the nMRMR-PSO-HK-SVR model are almost identical, and the model tracks the locations of the PM2.5 concentration peaks particularly well. The long-term prediction results for Wuhan show that the proposed nMRMR-PSO-HK-SVR model can meet long-term prediction needs and can be used to predict PM2.5 concentration both in weather with good air quality and in heavily polluted weather. Table 7 gives the error performance indicators of the five models for long-term prediction; the four error metrics of the nMRMR-PSO-HK-SVR model are again the smallest. Its MAE, MAPE, RMSE, and TIC decreased by 6, 11%, 10, and 0.13, respectively, compared with the SVR model; by 6, 11%, 10, and 0.13 compared with the nMRMR-SVR model; by 4, 8%, 6, and 0.05 compared with the PSO-SVR model; and by 1, 5%, 4, and 0.04 compared with the PSO-HK-SVR model.
Fig. 9 shows scatter plots of the predicted versus measured values of the five methods on the test set. The horizontal coordinate indicates the measured value, the vertical coordinate the predicted value, and the trend line the regression line between the two; R² denotes the coefficient of determination, with R² closer to 1 indicating a stronger linear relationship between predicted and measured values. The R² values for the SVR, nMRMR-SVR, PSO-SVR, PSO-HK-SVR and nMRMR-PSO-HK-SVR models are 0.74, 0.88, 0.89, 0.92, and 0.97, respectively. The nMRMR-PSO-HK-SVR model has the largest R², close to 1, indicating that its predictions also closely match the measured values in the long-term prediction of PM2.5.
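The coefficient of determination reported in these scatter plots can be computed directly from the two series; a minimal sketch, assuming equal-length arrays of measured and predicted values:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination between measured and predicted series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```

An R² of 1 means the residual sum of squares is zero, i.e. every point in the scatter plot falls exactly on the identity line.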
Based on the above experimental results, the combination of nMRMR and PSO-HK-SVR achieves accurate prediction of PM2.5 concentrations over longer periods and under different air quality conditions, indicating that the nMRMR-PSO-HK-SVR model is feasible and reliable in application.

CONCLUSIONS
Fine particulate matter (PM2.5) is an important indicator of air pollution, and its prediction is of great significance for environmental protection. Considering the efficiency, practicability and accuracy of prediction, this manuscript first extracts the optimal feature subset from the air quality and meteorological data using the approximate Markov blanket-based normal maximum relevance minimum redundancy (nMRMR) algorithm. The optimal feature subset is then used as the input of a hybrid kernel support vector regression model to predict the PM2.5 concentration over the next 24 h (the next day), with the optimal parameters of the hybrid kernel function determined adaptively by the particle swarm optimization algorithm.
(1) Air quality elements and weather elements are both strongly correlated with PM2.5 concentration. However, data redundancy between features exists because some features are strongly intercorrelated. This hinders the precision of the SVR model in predicting PM2.5 concentration.
(2) The approximate Markov blanket-based nMRMR algorithm considers the redundancy among features as well as the relevance of each feature to PM2.5 concentration. The optimal feature subset it selects retains most of the information even though its dimensionality is reduced.
(3) When the SVR model is used for PM2.5 concentration prediction, the prediction precision is strongly affected by the kernel function employed. Because PM2.5 concentration varies in a highly complex manner, a single kernel function has difficulty describing all of its varying features; by comparison, a PSO-optimized HK is more appropriate for describing such complex variation.
(4) The prediction experiments on PM2.5 concentrations in Wuhan in 2018 and 2019 show that the nMRMR-PSO-HK-SVR model has higher prediction accuracy. Compared with the traditional SVR model, the MAE, MAPE, RMSE and TIC of short-term PM2.5 prediction decreased by 5, 11%, 10, and 0.11, respectively, and those of long-term prediction decreased by 6, 11%, 10, and 0.13, respectively. Given known air quality and weather data, the proposed model can effectively predict the PM2.5 concentration for the next day (24 h). After adjusting the structure of the prediction data, it can also be used to predict the PM2.5 concentration for the next 2 days (48 h) or 3 days (72 h); however, the accuracy of long-term forecasting is affected by the accumulation of forecasting errors. How to make more accurate long-term predictions of PM2.5 concentration is a subject of our future research.
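Extending a one-day-ahead model to 48 h or 72 h by feeding each prediction back as input can be sketched as the recursive loop below. This is a simplified illustration assuming a purely autoregressive feature window; in practice the model also takes exogenous air quality and weather features, and the loop makes explicit why forecasting errors accumulate with the horizon.

```python
import numpy as np

def recursive_forecast(model, last_window, n_steps):
    """Roll a one-step-ahead regressor forward n_steps days.

    model: any fitted regressor exposing .predict, e.g. a trained SVR
    last_window: the most recent feature window (1-D array of lags)
    Each prediction is appended to the window and fed back in, so any
    error made at step t contaminates the inputs of steps t+1, t+2, ...
    """
    window = np.asarray(last_window, dtype=float).copy()
    preds = []
    for _ in range(n_steps):
        y_next = float(model.predict(window.reshape(1, -1))[0])
        preds.append(y_next)
        window = np.append(window[1:], y_next)  # slide the window forward
    return preds
```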