Forecasting PM2.5 in Malaysia Using a Hybrid Model

Predicting future PM2.5 concentrations from past observational data is very useful for anticipating air pollution. This paper develops a hybrid forecasting model using an Artificial Neural Network (ANN) and Triple Exponential Smoothing (TES) on clustered PM2.5 data from a High Pollution Region (HPR), a Medium Pollution Region (MPR), and a Low Pollution Region (LPR) in Malaysia. Historical PM2.5 concentrations in Malaysia from January 2018 to December 2019 were used to develop the hybrid model. The proposed hybrid model was then evaluated in terms of Mean Absolute Percentage Error (MAPE) by comparing its forecasts with real PM2.5 data from 2020 in the HPR, MPR and LPR. The results showed that the hybrid ANN and TES model gave the lowest Root Mean Squared Error (RMSE) (4.25–8.56 µg m⁻³), Mean Absolute Error (MAE) (2.51–4.95 µg m⁻³), MAPE (0.13–0.2%), and Mean Absolute Scaled Error (MASE) (1.45–2.01) across the pollution regions compared with other models. The comparison between the hybrid ANN and TES models and the real PM2.5 data in 2020 showed that the models gave sufficient accuracy in the HPR and MPR, with MAPE values between 20% and 50%, while the LPR showed less accuracy, with a MAPE of more than 50%. Overall, the hybrid model developed in this study offers a new prediction method for air quality forecasting and is sufficiently accurate to be used as a tool for air quality management.


INTRODUCTION
Air is a mixture of aerosols, water vapor and gases that are required for living. Biomass burning and the release of hazardous particle pollutants, including PM2.5 and PM10, are among the factors that lead to air pollution (Zanobetti et al., 2009). PM2.5 is the most hazardous component of these pollution particles because it can easily penetrate deep into the lungs and irritate and erode the alveolar walls, therefore harming lung function (Xing et al., 2016). Vehicles, industry, and burning activities all contribute to high PM10 and PM2.5 concentrations in Malaysia (Fong et al., 2018). The Environmental Quality Report (DOE, 2020) reported that annual average concentrations of PM10 from 2010 to 2020 ranged from 20 to 53 µg m⁻³, while the annual average concentrations of PM2.5 from 2018 to 2020 ranged from 12 to 20 µg m⁻³ in Malaysia. Even though the annual trend of PM2.5 is still below the IT-2 National Ambient Air Quality Standard in Malaysia of 25 µg m⁻³, PM2.5 is a critical issue that requires immediate regulation and policy development to address the problem globally. Thus, accurate prediction and forecasting of PM2.5 trends are crucial for improved understanding and management of the problems connected with air pollution.
Numerous studies have shown that artificial intelligence technology is more accurate than traditional statistical approaches (Biancofiore et al., 2017;Mingjian et al., 2011;Taheri Shahraiyni and Sodoudi, 2016). Artificial neural networks are the most extensively used method for statistical predictions of PM2.5, where the advances are based on historical data (Liao et al., 2017;Ventura et al., 2019). In most circumstances, the artificial neural network (ANN) interpolation method has considerable advantages, especially when air quality network density is limited (Alimissis et al., 2018). ANN models are effective at forecasting air quality time series, yet they have some drawbacks. Since the real world is so complicated, there are both linear and non-linear patterns in air quality time series. It is not enough to employ a non-linear model for time series data since the non-linear model may miss some linear elements of the data. Previous research (Denton, 1995;Zhang, 2003) has shown that employing ANN to model linear problems can yield varied outcomes. Integrating ANN with other forecasting models and creating a hybrid model can help solve this problem. One of the most successful forecasting approaches has been discovered to be the exponential smoothing model. Because of its simplicity and ability to include trend and seasonality in data (Roy et al., 2018), the exponential smoothing method was chosen for this study.
Previous research has revealed that hybrid models that incorporate exponential smoothing models outperform both linear and non-linear models. These types of hybrid models are widely employed in some applications. With advances in the hybridization of various models and algorithms, the overall effectiveness and performance of forecasting will increase. The combination is also much better in terms of time and cost (Pauzi and Abdullah, 2019). Kamisan et al. (2021) applied a combination of an autoregressive integrated moving average (ARIMA) model and single exponential smoothing to wind speed prediction, obtaining an RMSE of 0.55. Hartomo and Nataliani (2021) developed a rainfall prediction algorithm using an exponential smoothing model and a clustering algorithm, obtaining an MAE of 75.87. Electrical load forecasting using an exponential smoothing model and ANN by Sulandari et al. (2016) and Shukur et al. (2014) gave MAPE values of 0.9% and 4% to 8%, respectively. Lai et al. (2006) applied this hybrid model to financial time series forecasting and obtained RMSE values of 0.0035 to 0.66. Safi and Sanusi (2021) developed an ANN and exponential smoothing model for COVID-19 time series data, obtaining an MAE of 63.95. Most recently, a deep learning technique using ANN, namely Bidirectional Long Short Term Memory (bi-LSTM), was combined with exponential smoothing for crime forecasting (Butt et al., 2022), giving RMSE, MAPE and MAE values of 13.104-13.77, 0.4% and 9.837-10.896, respectively. While hybrid exponential smoothing and ANN models have been employed in some applications, they have seldom been used for air quality.
Since the combination of ANN and an exponential smoothing model can capture both linear and non-linear patterns in time series air quality data simultaneously (Pauzi and Abdullah, 2019), it is necessary to hybridize linear and non-linear models for PM2.5 forecasting. In this work, we designed a PM2.5 forecasting model that combined a neural network-based hybrid model with an exponential smoothing model for two years of time series data in Malaysia.
The objective of this study was to develop a hybrid forecasting model that combined ANN and an exponential smoothing model on clustered PM2.5 data in Malaysia. By combining linear and non-linear models, the hybrid model can improve the forecasting of PM2.5 concentrations in Malaysia across regions with different pollution levels. Cluster analysis using Agglomerative Hierarchical Clustering (AHC) and spatial classification was used to analyze the overall trend of PM2.5 from 2018 to 2019 at the 65 air quality monitoring stations in Malaysia, based on Rahman et al. (2022). The hybrid model was then compared with the other models developed, such as a combination of multiple linear regression with exponential smoothing and a single exponential smoothing model, to assess forecasting accuracy and to choose the optimum model for PM2.5 forecasting.

Model Framework
The framework of the prediction model is discussed in this section. The analysis was carried out with some of the most extensively used time series forecasting models: a neural network model and multiple linear regression. Multiple linear regression (MLR) is a well-established statistical forecasting method for predicting variables by quantitatively describing the linear connection between the target variable and multiple independent variables. The MLR model is ideal for particle concentration prediction since numerous factors can affect particle concentrations. We also discuss other forecasting methods, including the triple exponential model, and undertake a comparative analysis to determine which model is the best fit. The flowchart for the process is shown in Fig. 1 for clarity.

The multiple linear regression (MLR) model
Multiple linear regression is a common method for fitting a data set to a model in which the predicted variable yi is linearly dependent on several predictor variables x1,i, x2,i, …, xk,i. The multiple linear regression model is written as:

$$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \dots + \beta_k x_{k,i} + e_i \qquad (1)$$

where k is the number of predictor variables, β0 is the intercept, β1, β2, …, βk are the regression coefficients and ei is an error term which represents the difference between the forecasted ŷi and the measured yi (Abdullah et al., 2020). In this study, x referred to the air pollution and meteorological parameters, and y referred to PM2.5 levels. Three MLR models were developed, one for each of the clusters produced by the AHC (HPR, MPR, and LPR), and ten air pollutant and meteorological parameters were used as input variables, as shown in Supplementary 1.
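As an illustration of how such a regression can be fitted, the following minimal sketch estimates the coefficients by ordinary least squares through the normal equations. It is not the authors' implementation: the function names (`solve_linear`, `fit_mlr`, `predict_mlr`) and the two synthetic predictors are assumptions made purely for the example, and the study itself used ten input parameters.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_mlr(X, y):
    """Return [b0, b1, ..., bk] minimising the squared error of
    y = b0 + b1*x1 + ... + bk*xk + e."""
    design = [[1.0] + list(row) for row in X]  # intercept column first
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict_mlr(beta, row):
    """Forecast y-hat for one row of predictor values."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Synthetic, noise-free demonstration data (hypothetical, not from the study).
rng = random.Random(0)
X_demo = [[rng.uniform(0, 10), rng.uniform(20, 35)] for _ in range(50)]
y_demo = [10.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in X_demo]
beta_hat = fit_mlr(X_demo, y_demo)
```

Because the demonstration data contain no noise, the fitted coefficients recover the generating values up to floating-point error; with real monitoring data the residuals ei would be non-zero.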

The artificial neural network (ANN) model
ANNs have been successfully employed in numerous short- and long-term forecasting applications in recent years (Cabaneros et al., 2017; Coman et al., 2008; Lightstone et al., 2017). A back-propagation ANN was used in this work for prediction and to determine the most critical parameters influencing PM2.5 levels. The model is made up of three layers: the input layer, the hidden layer, and the output layer. The input layer took the ten input parameters, the hidden layer had 9 to 13 nodes, and the output layer (independent test set) had a single node, the PM2.5 value, as shown in Supplementary 2. The learning rate for training the neural network was 0.1. Based on the clusters formed using AHC according to PM2.5 pollution level, three ANN models were developed, for the HPR, MPR, and LPR, each with ten air pollutant and meteorological parameters as input variables.
Using data from ambient air quality monitoring, an ANN model was created using JMP Pro v. 16.0 software, with 70% of the 730 data points used for training, 15% for validation and 15% for testing for each of the HPR, MPR, and LPR. Holdback with TanH sigmoid activation was employed as the model approach (Feenstra et al., 2021; Gotwalt et al., 2009). Each node in the neural network was a linear combination of the independent variables with their respective weights plus a bias (intercept) term, defined as follows:

$$Z = W_0 + W_1 X_1 + W_2 X_2 + \dots + W_n X_n \qquad (2)$$

where Z denotes the output, which is PM2.5, Wn are the weights or beta coefficients, Xn are the independent variables or inputs, and the bias or intercept is equal to W0. The output layer nodes depend on the immediately preceding hidden layer, whose nodes are in turn derived from the input variables.
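The data partition and the layered computation described above can be sketched as follows. This is a hedged illustration rather than the JMP Pro procedure: the helper names, the random shuffling before the split, and the placement of the TanH activation on the hidden nodes with a linear output node are assumptions, and the weights shown are arbitrary values, not fitted ones.

```python
import math
import random


def split_indices(n, train_pct=70, valid_pct=15, seed=0):
    """Shuffle 0..n-1 and split into train/validation/test index lists.

    Integer percentages avoid floating-point truncation of the set sizes;
    whatever is left after training and validation becomes the test set.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * train_pct // 100
    n_valid = n * valid_pct // 100
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """One forward pass: inputs -> TanH hidden nodes -> single output Z."""
    hidden = [
        math.tanh(b + sum(w * xi for w, xi in zip(ws, x)))
        for ws, b in zip(hidden_w, hidden_b)
    ]
    return out_b + sum(v * h for v, h in zip(out_w, hidden))


# 730 daily values per cluster, as stated in the text.
train_idx, valid_idx, test_idx = split_indices(730)

# Arbitrary illustrative weights for a network with two inputs and one hidden node.
z_demo = forward([1.0, 2.0], hidden_w=[[1.0, 0.0]], hidden_b=[0.0],
                 out_w=[2.0], out_b=1.0)
```

With these settings the 730 points are divided into 511 training, 109 validation and 110 test points; how JMP Pro rounds the 15% shares is not stated in the text.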

The Holt-Winters method or Triple Exponential Smoothing (TES)
The Holt-Winters method, also known as Triple Exponential Smoothing (TES), was proposed in the early 1960s. TES can model the seasonality, trend and level components of univariate time series data. It smooths the series using three exponential smoothing formulas, which are governed by the α, β and γ hyper-parameters (Gardner Jr and Diaz-Saiz, 2008). The mean is smoothed to produce a local average value for the series (Eq. (3)). After smoothing the trend, each seasonal sub-series is smoothed separately to provide a seasonal estimate for each season (Eq. (4)). Using the TES additive or multiplicative methods, the exponential smoothing formula is applied to a series with a trend and a constant seasonal component (Eq. (5)).
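For reference, the additive Holt-Winters recurrences are commonly written in the form below, using the symbols defined in this section. This is the standard textbook form and is only assumed to correspond to Eqs. (3)–(6); here yt denotes the observed PM2.5 value at time t and ŷt+h the forecast h steps ahead (for h ≤ p).

$$\hat{a}_t = \alpha\,(y_t - \hat{s}_{t-p}) + (1-\alpha)\,(\hat{a}_{t-1} + \hat{b}_{t-1})$$

$$\hat{b}_t = \beta\,(\hat{a}_t - \hat{a}_{t-1}) + (1-\beta)\,\hat{b}_{t-1}$$

$$\hat{s}_t = \gamma\,(y_t - \hat{a}_t) + (1-\gamma)\,\hat{s}_{t-p}$$

$$\hat{y}_{t+h} = \hat{a}_t + h\,\hat{b}_t + \hat{s}_{t+h-p}$$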
where α, β and γ are the smoothing parameters, ât is the smoothed level at time t, b̂t is the change in the trend at time t, ŝt is the seasonal smooth at time t, and p is the number of seasons per year. The smoothing parameters were determined by minimizing the sum of squared one-step-ahead forecast errors. When the amplitude of seasonal variation remains constant over time, an additive model is used; when it increases over time, a multiplicative model is used (Winters, 1960). This study employed the TES additive model, with the h-step-ahead prediction calculated as the sum of the level, trend, and seasonal components (Eq. (6)). There is regularity, seasonality and a trend in the PM2.5 time series. These characteristics demonstrate that the TES model is appropriate for forecasting PM2.5 data.
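A minimal sketch of additive triple exponential smoothing is given below to make the level, trend and seasonal updates concrete. It follows the standard additive Holt-Winters recurrences; the function name, the simple initialisation from the first two seasons, and the fixed smoothing parameters are assumptions for illustration (the study instead chose the parameters by minimizing the squared one-step-ahead errors).

```python
def holt_winters_additive(y, p, alpha, beta, gamma, h):
    """Return h forecasts beyond the end of y using additive Holt-Winters.

    y: observed series; p: season length; alpha, beta, gamma: smoothing
    parameters for level, trend and seasonality; h: forecast horizon.
    """
    if len(y) < 2 * p:
        raise ValueError("need at least two full seasons of data")
    # Simple initialisation (an assumption, other schemes exist):
    # level = mean of the first season, trend = average per-step change
    # between the first two seasons, seasonal terms = first-season deviations.
    level = sum(y[:p]) / p
    trend = (sum(y[p:2 * p]) - sum(y[:p])) / (p * p)
    season = [y[i] - level for i in range(p)]
    for t in range(p, len(y)):
        s_old = season[t % p]  # seasonal estimate from one season ago
        new_level = alpha * (y[t] - s_old) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % p] = gamma * (y[t] - new_level) + (1 - gamma) * s_old
        level = new_level
    n = len(y)
    # Forecast = level + k * trend + seasonal term of the matching phase.
    return [level + k * trend + season[(n - 1 + k) % p] for k in range(1, h + 1)]


# Hypothetical series: constant level 10 with a repeating 4-step pattern.
pattern = [1.0, -1.0, 2.0, -2.0]
series = [10.0 + pattern[t % 4] for t in range(16)]
forecast = holt_winters_additive(series, p=4, alpha=0.3, beta=0.1, gamma=0.2, h=4)
```

For this purely seasonal toy series the forecasts reproduce the repeating pattern (11, 9, 12, 8), which is a quick sanity check that the seasonal indexing is aligned.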
We extended our research to create a hybrid model that combined TES and ANN to estimate PM2.5 concentrations for forecasts one year in the future. Additionally, TES with MLR prediction models were created to evaluate their effectiveness against the suggested hybrid model.

The hybrid model
Exponential smoothing is a type of linear model that can capture linear characteristics in time series, whereas ANN models can model non-linearity and capture non-linear patterns in time series. Combining an exponential smoothing model and an ANN model may produce a more robust method and more satisfactory forecasting results (Shukur et al., 2014;Sulandari et al., 2016;Safi and Sanusi, 2021). As discussed in Smyl (2020), the method uses exponential smoothing equations to successfully capture the main components of the individual series while ANN allows for non-linear trends and cross-learning. In this case, data was used in a hierarchical way, which meant that air pollutant parameters and meteorological parameters were used to extract and combine information from a series or a dataset level, improving forecasting accuracy. As shown in Fig. 2, we present a hybrid methodology for PM2.5 time series forecasting that incorporates exponential smoothing and ANN.
Combining ANN and the TES algorithm, we proposed the following step-by-step algorithm for the model:
Step 1: The inputs are entered into the ANN model as X1, X2, X3, …
Step 2: The output layer nodes depend on the immediately preceding hidden layer, and the hidden nodes are in turn derived from the input variables. These middle, hidden layers hold the features that the network creates automatically and which do not have to be explicitly derived. If, say, four hidden nodes are used, the equations are:

$$\begin{aligned}
N_1 &= W_1X_1 + W_2X_2 + W_3X_3 + W_4X_4 + \dots + W_nX_n + W_0 \\
N_2 &= W_1X_1 + W_2X_2 + W_3X_3 + W_4X_4 + \dots + W_nX_n + W_0 \\
N_3 &= W_1X_1 + W_2X_2 + W_3X_3 + W_4X_4 + \dots + W_nX_n + W_0 \\
N_4 &= W_1X_1 + W_2X_2 + W_3X_3 + W_4X_4 + \dots + W_nX_n + W_0
\end{aligned} \qquad (7)$$

Step 3: The output layer is then derived from the following equation:

$$Z\ (\text{in TanH}) = W_{Z1}N_1 + W_{Z2}N_2 + W_{Z3}N_3 + W_{Z4}N_4 + \text{Bias} \qquad (8)$$

where Nn is a hidden node, WZn or Wn is a weight or beta coefficient, Xn is an input or independent variable and bias = W0.
Step 4: The model prediction output, Z, is in turn entered into the exponential smoothing model. The outcome of the hybridization process is the additive TES h-step-ahead forecast computed from Z as the sum of the level, trend and seasonal components, where ât is the smoothed level at time t, b̂t is the change in the trend at time t, ŝt is the seasonal smooth at time t, p is the number of seasons per year, and h is the number of steps ahead being predicted.
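The four steps can be chained as in the sketch below, which passes each input row through a small TanH network and then smooths and extrapolates the resulting series Z with additive Holt-Winters. All names, the network weights, the input pattern, and the smoothing parameters are hypothetical; in particular, applying TanH at the hidden nodes and initialising the smoother from the first two seasons are assumptions, not details confirmed by the text.

```python
import math


def ann_output(x, hidden_w, hidden_b, out_w, out_b):
    """Steps 1-3: inputs X -> TanH hidden nodes N_j -> network output Z."""
    nodes = [
        math.tanh(b + sum(w * xi for w, xi in zip(ws, x)))
        for ws, b in zip(hidden_w, hidden_b)
    ]
    return out_b + sum(v * n for v, n in zip(out_w, nodes))


def tes_additive(z, p, alpha, beta, gamma, h):
    """Step 4: additive Holt-Winters forecast computed on the series z."""
    level = sum(z[:p]) / p
    trend = (sum(z[p:2 * p]) - sum(z[:p])) / (p * p)
    season = [z[i] - level for i in range(p)]
    for t in range(p, len(z)):
        s_old = season[t % p]
        new_level = alpha * (z[t] - s_old) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % p] = gamma * (z[t] - new_level) + (1 - gamma) * s_old
        level = new_level
    n = len(z)
    return [level + k * trend + season[(n - 1 + k) % p] for k in range(1, h + 1)]


def hybrid_forecast(rows, weights, p, alpha, beta, gamma, h):
    """Run every input row through the ANN, then forecast h steps with TES."""
    z = [ann_output(x, *weights) for x in rows]
    return z, tes_additive(z, p, alpha, beta, gamma, h)


# Hypothetical single-input example with a repeating 4-step input pattern.
input_pattern = [0.5, -0.5, 1.0, -1.0]
rows = [[input_pattern[t % 4]] for t in range(16)]
weights = ([[1.0]], [0.0], [1.0], 10.0)  # hidden_w, hidden_b, out_w, out_b
z_series, z_forecast = hybrid_forecast(rows, weights, p=4,
                                       alpha=0.3, beta=0.1, gamma=0.2, h=4)
```

Keeping the two stages as separate functions mirrors the description: the network supplies the non-linear mapping from the predictors to PM2.5, and the smoother supplies level, trend and seasonality for the forecast horizon.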

Performance Indicators
We used daily average PM2.5 data from monitoring stations clustered into HPR, MPR, and LPR to test the MLR and ANN models. The performance indicators used were R², RMSE, IA, E, and the percentage of deviation (Dotse et al., 2018; Nash and Sutcliffe, 1970; Taşpınar and Bozkurt, 2014; Yang et al., 2016).

Assessment of Exponential Smoothing Model and Forecasting Model
Many forecasting accuracy metrics have been created, and various authors have described the fundamental applications of these measurements as well as compared the accuracy of forecasting systems using univariate time series data (Cryer and Chan, 2008;Hyndman and Athanasopoulos, 2018). The development of the exponential smoothing model and hybridizing models were compared using four distinct forecasting accuracy measuring criteria: mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), and mean absolute scaled error (MASE).
Suppose y1, y2, …, yn denotes the dataset. The MAE is then defined as:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

The RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

The RMSE is well-known, mainly due to its theoretical usefulness in statistical modeling. However, because this measure is more sensitive to outliers than the MAE, some authors (Armstrong, 2001; Hyndman et al., 2002) have urged that other forecasting accuracy measures should be used. When there are outliers, the MAE is the better option. The MAE or RMSE should be used when comparing forecasting systems on a single data set, or when all forecasts are measured on the same scale.
The MAPE is defined as:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

The MAPE is scale-invariant and unit-free because it shows the forecast error as a percentage (Lyhagen et al., 2015). The MAPE is a simple average of absolute percentage errors. When comparing the accuracy of the same or different algorithms on distinct time series data with varying scales, MAPE is recommended unless the data contains zeros or small values (Hyndman and Koehler, 2006). The evaluation criterion for the various forecasting accuracy metrics is that the smaller the value obtained, the better the model's forecasting ability (McKenzie, 2011). Table 1 can be used to determine the MAPE criteria (Febrian et al., 2020).
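The MASE is defined as:

$$\mathrm{MASE} = \frac{\dfrac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|}{\dfrac{1}{n-1}\sum_{i=2}^{n}\left|y_i - y_{i-1}\right|}$$

where ŷi is the forecasting data, yi is the actual data, i is the time period, and n is the number of time periods.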
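The four accuracy measures can be computed as in the short sketch below. The function names and the example numbers are made up for illustration, and the MASE scaling term is computed on a separate training series using the one-step naive forecast, a common convention that is assumed here rather than taken from the text.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; undefined for zero actuals."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, train):
    """MAE scaled by the in-sample MAE of the one-step naive forecast on train."""
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    return mae(actual, forecast) / naive_mae


# Hypothetical values, not data from the study.
actual_demo = [10.0, 20.0, 30.0, 40.0]
forecast_demo = [12.0, 18.0, 33.0, 36.0]
train_demo = [10.0, 12.0, 11.0, 15.0]

scores = {
    "MAE": mae(actual_demo, forecast_demo),
    "RMSE": rmse(actual_demo, forecast_demo),
    "MAPE": mape(actual_demo, forecast_demo),
    "MASE": mase(actual_demo, forecast_demo, train_demo),
}
```

The explicit check for zero actual values reflects the caveat above that MAPE should not be used when the data contain zeros.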

Clustering and Data Analysis
The three clusters are referred to as the High Pollution Region (HPR), Medium Pollution Region (MPR), and Low Pollution Region (LPR) based on the mean and median values for each cluster, as shown in the box plot in Supplementary 3. The AHC classification of regions (HPR, MPR, and LPR) was based on PM2.5 pollution levels in Malaysia. As shown in Supplementary 4, there were 19 HPR stations, 37 MPR stations, and nine LPR stations among the 65 stations monitoring PM2.5 concentrations in Malaysia. The highest mean and median values of PM2.5 were used to classify HPR stations in Peninsular Malaysia's central and southern regions, as well as one station in Sarawak. The majority of MPR stations were located in Peninsular Malaysia's eastern, southern, and northern regions and also in Sabah, with one station in Sarawak. All LPR stations located in Sarawak recorded the lowest mean and lowest median values. Supplementary 5 displays the minimum, maximum, first quartile, median, third quartile, and mean values for HPR, MPR, and LPR between the whiskers. Supplementary 6 depicts the monthly average PM2.5 concentrations by cluster from 2018 to 2019. Except for August, when the monthly PM2.5 concentration was slightly lower than in LPR, HPR had the highest monthly PM2.5 concentration compared to MPR and LPR. Except for July, August, and September, PM2.5 concentrations in LPR were slightly lower than in MPR. The highest and lowest monthly PM2.5 levels were recorded in September and December, respectively. The southwest monsoon usually occurs between May and September, while the northeast monsoon occurs between November and March. Monsoon events were therefore most likely responsible for the highest and lowest monthly PM2.5 concentrations. PM2.5 concentrations are usually higher in the southwest monsoon (May-September) than in the northeast monsoon (November-March) (Abdullah et al., 2017).
Higher PM2.5 concentrations during this time period can be attributed to drier meteorological conditions, a stable atmosphere, local effects, and transboundary movement of air pollution from biomass burning in neighboring countries.
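To illustrate how stations can be grouped into three pollution regions, the sketch below runs a bottom-up (agglomerative) clustering on one mean PM2.5 value per station and then names the clusters by their average level. It is only a toy version of the idea: the average-linkage rule, the use of a single mean value per station, the function names, and the nine station values are all assumptions, not the settings or data of the study.

```python
def agglomerative_1d(values, n_clusters):
    """Average-linkage agglomerative clustering of scalar values.

    Returns a list of clusters, each a list of indices into values.
    """
    clusters = [[i] for i in range(len(values))]

    def dist(a, b):
        # Average absolute difference between all cross-cluster pairs.
        return sum(abs(values[i] - values[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


def label_regions(values, clusters):
    """Name clusters HPR, MPR, LPR in decreasing order of cluster mean."""
    ranked = sorted(clusters,
                    key=lambda c: sum(values[i] for i in c) / len(c),
                    reverse=True)
    return {i: name for name, c in zip(["HPR", "MPR", "LPR"], ranked) for i in c}


# Made-up station means in µg m⁻³, for illustration only.
station_means = [28.0, 27.0, 30.0, 18.0, 19.0, 17.0, 20.0, 9.0, 8.0]
regions = label_regions(station_means, agglomerative_1d(station_means, 3))
```

With these made-up values the first three stations are labeled HPR, the next four MPR and the last two LPR.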

Evaluation of Prediction Model
Two types of prediction models were developed for HPR, MPR, and LPR: an MLR model and an ANN model. Table 2 lists the performance indicators (R², RMSE, IA, E, and the percentage of deviation) of the MLR and ANN models for PM2.5 concentrations in HPR, MPR, and LPR. For error measurements such as RMSE, the best model is the one whose values are closest to zero, whereas for accuracy measurements such as R², E, and IA, the best model is the one whose values are closest to one. The best outcomes are highlighted in bold (Table 2). In general, the results demonstrated that the ANN and MLR models performed similarly. However, the ANN model, rather than the MLR model, was able to reduce error and achieve higher accuracy in HPR, MPR, and LPR. This is consistent with Li et al. (2016), who showed that ANN outperformed MLR for PM2.5 prediction in agricultural parks. In Biancofiore et al. (2017), the findings for PM2.5 prediction also showed that ANN performed better than MLR. Fig. 4 illustrates a comparison plot of measured and predicted PM2.5 values for HPR, MPR, and LPR. With the lowest percentage deviations of 0.13%, 0.19%, and 0.52% for HPR, MPR, and LPR, respectively, the results predicted using ANN were extremely close to the observed values. The statistical indices of the non-linear model, ANN, were substantially better than those of the reference model, indicating that it was a better model for the prediction of PM2.5 than the linear model.
The level, trend, and seasonal characteristics of PM2.5 concentrations for HPR, MPR, and LPR were investigated in this study by applying additive triple exponential smoothing (TES). The performance of the exponential smoothing prediction model for each cluster is shown in Table 3. The values of β and γ for each of the clusters showed that the PM2.5 concentrations in the HPR, MPR, and LPR were influenced by trend and seasonal factors. As seen in the PM2.5 concentrations recorded in 2018 and 2019, PM2.5 concentrations increase when dry weather arrives, during the monsoon transition from March to May and during the southwest monsoon from August to October each year. These conditions recur every year, which means that PM2.5 concentrations are influenced by seasonal factors. We found that the MPR showed the best performance, with the highest R² value of 0.844 and the lowest RMSE value of 3.597. Seasonality is an extremely important factor in determining the effectiveness of this prediction model in MPR because emissions are continuous.

Table 2. Performance evaluation of the multiple linear regression (MLR) and artificial neural network (ANN) models in high pollution regions (HPR), medium pollution regions (MPR) and low pollution regions (LPR).

Comparing the single prediction models of ANN, MLR and TES, ANN gave the best prediction performance, with the lowest RMSE and the highest R² in HPR, MPR and LPR, as shown in Table 4. This indicates that ANN gives the most accurate prediction of PM2.5 in Malaysia. The TES model gave the worst performance of the three. However, integrating TES with another single model is expected to give better forecasts of future PM2.5 time series, because of its simplicity and ability to include trend and seasonality in the data, and to overcome the difficulty ANN has in modeling linear patterns.

Assessment of Hybrid Model
In this section, we combine the prediction models used in this study, the ANN and MLR models, with the TES model as hybrid models to predict long-term PM2.5 one year ahead. We carried out forecasting of PM2.5 data for 2020. Fig. 5 shows a forecasting plot of the MLR-TES and ANN-TES models of PM2.5 values for HPR, MPR, and LPR. The hybrid models produced good patterns for the long-term forecasting of PM2.5 in HPR, MPR, and LPR, with the PM2.5 concentrations forecasted in HPR slightly higher than those in MPR and LPR. The forecasted values of PM2.5 did not exceed the daily average IT-2 National Ambient Air Quality Standard in Malaysia, which is 50 µg m⁻³, in any cluster. However, the best hybrid models were chosen based on their performance.
As demonstrated in Fig. 6, the ANN_TES hybrid model produced the best performance compared with the others for HPR, MPR, and LPR, with the lowest RMSE (4.25-8.56 µg m⁻³), MAE (2.51-4.95 µg m⁻³), MAPE (0.13-0.2%) and MASE (1.45-2.01). Among the regions, the ANN_TES hybrid model for MPR performed better than those for the other clusters because all evaluations gave low error values, with MAE, RMSE, MAPE, and MASE values of 2.51 µg m⁻³, 4.25 µg m⁻³, 0.13% and 1.45, respectively. The ANN_TES hybrid model for LPR performed less well than the others, as all evaluations gave high error values except for MAE, where the recorded value for HPR was higher than that for LPR. Compared with hybrid models in other applications, the ANN-TES hybrid model from this study produced good results for PM2.5 forecasting in terms of RMSE, MAE and MAPE, as shown in Table 5.

We used the real data set of daily average PM2.5 concentrations from 1 January 2020 to 31 January 2020 to compare with the ANN_TES hybrid model developed for HPR, MPR, and LPR, as shown in Fig. 7. The best performance corresponds to the smallest RMSE and MAPE values. Based on the MAPE values in Table 6, the accuracy of PM2.5 forecasting in HPR and MPR fell in the sufficiently accurate category (20-50%), with values of 33% and 30%, respectively. In contrast, the accuracy of PM2.5 forecasting in LPR fell in the not accurate category (> 50%), with a MAPE value of 71%.
In 2020, Malaysia was under the "Movement Control Order" to control the spread of the COVID-19 virus, during which premises and schools were closed, people were instructed to stay at home, and only essential sectors were allowed to operate. Malaysia also experienced wet weather conditions in the same year. These two situations contributed to the high accuracy of forecasting in HPR and MPR, where less air pollution was released in 2020 compared with 2018 and 2019. In LPR, the absence of major haze incidents in 2020, in contrast to 2018 and 2019, contributed to the lower accuracy of forecasting in that year. This study suggests that hybridizing both models can produce robust methods and more satisfactory prediction results compared to a single model (Roy et al., 2018; Safi and Sanusi, 2021). Overall, the ANN_TES hybrid model demonstrated its capability as a good predictive model and is suitable for long-term PM2.5 predictions for clusters in HPR and MPR but not in LPR.

CONCLUSIONS
As air pollution continues to have an impact on quality of life, there is a need for a framework that not only monitors but also analyses data and anticipates air quality. It is crucial to have an accurate forecasting system to ensure that people are aware of future air quality well in advance. In this study, a hybrid forecasting model of ANN and TES was successfully implemented on regions clustered by PM2.5 concentration, namely High Pollution Regions (HPR), Medium Pollution Regions (MPR), and Low Pollution Regions (LPR). The models were analyzed using the mean absolute error (MAE), the root mean squared error (RMSE), the mean absolute percentage error (MAPE), and the mean absolute scaled error (MASE). In the comparison, the ANN_TES model provided the most accurate forecasting performance metrics for HPR, MPR, and LPR, and the best hybrid model had the minimum RMSE (4.25 µg m⁻³), MAE (2.51 µg m⁻³), MAPE (0.13%), and MASE (1.45) in MPR. The ANN_TES model for forecasting PM2.5 concentration also showed good performance compared with hybrid models in other applications. Based on the comparison of the ANN_TES hybrid model with the real PM2.5 data set in 2020, HPR and MPR showed sufficient forecasting accuracy, while LPR showed less accuracy according to the MAPE value. There was still a gap between the actual data and the data predicted by the hybrid model, so future work will use deep learning approaches to automatically learn temporal dependencies and handle temporal patterns such as trends and seasonality.