Development of a PM2.5 Forecasting System Integrating Low-cost Sensors for Ho Chi Minh City, Vietnam

Air pollution is a serious concern in urban areas, especially cities such as Ho Chi Minh City (HCMC). Because the air quality directly affects people’s health, air quality monitoring is urgently needed. In this study, the models of Weather Research and Forecasting (WRF), Sparse Matrix Operator Kernel Emission (SMOKE), and Community Multiscale Air Quality (CMAQ) were integrated to develop an air quality forecasting system. Drawing input data from transportation and industrial emission inventories, the forecasting system was calibrated and configured using local parameters to deliver hourly forecasts for HCMC. To increase the accuracy of WRF and the meteorological forecasting, the global DEM and land use data were replaced by Lidar data, and land use data were also retrieved from MODIS. Output from the MOZART model served as the boundary conditions for CMAQ, and AOD values reported by the MODIS Aerosol Product were assimilated to enhance the accuracy of the results. A low-cost PM2.5 sensor connected to a LinkIt ONE, a development board for Internet of things (IoT) devices, was employed for calibration and verification. The strong correlation (R = 0.8) between the measured and predicted concentrations indicates that the estimates delivered by the proposed forecasting system are consistent with the values obtained via monitoring.


INTRODUCTION
Air pollution is currently one of the most pressing public health issues and a major challenge to the environment. According to the World Health Organization (WHO), air pollution causes 7 million early deaths every year (Prüss-Ustün et al., 2016). Air pollution is one of the top four causes of premature deaths worldwide, and it creates an economic burden of approximately USD 225 billion (World Bank, 2016).
Atmospheric particles can have multiple effects on human health and the environment. Particulate matter with an aerodynamic diameter of less than 10 µm (PM10) has long been implicated in adversely affecting health and increasing mortality (Dockery and Pope, 1994); however, fine (PM2.5) and ultrafine particles pose an even higher risk than PM10 (Donaldson et al., 1998; Schwartz and Neas, 2000; Ostro et al., 2006). Atmospheric particles also interact directly and indirectly with the earth's radiation energy balance and can subsequently affect the global climate (Liu and Daum, 2002).
* Corresponding author. Tel.: +84 908 275 939; E-mail address: kyphungng@gmail.com
According to the Environmental Performance Index published in 2017 by Yale University, which ranked countries on the basis of environmental issues, Vietnam only achieved a score of 49.9/100 on air quality, with a ranking of 170 out of 180 countries. Ho Chi Minh City (HCMC) is the largest and most populous city in Vietnam. Growing industrial activity and vehicular traffic in HCMC have led to an increase in all aspects of environmental pollution, of which air pollution is a major issue that considerably affects the quality of life of its residents (Nguyen and Pham, 2002). Air pollution in HCMC is mostly caused by emissions from transport vehicles and industries. The city has the highest number of motorcycles in the world (7.3 million) and more than 600,000 cars, which together consume 4 million liters of fuel per day. In the first 3 months of 2017, the average concentration of PM2.5 in HCMC was higher than the Vietnamese national standard (50 µg m⁻³) and the WHO standard (25 µg m⁻³) for 6 and 78 days, respectively. The air quality in HCMC in the first 3 months of 2017 was worse than that in the same period in 2016. The average air quality index in the first quarter of 2017 (100.8) was higher than that in the first quarter of 2016 (91.2), and the average PM2.5 concentrations in the first quarters of 2017 and 2016 were 35.8 µg m⁻³ and 30.72 µg m⁻³, respectively (Nguyen et al., 2018).
Currently available information on air quality is inadequate in quantity and quality for the authorities to warn people regarding the status of air quality, create health safeguards, and provide forecasts. To improve the quantity and quality of air quality information, it is necessary to create processes and tools that can provide detailed air quality forecasts in near-real time for residents and to develop agencies in HCMC that analyze air quality conditions.
In this study, we developed a forecasting system and air quality forecasting process based on simulation results of the PM2.5 concentration. The research was divided into specific sections as follows:
• Data collection and development of an emission inventory for transport and industry by using the SMOKE 4.5 model.
• Development of a meteorological forecasting process by using the Weather Research and Forecasting (WRF) 3.9 model with customized parameters for HCMC.
• Developing a process for forecasting the PM2.5 concentration by using the Community Multiscale Air Quality (CMAQ) 5.2 model with customized parameters for areas in HCMC.
• Integrating low-cost sensors to measure the surface PM2.5 concentration and MODIS satellite data to enhance the forecast results.

Development of the Emission Inventory

Traffic Emission
In January 2017, the Department of Transportation announced HCMC's online traffic management portal. The system has more than 300 cameras that track traffic conditions continuously, 24 hours a day and 7 days a week. Fifty-five cameras (Fig. 1) were selected for the traffic surveys according to three criteria: even distribution across the districts of the city; high image quality and wide viewing angles; and limiting the selection of multiple cameras on the same route or the same orthogonal route.
The period of camera recording was July 18-24, 2017. Recording was conducted by the Thu Thiem Tunnel Management Department.
In addition, because the camera system focused on major roads and central areas of HCMC, surveys were also conducted on some roads in the suburban areas of HCMC to ensure that the data reflected the current condition of HCMC's traffic. On each route, the survey period was 12 hours a day (from 6 a.m. to 7 p.m.). Each hour, the surveyor recorded the traffic flow during the first quarter of the hour. To estimate traffic flow at night, the proportion of nighttime traffic flow was determined on routes where traffic had been counted under the same traffic conditions over a specific counting period. The recordings were used for counting the number of vehicles.
The vehicle statistics were processed as follows:
• The vehicles were divided into six groups according to the study of traffic in Hanoi (Hung et al., 2010) because of the similarity in traffic movement in both cities.
• The types of vehicles included in the count were motorbikes, buses, cars with 4-16 seats, cars with ≥ 24 seats, trucks, and container trucks.
For analyzing vehicle traffic statistics in this study, a vehicle recognition software program was developed to support counting. However, considerable errors were observed at large intersections with high traffic flow during peak hours. Hence, manual counting was applied (Fig. 2).
This study used 141 traffic count records from the study "Modeling PM10 in Ho Chi Minh City, Vietnam and evaluation of its impact on human health" (Bang, 2017). This data source was used for cross-reference and for updating the dataset. Gas emissions were calculated according to the distance traveled by using the following formula:
$$E_{i,m} = N_m \times EF_{i,m} \times VKT_m$$
where E_{i,m} is the mass of emission of pollutant i from vehicle type m (g); N_m is the number of vehicles of type m per kilometer of traveling; EF_{i,m} is the emission factor of pollutant i for vehicle type m (g km⁻¹); and VKT_m is the total length traveled by the vehicle.
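The distance-based calculation can be illustrated with a minimal Python sketch. It assumes that the emission of one pollutant from one vehicle type is the product of the vehicle count N_m, the emission factor EF_{i,m} (g km⁻¹), and the distance traveled VKT_m (km). The class name, function names, and all numbers are hypothetical placeholders and are not values from the emission inventory.

```python
from dataclasses import dataclass


@dataclass
class VehicleClass:
    name: str
    count: float            # N_m: number of vehicles of type m (assumed meaning)
    emission_factor: float  # EF_{i,m}: g km^-1 for one pollutant i
    distance_km: float      # VKT_m: distance traveled per vehicle, in km


def emission_g(vehicle: VehicleClass) -> float:
    """Emission (g) of one pollutant from one vehicle type: N_m * EF_{i,m} * VKT_m."""
    return vehicle.count * vehicle.emission_factor * vehicle.distance_km


def total_emission_g(fleet: list[VehicleClass]) -> float:
    """Sum the per-type emissions over all vehicle types on a road segment."""
    return sum(emission_g(v) for v in fleet)


# Placeholder numbers for illustration only (not measured data).
fleet = [
    VehicleClass("motorbike", count=1000, emission_factor=0.5, distance_km=2.0),
    VehicleClass("truck", count=50, emission_factor=1.5, distance_km=2.0),
]
total = total_emission_g(fleet)
print(f"Total emission on the segment: {total:.1f} g")
```

Keeping one record per vehicle type makes it straightforward to repeat the calculation for each pollutant by swapping in the corresponding emission factors.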

Industrial Emission
In this study, information was obtained on 15 industrial zones (Table 1) by using the environmental monitoring report prepared by the HCMC Export Processing and Industrial Zones Authority and HCMC Environment Protection Agency. Simultaneously, field surveys were conducted at the industrial zones to verify the information.
In addition, the study also used chimney emission data of facilities located outside the industrial zones from the study "Modeling PM10 in Ho Chi Minh City, Vietnam and evaluation of its impact on human health" (Bang, 2017) (Fig. 3).

WRF Meteorological Model
The WRF model is a next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting needs (Su et al., 2017). The WRF system contains two dynamic solvers, which are referred to as the Advanced Research WRF (ARW) core and the Nonhydrostatic Mesoscale Model core. In this study, the meteorological module was based on the ARW core developed by the National Center for Atmospheric Research (NCAR). Three computational domains were used for HCMC (Fig. 4).
The initial and boundary conditions for the WRF simulation were extracted from the Global Forecast System (GFS) model at 3-hour intervals in four cycles per day (00:00, 06:00, 12:00, and 18:00 UTC) and from the National Centers for Environmental Prediction (NCEP) Final Analysis (FNL) Operational Model Global Tropospheric Analyses dataset with a resolution of 0.5° × 0.5°. The parameters in the dataset include 21 surface variables (e.g., rain, t2m, q2m, um, v10m, cloud, OLR, and Tsoil) as well as the variables of pressure, terrain elevation (H), wind (U, V), temperature (T), and humidity (Q). The physical-chemical parameters were selected in accordance with the conditions of HCMC, as presented in Table 2.
To enhance the accuracy of the local surface data, this study updated the terrain data with Lidar data collected in 2012 by the project "Applying Lidar technology to build 3-dimensional models for urban management in Ho Chi Minh City". The Lidar point cloud data were processed into Digital Surface Model (DSM) and Digital Terrain Model (DTM) data with a resolution of 30 m. The DSM and DTM data were used to calculate roughness, which is a parameter affecting the simulation of meteorology and air quality. Land use data were extracted from the MODIS Land Cover Product (MCD12Q1) with a resolution of 500 m. Terrain data and vegetation cover data were converted to GEOGRID format before being transferred to WRF's data library.
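The text does not state how roughness was derived from the DSM and DTM data. One possible approach, shown here only as a hedged sketch, is to take the obstacle height as DSM minus DTM and to apply the common rule of thumb that the roughness length is roughly one tenth of the mean obstacle height. The function names, the 0.1 factor, the minimum value, and the grid values are all assumptions made for illustration.

```python
def obstacle_height(dsm: list[list[float]], dtm: list[list[float]]) -> list[list[float]]:
    """Per-cell obstacle height (m): surface elevation minus bare-ground elevation."""
    return [
        [max(surface - ground, 0.0) for surface, ground in zip(dsm_row, dtm_row)]
        for dsm_row, dtm_row in zip(dsm, dtm)
    ]


def roughness_length(heights: list[list[float]], factor: float = 0.1,
                     minimum: float = 0.01) -> float:
    """Rule-of-thumb roughness length (m): factor * mean obstacle height.

    The factor of 0.1 and the lower bound are assumptions, not values
    taken from the study.
    """
    cells = [h for row in heights for h in row]
    mean_height = sum(cells) / len(cells)
    return max(factor * mean_height, minimum)


# Hypothetical 2 x 2 patch of 30 m cells (elevations in meters).
dsm = [[12.0, 5.0], [5.0, 25.0]]
dtm = [[2.0, 5.0], [5.0, 5.0]]
heights = obstacle_height(dsm, dtm)
z0 = roughness_length(heights)
print(f"Estimated roughness length: {z0:.2f} m")
```

In practice the aggregated value would be computed per model grid cell before the data are converted for WRF.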
Meteorological data, such as the temperature, wind speed, and wind direction at Tan Son Hoa, So Sao, and Bien Hoa stations, were used to evaluate, calibrate, and verify the performance of WRF simulation. The evaluation indices were the mean error (ME), mean absolute error (MAE), and root mean square error (RMSE).

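ME:
$$\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\left(F_i - O_i\right)$$
MAE:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|F_i - O_i\right|$$
RMSE:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(F_i - O_i\right)^2}$$
where F_i and O_i are the i-th forecasted and observed meteorological data, respectively, and N represents the total number of data.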
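The three evaluation indices (ME, MAE, and RMSE) can be computed with a few lines of Python. The sketch below uses their standard definitions, with the error taken as forecast minus observation so that a negative ME means the model underpredicts. The function names and the sample temperatures are hypothetical.

```python
import math


def _errors(forecast: list[float], observed: list[float]) -> list[float]:
    """Pairwise errors F_i - O_i; both series must have the same non-zero length."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("forecast and observed must be non-empty and equally long")
    return [f - o for f, o in zip(forecast, observed)]


def mean_error(forecast: list[float], observed: list[float]) -> float:
    """ME: average signed error (negative means underprediction)."""
    errors = _errors(forecast, observed)
    return sum(errors) / len(errors)


def mean_absolute_error(forecast: list[float], observed: list[float]) -> float:
    """MAE: average magnitude of the error."""
    errors = _errors(forecast, observed)
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_square_error(forecast: list[float], observed: list[float]) -> float:
    """RMSE: square root of the mean squared error."""
    errors = _errors(forecast, observed)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical temperatures in degrees Celsius, for illustration only.
forecast = [28.0, 30.0, 31.0]
observed = [29.0, 30.0, 33.0]
me = mean_error(forecast, observed)
mae = mean_absolute_error(forecast, observed)
rmse = root_mean_square_error(forecast, observed)
print(f"ME = {me:.2f}, MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE; a large gap between the two points to occasional large errors.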

LinkIt ONE
In general, low-cost sensors have limitations. Not all low-cost sensors provide meaningful air quality data (Williams et al., 2014). In addition, these sensors are easily affected by field conditions. Low-cost sensors are sensitive, and their range of detection may be a limitation. A study reported that PM measurements can be affected by humidity (Wang et al., 2015).
Currently, no criteria are available for evaluating the performance of low-cost particulate matter sensors and vendors do not provide performance information. Researchers have begun to fill these gaps by assessing the performance of various particulate matter sensors under different environmental and control conditions (Dinoi et al., 2017).
Many studies have been conducted in the field of sensor performance. Jiao et al. (2016) evaluated five types of low-cost sensors in low-pollution suburban environments. They found high correlation between the measurements of the same type of low-cost sensors (R² = 0.980).
This study aimed to use the low-cost Plantower PMS 3003 sensor with the LinkIt ONE device. LinkIt ONE is an open-source platform used to develop Internet of Things (IoT) device templates that integrate air monitoring sensors with GPS and GPRS chips (Fig. 5). The PMS 3003 sensor uses the principle of laser light scattering to measure the concentration of particles with diameters between 0.3 and 10 µm. A fan in the sensor moves air through a small space where the air interacts with a laser beam (the laser wavelength is approximately 650 ± 10 nm). When the laser light encounters a particle, four types of light interaction occur, namely reflection, absorption, refraction, and diffraction.
When the particle size is less than 20 µm, the refraction phenomenon plays a crucial role. The velocity of light changes when it passes through a particle, which results in the light being deflected. The light disperses in specific directions according to the particle diameter. The LinkIt ONE device was calibrated and validated against a GRIMM 107, a specialized dust concentration measuring device, in HCMC (Fig. 6).

CMAQ Model
The CMAQ model is a comprehensive multipollutant air quality modeling system developed and maintained by the U.S. Environmental Protection Agency (EPA)'s Office of Research and Development. For air quality forecasting, this study focused on determining the boundary and initial conditions because they considerably affect the simulation results. Results from several global air quality models were examined, and the results from the Model for OZone and Related chemical Tracers version 4.0 (MOZART-4) model by NCAR were found to be highly consistent with the requirements and highly accurate. MOZART-4 has already been used in several studies, where it has been shown to reproduce tropospheric chemical composition well; when driven with time-varying emission inventories (particularly for biomass burning), MOZART-4 reproduces the spatial and temporal variability in observations, such as the NOAA GMD network and MOPITT CO, ozonesondes, and MODIS aerosol optical depth measurements (Emmons et al., 2010). The MOZART model was combined with MOPITT satellite data and ground station measurements so that it was consistent with the monitoring results.
The boundary and initial air quality conditions were extracted from the 72-h forecast results obtained with the MOZART model. Meteorological input data were extracted from the preceding WRF forecast. Traffic and industrial data were used to calculate the emission data in the SMOKE model. Emission data obtained from the SMOKE model were integrated with the CMAQ model. In addition, the simulation results were improved using data from the IoT LinkIt ONE device with a low-cost sensor and the MODIS Aerosol Product (Fig. 7).
This study used the emission data for carbon dioxide (CO2), nitrogen oxides (NOx), sulfur dioxide (SO2), methane (CH4), and particulate matter (PM2.5 and PM10) from emission inventory processing as inputs in the CMAQ modeling framework version 5.2. The model estimated the primary PM2.5 as well as the secondary PM2.5 formed from NOx, CO2, and SO2 for each emission source in HCMC.
The Carbon Bond chemical mechanism (CB05) (Yarwood et al., 2016) and the AERO6 aerosol modules (Nolte et al., 2015) were used for gas-phase and aerosol chemical mechanisms, respectively.
The outputs of the model were used to estimate PM2.5 according to the equations of the species definition file for CB05 and Aerosol 6 (Fig. 8) (available at: https://github.com/USEPA/CMAQ/blob/5.2/CCTM/src/MECHS/cb05e51_ae6_aq/SpecDef_c05e51_ae6_aq.txt).
Fig. 9 displays the percentage of different vehicle types on the Dien Bien Phu route. Overall, the most common vehicle type was the motorbike, which accounted for 80.46% of vehicles. Cars and trucks accounted for 11.23% and 4.59% of vehicles, respectively.
This study determined the emission factors from four papers, namely "Development of emission factors and emission inventories for motorcycles and light duty vehicles in the urban region in Vietnam" (Tung et al., 2011), "Modeling PM10 in Ho Chi Minh City, Vietnam and evaluation of its impact on human health" (Bang, 2017), "Estimation of air pollutants emission factors for vehicles on road traffic suitable with Ho Chi Minh City condition" (Dung and Thang, 2008), and "Roadside PM2.5 and BTEX Air Quality in Ho Chi Minh City and Inverse Modeling for Vehicle Emission Factor" (Huong Giang and Kim Oanh, 2014). The data were uniform in units (g km⁻¹), which facilitated calculation (Tables 3 and 4). From the traffic flow data, the discharge load was calculated and used as input data for the CMAQ model.

Industrial Emission
The discharge load was calculated from the survey data for the industrial zones (Table 5).

Meteorological Simulation
Simulations were conducted at 1 a.m., 6 a.m., 1 p.m., and 7 p.m. at three stations, namely Tan Son Hoa, Bien Hoa, and So Sao. Table 6 presents the difference in wind speed at the Tan Son Hoa, Bien Hoa, and So Sao stations. The statistical results indicated that the forecasted wind speed for the So Sao station was more accurate than that for the Bien Hoa and Tan Son Hoa stations. Due to limitations in measuring equipment, the observed wind speed was often rounded (e.g., 0.1 m s⁻¹ and 2 m s⁻¹), which caused errors compared with the results of the model.

Wind Direction
With regard to wind direction, the MAE for the Bien Hoa station was lower than that for Tan Son Hoa and So Sao (Table 6). In addition, the absolute error of the wind direction was large (ranging from 55° to 87°) because the observed wind direction data were only given in 16 main directions (e.g., N, NNE, NE, E, S, SE, WSW, WNW, and SSW). Each main wind direction covers a range of 22.5°, whereas the model results indicated a specific wind direction (e.g., 45°, 245°, and 90°). Furthermore, the simulation data did not include the elevation data of buildings in the city, which was also a cause of the difference in results.

Temperature
Table 7 presents the temperature forecasting results of the WRF model at the Tan Son Hoa station. The ME index indicated that the predicted temperature was lower than the actual temperature, and the MAE between the forecasted and observed temperatures was approximately 0.64°C. In addition, the RMSE values, which were higher than the MAE, indicated that the forecast error fluctuated between time periods; however, this fluctuation may not be significant.
The simulation results from the WRF model were combined with the actual measured data to apply a post-calibration method, in which the bias distribution F (bias) was determined from the ME of the wind speed and temperature results of the WRF model. The corrected result was obtained by subtracting F (bias) from the WRF results. The F (bias) distributions were determined separately for each monitoring station. Table 8 presents the wind speed and temperature forecasted for Tan Son Hoa by using the WRF model after calibration. The forecasted wind speed results were close to the actual measurements, with an MAE of approximately 0.25 m s⁻¹. The forecasted temperature results were accurate: the ME and MAE were -0.4°C and 0.4°C, respectively, and the MAE value after calibration was 1.4%.
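The post-calibration step can be sketched in Python under a simplifying assumption: the bias at each station is represented by a single number, the mean error (forecast minus observation) over a training period, which is then subtracted from later forecasts. The study describes a bias distribution, which may be more detailed than this constant offset. The function names, station key, and values are hypothetical.

```python
def estimate_bias(forecast: list[float], observed: list[float]) -> float:
    """Mean error (forecast minus observation) over a training period."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("forecast and observed must be non-empty and equally long")
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)


def apply_bias_correction(forecast: list[float], bias: float) -> list[float]:
    """Corrected forecast: raw model value minus the estimated bias."""
    return [f - bias for f in forecast]


# Hypothetical training data for one station (temperatures in degrees Celsius).
training = {
    "Tan Son Hoa": {
        "forecast": [27.0, 28.0, 29.0],
        "observed": [27.5, 28.5, 29.5],
    },
}

# The bias is estimated separately for each monitoring station.
station_bias = {
    station: estimate_bias(data["forecast"], data["observed"])
    for station, data in training.items()
}

new_forecast = [30.0, 31.0]
corrected = apply_bias_correction(new_forecast, station_bias["Tan Son Hoa"])
print(station_bias["Tan Son Hoa"], corrected)
```

A model that runs cold has a negative bias, so subtracting the bias raises the corrected forecast, as in this example.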

LinkIt ONE
Figs. 11-13 display the comparison of the PM2.5 and PM10 concentrations measured using LinkIt ONE and GRIMM. The following observations were made:
• The tendencies of the PM2.5 and PM10 concentrations measured using the two devices in the same measurement position were similar.
• The measured value of LinkIt ONE was higher than that of GRIMM.
• The correlation coefficients between the PM2.5 and PM10 concentrations measured using the two devices indicated high consistency (R² = 0.83 for the PM2.5 concentration and R² = 0.73 for the PM10 concentration).
• The PMS 3003 observation sensor could be used for real-time measurement with high accuracy.
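The comparison between the two devices can be illustrated with a short Python sketch that fits a straight line from low-cost sensor readings to reference readings and reports R² as the squared Pearson correlation. The calibration method used in the study is not stated, so ordinary least squares is an assumption here, and the function name and readings are hypothetical.

```python
def linear_fit(sensor: list[float], reference: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares fit: reference ~ slope * sensor + intercept.

    Returns (slope, intercept, r_squared), where r_squared is the squared
    Pearson correlation between the two series.
    """
    n = len(sensor)
    if n != len(reference) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(sensor) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in sensor)
    syy = sum((y - mean_y) ** 2 for y in reference)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sensor, reference))
    if sxx == 0 or syy == 0:
        raise ValueError("series must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical PM2.5 readings (µg m^-3); the low-cost sensor reads higher.
sensor_readings = [10.0, 20.0, 30.0, 40.0]
reference_readings = [8.0, 15.0, 22.0, 29.0]
slope, intercept, r2 = linear_fit(sensor_readings, reference_readings)

# Apply the fitted line to convert a new sensor reading to the reference scale.
calibrated = slope * 25.0 + intercept
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R2 = {r2:.2f}")
```

A slope below 1 is what one would expect when the low-cost sensor reads higher than the reference instrument, as reported in the observations above.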

CMAQ
The time period used for the calibration process was 3 days, July 12-14, 2017 (Fig. 14). The time used for verification was 2 days, July 15-17, 2017 (Fig. 15). The simulation results during calibration and validation exhibited high consistency with the measured values, as indicated by correlation coefficient (R²) values of 0.84 and 0.8, respectively. The simulation results described the trend of the PM2.5 concentration; however, at three instances, the concentration fluctuated considerably.
The maximum simulated PM2.5 concentration reached 30-34 µg m⁻³, and the lowest value was 5-10 µg m⁻³. This trend was consistent with the current air condition of HCMC. However, the spatial distribution had many implausible spots. This could be explained by the fact that the emission data could not reflect the instantaneous and volatile characteristics of industrial activity and real-time traffic.
The results also indicated that the PM2.5 concentration tended to increase when traffic activities increased, especially during peak hours of the day. In the morning, the PM2.5 concentration increased to 32 µg m⁻³ from 5 a.m. to 9 a.m. In the afternoon, the concentration reduced marginally to 13.7 µg m⁻³ at 5 p.m. (Fig. 16). However, the trend of increasing concentration in the afternoon was unstable, which can be explained by the fact that in the rainy season, the PM2.5 concentration was low due to the wet deposition process and the trend of increasing PM2.5 concentration only occurred in the morning.

CONCLUSIONS
This study developed a meteorological and air quality forecasting system for HCMC based on the WRF and CMAQ models by establishing a dataset of emissions from the two main pollution sources in the city, transportation and industry, and applying suitable parameters for the local conditions. The predicted wind speed was close to the actual measurements, with an MAE of approximately 0.25 m s⁻¹. Furthermore, the predicted temperatures were moderately accurate, with an ME and MAE of -0.4°C and 0.4°C, respectively.
The CMAQ simulation results exhibited a close relationship with the measured values (R² = 0.8-0.84) during calibration and validation. Thus, this model's predictions agreed very well with the monitoring data.
Synchronizing the satellite and monitoring data from the IoT device also increased the accuracy of the CMAQ model, especially on the city scale. This study demonstrates the potential applications of IoT devices with low-cost air pollution sensors.