Calibration of Low-cost Sensors for Measurement of Indoor Particulate Matter Concentrations via Laboratory/Field Evaluation

Recently, low-cost sensors (LCSs) have been widely used to monitor particulate matter (PM) mass concentrations. Maintaining sensor accuracy requires rigorous calibration and performance evaluation. In this study, two commercial LCSs, the Plantower PMS3003 and Plantower PMS7003, were evaluated in the laboratory and in the field against a reference-grade PM monitor (GRIMM 11-D). The laboratory evaluation was conducted with polystyrene latex (PSL) particles in a 1 m³ chamber at 20°C and 20% relative humidity. Each LCS reported higher mass concentrations than the GRIMM 11-D for small PSL particles (0.56 µm) but lower mass concentrations for PSL particles larger than 0.56 µm, and the difference between each LCS and the GRIMM 11-D increased with particle size above 0.56 µm. Nonetheless, a high correlation (R² > 0.9) was obtained between each LCS and the GRIMM 11-D. The field evaluation was conducted at Yonsei University (Seoul, South Korea) from February 12 to March 31, 2022. The LCSs generally showed higher PM mass concentrations than the GRIMM 11-D; however, some LCS data points showed different trends. Outdoor PM10/PM2.5 and relative humidity had notable impacts on the LCS data, and LCS sensitivity depended on whether the PM concentration was low or high. Based on these observations, regression-based calibration models were constructed using the selected independent variables (outdoor PM10/PM2.5 and relative humidity) after dividing the PM concentration into low and high sections. Consequently, the accuracy of the LCSs was significantly enhanced. Therefore, LCSs combined with these calibration models can replace expensive reference PM monitors, resulting in cost savings.


INTRODUCTION
Recently, low-cost particulate matter (PM) sensors that measure the total intensity of light scattered by particles have enabled wide-ranging, high-density spatiotemporal measurements (Giordano et al., 2021; Zheng et al., 2018). Because light scattering-based low-cost PM sensors (LCSs) provide almost instantaneous feedback on changes in air quality, users can immediately check and respond to high air pollution levels (Bhattacharya et al., 2012; Kim et al., 2010; Zheng et al., 2018). A light scattering-based LCS generally comprises a laser (650 nm), a phototransistor, and a focusing lens (Fig. 1).
A major limitation of LCSs is that they are not as accurate as expensive reference PM monitors (Gao et al., 2015; Kelly et al., 2017; Zheng et al., 2018). An LCS can be affected by operating conditions (relative humidity, temperature, PM mass concentration) and aerosol characteristics (aerosol composition, size distribution). These disturbance factors can cause LCS output to deviate from the trends recorded by reference instruments (Gao et al., 2015; Kelly et al., 2017). To identify and correct such deviations, LCS output should be compared with that of a reference PM monitor. In this study, the GRIMM 11-D, a portable research-grade PM monitor widely used for indoor PM measurement, was selected as the reference PM monitor. The GRIMM 11-D is an advanced optical particle counter (OPC) that counts individual particles with a diode laser (30 mW, 655 nm) and classifies them by optical size from 0.253 to 35.15 µm into 31 channels.
In general, air quality regulations define PM as 'dry' when the relative humidity is below 40%. At higher relative humidity, hygroscopic particle growth may cause LCSs to overestimate PM mass concentrations. Magi et al. (2020) reported that hygroscopic effects began to affect the Plantower PMS5003 when the relative humidity exceeded 65-70%. Jayaratne et al. (2018) showed that relative humidity significantly affects the Sharp GP2Y and Shinyei PPD42NS LCSs even when the relative humidity is above 50%. In Badura et al. (2019), linear regression models for relative humidity performed as well as or better than nonlinear regression models. Zheng et al. (2018) suggested that applying a nonlinear empirical calibration model for relative humidity could bring the PM mass concentrations of the Plantower PMS3003 close to dry-state values, even in a high-humidity environment. Gao et al. (2015) and Malings et al. (2020) reported that nonlinear empirical calibration models achieved lower mean absolute errors (MAE) and higher correlation coefficients than theoretical approaches. Ultimately, the choice between linear and nonlinear, and between theoretical and empirical, models for humidity calibration depends on end-user preferences and requirements.
Temperature, in contrast, affects LCSs less than relative humidity does. Previous studies generally reported small errors (-5 to 5 µg m⁻³) even at low temperatures (-5 to 5°C). In linear and quadratic calibration models for temperature, the error term appeared larger than the temperature coefficient itself; thus, the effect of temperature can be neglected in normal environments but not in extreme environments such as deserts (Magi et al., 2020).
In this study, laboratory and field (indoor) evaluations were conducted to construct an effective calibration model for LCSs. The GRIMM 11-D and the LCSs were placed at the same location to measure PM1.0, PM2.5, and PM10, and the room temperature and relative humidity were measured with a temperature/humidity sensor module. The LCSs and the sensor module were controlled by an Arduino and integrated into a wireless network-type sensor box with a Wi-Fi module. Simultaneously, real-time outdoor PM2.5 and PM10 data were collected from Air Korea (www.airkorea.or.kr) through its open API (Application Programming Interface). Among the collected data, the outdoor PM10/PM2.5 ratio (written as (PM10/PM2.5)API) and relative humidity were selected as independent variables. The purpose of this study was to propose an effective calibration model, based on regression analyses, for LCSs used in indoor spaces.

Structure of Wireless Network Type PM Evaluation Sensor Box
First, the Arduino read the PM and temperature/humidity measurements (Fig. 3). Second, the Arduino converted the data into byte format and transmitted them to a NodeMCU module via I2C (Inter-Integrated Circuit) communication, using 4 bytes for each measured value. Finally, the NodeMCU module converted the received data back to string format and transmitted them wirelessly every 4.5 s to a Google spreadsheet (data cloud) for storage. The NodeMCU module has an operating voltage of 3.3 V, a power consumption of less than 1.0 mW, an operating radio frequency of 2.4 GHz, and a transmission range of approximately 40 m indoors and approximately 100 m outdoors.
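The byte-level hand-off between the Arduino and the NodeMCU can be sketched as follows. This is a minimal illustration, not the firmware used in the study: the source states only that 4 bytes are used per measured value, so the encoding here (a little-endian IEEE-754 32-bit float, which matches the size of `float` on AVR-based Arduino boards) is an assumption, and both sides are mimicked in Python rather than in the Arduino C++ that the devices would actually run. The function names `pack_value`, `unpack_value`, and `unpack_payload` are hypothetical.

```python
import struct

# ASSUMPTION: each value travels as a 4-byte little-endian IEEE-754 float.
# The study states only that 4 bytes are used per measured value.
VALUE_FORMAT = "<f"


def pack_value(value: float) -> bytes:
    """Arduino side (mimicked here): encode one measurement as 4 bytes."""
    return struct.pack(VALUE_FORMAT, value)


def unpack_value(chunk: bytes) -> str:
    """NodeMCU side (mimicked here): decode 4 bytes back to a string for upload."""
    (value,) = struct.unpack(VALUE_FORMAT, chunk)
    return f"{value:.2f}"


def unpack_payload(payload: bytes) -> list:
    """Split a payload holding several 4-byte values into upload-ready strings."""
    if len(payload) % 4 != 0:
        raise ValueError("payload length must be a multiple of 4 bytes")
    return [unpack_value(payload[i:i + 4]) for i in range(0, len(payload), 4)]


# Hypothetical readings: PM1.0, PM2.5, PM10 (ug m^-3), T (degC), RH (%).
readings = [12.0, 18.5, 25.0, 15.9, 34.8]
payload = b"".join(pack_value(v) for v in readings)
print(len(payload), unpack_payload(payload))
```

Formatting to two decimals on the receiving side hides the small rounding introduced by the 32-bit float (15.9 is stored as roughly 15.8999996).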

Light Scattering-based PM Measurement and Correction Factor
The GRIMM 11-D PM monitor and the LCSs derive particle concentrations from the intensity of light scattered by particles entering each device. Assuming that the total scattered light intensity is proportional to the particle number concentration, the total scattered light intensity (Itotal) is the sum of the intensities scattered by single particles (i(dp)) (Friedlander, 2000):
$$I_{total} = \sum_{d_p} i(d_p)\, n(d_p) \qquad (1)$$
where dp is the particle diameter and n(dp) is the number concentration of particles of size dp. The total mass concentration (mtotal) can then be calculated from n(dp) as follows:
$$m_{total} = \sum_{d_p} \frac{\pi}{6}\, \rho_p\, d_p^{3}\, n(d_p) \qquad (2)$$
where ρp is the particle density set in each device (usually 1 g cm⁻³). Therefore, based on Eqs. (1) and (2), mtotal is calculated from the measured Itotal.
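The two summations can be made concrete with a short sketch that converts a binned number distribution into a mass concentration. It is a simplification under stated assumptions: each bin is represented by a single diameter, particles are spherical (as implied by the (π/6)·dp³ volume), and the density defaults to 1 g cm⁻³. The size bins and counts are hypothetical, not GRIMM 11-D channel data, and the per-particle intensities passed to `total_scattering` would have to come from an instrument-specific response that is not given here.

```python
import math


def total_mass_concentration(diameters_um, number_per_cm3, density_g_cm3=1.0):
    """Eq. (2)-style sum: m_total = sum of (pi/6) * rho_p * d_p**3 * n(d_p) over bins.

    Units: d_p in um, n(d_p) in particles cm^-3, rho_p in g cm^-3. With these
    units the result is directly in ug m^-3, because a particle with a volume of
    1 um^3 at 1 g cm^-3 weighs 1e-6 ug and 1 cm^-3 equals 1e6 m^-3.
    """
    if len(diameters_um) != len(number_per_cm3):
        raise ValueError("one number concentration is needed per size bin")
    return sum(
        math.pi / 6.0 * density_g_cm3 * d ** 3 * n
        for d, n in zip(diameters_um, number_per_cm3)
    )


def total_scattering(intensity_per_particle, number_per_cm3):
    """Eq. (1)-style sum: I_total = sum of i(d_p) * n(d_p) over bins."""
    return sum(i * n for i, n in zip(intensity_per_particle, number_per_cm3))


# Hypothetical four-bin distribution (bin diameters in um, counts in cm^-3).
diameters = [0.3, 0.5, 1.0, 2.5]
counts = [100.0, 20.0, 5.0, 0.1]
mass = total_mass_concentration(diameters, counts)
print(f"m_total = {mass:.2f} ug m^-3")
```

Note how strongly the dp³ term weights large particles: in this example the 0.1 cm⁻³ of 2.5 µm particles contributes more mass than the 100 cm⁻³ of 0.3 µm particles.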
The GRIMM 11-D, a research-grade reference PM monitor, reports the total mass concentration (mtotal,Ref) by measuring the total light scattering intensity (Itotal,Ref) with a diode laser (30 mW, 655 nm). It classifies particles from 0.253 to 35.15 µm (optical diameter) into 31 channels and takes a measurement every 6 s at a sampling flow rate of 1.2 L min⁻¹.
The two LCSs, PMS3003 and PMS7003, report the total mass concentration (mtotal,LCS) based on the total light scattering intensity (Itotal,LCS) detected by a phototransistor. Each LCS reports PM1.0, PM2.5, and PM10 mass concentrations in three channels.
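Because mtotal is calculated from the measured Itotal, the following equations are obtained:
$$m_{total,Ref} = \eta_{Ref}\, I_{total,Ref} \qquad (3a)$$
$$m_{total,LCS} = \eta_{LCS}\, I_{total,LCS} \qquad (3b)$$
where ηRef and ηLCS are the response coefficients of the GRIMM 11-D and the LCS, respectively. From Eqs. (3a) and (3b), the following equation is obtained:
$$m_{total,Ref} = \alpha\, m_{total,LCS} \qquad (4)$$
where α is the target correction factor between the GRIMM 11-D and the LCS, expressed as:
$$\alpha = \frac{m_{total,Ref}}{m_{total,LCS}} = \frac{\eta_{Ref}\, I_{total,Ref}}{\eta_{LCS}\, I_{total,LCS}} \qquad (5)$$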
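For a single paired sample, the correction factor α of Eq. (5) is simply a ratio. The sketch below assumes α is oriented as reference over LCS, so that multiplying an LCS reading by α recovers the reference value; this orientation is an assumption consistent with the later use of correction factors that are applied to PM LCS. The numbers are hypothetical.

```python
def correction_factor(m_ref: float, m_lcs: float) -> float:
    """alpha = m_total,Ref / m_total,LCS for one paired (Ref, LCS) sample."""
    if m_lcs <= 0:
        raise ValueError("LCS mass concentration must be positive")
    return m_ref / m_lcs


def alpha_from_response(eta_ref, i_ref, eta_lcs, i_lcs) -> float:
    """Same alpha written with response coefficients and scattering intensities."""
    return (eta_ref * i_ref) / (eta_lcs * i_lcs)


# Hypothetical pair: the LCS reads 25 ug m^-3 while the reference reads 20 ug m^-3.
alpha = correction_factor(20.0, 25.0)
print(alpha, alpha * 25.0)  # multiplying the LCS reading by alpha recovers the reference
```

In practice α varies from sample to sample, which is why the calibration models described later express it as a function of independent variables instead of a single number.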
Other factors, namely temperature (T), relative humidity (RH), and particle shape (S), may also affect the light scattering intensity. However, the effect of temperature was neglected in this study because the indoor temperature barely changed. The effect of particle shape was also ignored, as in previous studies (Badura et al., 2019; Barkjohn et al., 2020; Crilley et al., 2020; Magi et al., 2020).
The light scattering intensity of the GRIMM 11-D at time resolution ∆tn is expressed by Eq. (6a):

Laboratory Evaluation
The laboratory evaluation was performed in an 83 × 83 × 158 cm chamber (Fig. 4). The chamber edges were sealed with silicone to prevent particle leakage, and four fans were mounted at the corners at mid-height (80 cm) to distribute the particles uniformly. To minimize the effects of temperature and humidity on the measurements, the temperature and relative humidity were maintained at 20°C and 20%, respectively. Generated particles flowed into the chamber through a stainless-steel tube at the top center of the chamber. The GRIMM 11-D was located outside the chamber, and the two LCSs were located inside. The LCS inlets were approximately 5 cm apart (a narrow gap, about 0.06 of the 83 cm chamber width), and each LCS inlet was also approximately 5 cm from the sampling port inlet of the GRIMM 11-D. Therefore, the GRIMM 11-D and the LCSs were assumed to sample particles from the same space (Li and Biswas, 2017).
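As a quick arithmetic check on the chamber geometry, the stated dimensions give a volume of about 1.09 m³, consistent with the "1 m³ chamber" in the abstract, and the 5 cm inlet spacing corresponds to the quoted gap ratio of about 0.06. The variable names below are illustrative only.

```python
# Chamber geometry from the laboratory setup (dimensions in cm).
width_cm, depth_cm, height_cm = 83.0, 83.0, 158.0
inlet_gap_cm = 5.0  # approximate spacing between sensor inlets

volume_m3 = (width_cm * depth_cm * height_cm) / 1e6  # cm^3 -> m^3
gap_ratio = inlet_gap_cm / width_cm                  # gap relative to chamber width

print(f"chamber volume = {volume_m3:.2f} m^3")
print(f"inlet gap ratio = {gap_ratio:.2f}")
```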
The laboratory evaluation used two size distributions of polystyrene latex (PSL) particles: mono-disperse PSL (0.56 µm, corresponding to the peak of a typical atmospheric aerosol size distribution) and poly-disperse PSL (1.3, 1.5, 1.8, and 2.0 µm, approximating a PM2.5 size distribution) (Table 2). The PSL particles were generated with an atomizer (TSI, Jet Atomizer 9302) and passed through a diffusion dryer (TSI, Diffusion Dryer 3076) to remove moisture. The aerosolized particles were introduced into the chamber until the particle concentration stopped increasing; particle generation was then stopped, after which the concentration began to decrease. The measurement ended when the LCSs could no longer detect particles. Fig. 5 shows the size distributions of the PSL aerosols immediately before particle generation was stopped.

Field Evaluation
Fig. 6(a) shows the field experimental setup in the Engineering Building at Yonsei University (37.5618°N, 126.9366°E) in Sinchon, Seoul, South Korea. For accurate evaluation and calibration, the inlets of the LCSs and the GRIMM 11-D were placed close together at a height of approximately 0.8 m to reduce noise caused by re-dispersion of fine dust (Giordano et al., 2021). Fig. 6(b) shows the PM evaluation sensor box, which comprises the PMS3003, the PMS7003, and a DHT22 temperature/humidity sensor. Each sensor was operated by the Arduino, and the measured data were transmitted wirelessly through the Wi-Fi module (NodeMCU).
The specified operating ranges of the GRIMM 11-D are 0-40°C for temperature and 0-40% for relative humidity (RH); therefore, dehumidification was required when the RH exceeded 40%. For this purpose, we fabricated a heat-based dehumidifier with a total length of 30 cm (heating section: 10 cm from the front end, Fig. 6(c)) that can raise the temperature up to 60°C. The dehumidifier was connected to the inlet of the GRIMM 11-D and used whenever the indoor RH exceeded 40%. Therefore, the RH term in Eq. (6a) can be neglected, giving Eq. (7):
In the field evaluation, we collected open API PM data (outdoor PM2.5 and PM10) together with the outputs of the PM evaluation sensor box (T, RH, and PM1.0, PM2.5, and PM10 mass concentrations) and of the GRIMM 11-D (PM1.0, PM2.5, and PM10 mass concentrations). The open API PM data were collected hourly from air quality monitoring stations near the field test site. Two stations were selected, located 0.75 km and 3.8 km from the site, and the values from the two stations were averaged to obtain (PM10/PM2.5)API. The PM mass concentrations measured with the GRIMM 11-D and the LCSs were averaged over 1-hour intervals and are denoted PM Ref and PM LCS, respectively. The field test was conducted from 12 February to 31 March 2022.
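The data alignment described above (hourly averaging of the high-frequency sensor data and combining two monitoring stations into one outdoor ratio) can be sketched in plain Python. This is a hypothetical illustration: the sample values are invented, and because the text does not say whether the two stations' concentrations or their ratios were averaged, `api_ratio` assumes that PM10 and PM2.5 are averaged first and the ratio is taken afterwards.

```python
from collections import defaultdict
from datetime import datetime


def hourly_mean(records):
    """Average (timestamp, value) samples into 1-hour bins keyed by the start of each hour."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        sums[hour] += value
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}


def api_ratio(station_a, station_b):
    """(PM10/PM2.5)API for one hour from two stations given as (pm10, pm25) tuples.

    ASSUMPTION: PM10 and PM2.5 are averaged across the two stations first and
    the ratio is taken afterwards; the study does not spell out this order.
    """
    pm10 = (station_a[0] + station_b[0]) / 2.0
    pm25 = (station_a[1] + station_b[1]) / 2.0
    if pm25 <= 0:
        raise ValueError("PM2.5 must be positive to form a ratio")
    return pm10 / pm25


# Hypothetical LCS samples (the sensor box uploads roughly every 4.5 s).
samples = [
    (datetime(2022, 2, 12, 10, 0, 0), 10.0),
    (datetime(2022, 2, 12, 10, 30, 0), 20.0),
    (datetime(2022, 2, 12, 11, 5, 0), 30.0),
]
print(hourly_mean(samples))
print(api_ratio((40.0, 20.0), (30.0, 15.0)))
```

Averaging concentrations before taking the ratio avoids giving a station with very low PM2.5 (and therefore a volatile ratio) disproportionate weight; averaging the two ratios directly is an equally plausible reading of the text.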

Calibration Model
In this study, to create a calibration model based on regression analysis, the following three steps were used: 1) select useful independent variables through the regression analysis of collected data; 2) derive the correction factor (CF) corresponding to α of Eq. (5); and 3) construct a calibration model by applying the derived CF.
We constructed four calibration models (CF1-CF4), which are summarized in Table 3. In the model equations, the constants a, b, and c represent regression coefficients. The target correction value of CF2 is the reference correction factor (CF ref), defined by Eq. (11):
$$CF_{ref} = \frac{PM_{Ref}}{PM_{LCS}} \qquad (11)$$
where CF ref is the same as α in Eq. (5). To evaluate the calibration performance of PM Cal.LCS (Eq. (9)), it is compared with PM Ref.
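The correction-factor approach for CF2 can be sketched as an ordinary least-squares fit of CF ref against (PM10/PM2.5)API, followed by PM Cal.LCS = CF × PM LCS. The sketch assumes CF ref = PM Ref / PM LCS and uses the linear form that the results report for PM2.5 (CF = a·x + b); it is not the authors' code, and the training data are hypothetical values constructed so that the exact answer is known (a = 0.2, b = 0.3).

```python
def fit_linear(x, y):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def calibrate(pm_lcs, ratio_api, a, b):
    """PM Cal.LCS = CF * PM LCS, with CF = a * (PM10/PM2.5)API + b."""
    return [(a * r + b) * pm for pm, r in zip(pm_lcs, ratio_api)]


# Hypothetical hourly training data (not measurements from the study).
ratio_api = [1.5, 2.0, 2.5, 3.0]
pm_lcs = [20.0, 30.0, 40.0, 50.0]
pm_ref = [12.0, 21.0, 32.0, 45.0]

cf_ref = [ref / lcs for ref, lcs in zip(pm_ref, pm_lcs)]  # regression target
a, b = fit_linear(ratio_api, cf_ref)
pm_cal = calibrate(pm_lcs, ratio_api, a, b)
print(f"a = {a:.3f}, b = {b:.3f}", [round(v, 2) for v in pm_cal])
```

For PM10, where the results report a second-order polynomial relationship, the linear fit would be replaced by a quadratic one, for example with `numpy.polyfit(x, y, 2)`, which returns coefficients from the highest power down.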

Calibration model with (PM10/PM2.5)API by dividing concentration sections
Because the sensitivity of an LCS decreases at low concentrations owing to its limit of detection (LOD), low and high PM concentrations should be treated separately before regression analysis is applied. The criterion for dividing the PM concentration sections was the PM1.0 of the LCS (PM1.0 LCS). CF3 is a correction factor that uses (PM10/PM2.5)API as the independent variable after the PM concentration sections are divided, and PM Cal.LCS is obtained by applying CF3 to PM LCS. In the corresponding equations, the constants a, b, c, and d represent regression coefficients. CF4 additionally includes RH as an independent variable; CF4Low and CF4High represent CF4 in the low and high concentration sections, respectively. The target correction value of CF4 is the reference correction factor (CF ref) of Eq. (11). To evaluate the calibration performance of PM Cal.LCS (Eq. (13)), it is compared with PM Ref.
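Dividing the data into concentration sections amounts to fitting one correction-factor model below a PM1.0 LCS threshold and another at or above it, then choosing the model per sample. The sketch below shows a CF3-style version with (PM10/PM2.5)API as the only regressor; the linear form within each section is an assumption, since the section equations are not reproduced here, and a CF4-style model would add RH as a second regressor. The threshold of 7.0 µg m⁻³ is illustrative, chosen near the changing points reported in the results, and the data are hypothetical.

```python
def fit_linear(x, y):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("each section needs at least two paired points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def fit_sectioned(pm1_lcs, ratio_api, cf_ref, threshold):
    """Fit CF = a * x + b separately below and at/above a PM1.0 LCS threshold."""
    sections = {"low": ([], []), "high": ([], [])}
    for p1, r, cf in zip(pm1_lcs, ratio_api, cf_ref):
        xs, ys = sections["low" if p1 < threshold else "high"]
        xs.append(r)
        ys.append(cf)
    return {name: fit_linear(xs, ys) for name, (xs, ys) in sections.items()}


def calibrate_sectioned(pm_lcs, pm1_lcs, ratio_api, coeffs, threshold):
    """Apply the low- or high-section CF to each sample according to PM1.0 LCS."""
    out = []
    for pm, p1, r in zip(pm_lcs, pm1_lcs, ratio_api):
        a, b = coeffs["low" if p1 < threshold else "high"]
        out.append((a * r + b) * pm)
    return out


THRESHOLD = 7.0  # ug m^-3; illustrative value near the reported changing points

# Hypothetical training data: three low-section and three high-section hours.
pm1_lcs = [2.0, 3.0, 4.0, 10.0, 20.0, 30.0]
ratio_api = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
cf_ref = [2.0, 1.8, 1.6, 1.0, 1.1, 1.2]

coeffs = fit_sectioned(pm1_lcs, ratio_api, cf_ref, THRESHOLD)
calibrated = calibrate_sectioned([5.0, 40.0], [3.0, 20.0], [2.0, 2.0], coeffs, THRESHOLD)
print(coeffs, calibrated)
```

The low section here receives a larger correction factor, mirroring the pattern in the results where CF ref rises steeply at low PM1.0 LCS.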

Evaluation of Calibration Models Performance
In this study, three indicators were used to evaluate the performance of the calibration models: the coefficient of determination (R²), the root mean square error (RMSE), and the match rate (MR). The R², RMSE, and MR are defined beginning with Eq. (14). A model performs better as R² approaches 1, RMSE approaches 0, and MR approaches 100%.
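Two of the three indicators have standard definitions and can be sketched directly. Here R² is computed as the squared Pearson correlation between calibrated and reference values; whether Eq. (14) uses this form or a residual-based 1 − SSres/SStot form is an assumption. MR is omitted because its defining equation is not reproduced in this section. The example data are hypothetical.

```python
import math


def r_squared(pred, ref):
    """Squared Pearson correlation between calibrated and reference values."""
    n = len(pred)
    if n < 2 or n != len(ref):
        raise ValueError("need at least two paired values")
    mp, mr = sum(pred) / n, sum(ref) / n
    s_pr = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    s_pp = sum((p - mp) ** 2 for p in pred)
    s_rr = sum((r - mr) ** 2 for r in ref)
    return s_pr ** 2 / (s_pp * s_rr)


def rmse(pred, ref):
    """Root mean square error in the units of the inputs (ug m^-3 here)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


# Hypothetical example: an LCS that reads exactly half the reference value.
lcs = [1.0, 2.0, 3.0, 4.0]
ref = [2.0, 4.0, 6.0, 8.0]
print(r_squared(lcs, ref), rmse(lcs, ref))
```

The example shows why R² alone is insufficient: the sensor reads half the reference, yet R² equals 1, while RMSE exposes the bias. Likewise, multiplying every prediction by the same constant leaves the squared Pearson correlation unchanged, so a purely constant correction factor cannot raise this form of R².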

Laboratory Evaluation
To determine the performance of LCSs, laboratory evaluation was conducted with two size distributions of PSL: mono-disperse (0.56 µm) and poly-disperse (1.3, 1.5, 1.8, and 2.0 µm). Fig. 7 shows the results of this evaluation.
For poly-disperse PSL particles, the mass concentrations reported by each LCS were significantly lower than those of the GRIMM 11-D and showed larger deviations than for mono-disperse PSL particles (Fig. 7(d)). To explain these trends, we hypothesized two possible causes: 1) because of the internal structure of the LCSs, larger particles may not fully reach the measurement region of the LCS (Fig. 2); 2) larger particles may block smaller particles, so that light scattered by the smaller particles goes undetected, resulting in underestimation or noise (Figs. 1(b) and 1(c)).

Field Evaluation
The measured data are shown in Fig. 8. The average room temperature was 15.91°C (± 2.16°C), and the average indoor relative humidity was 34.83% (± 11.32%).
Because PM10 LCS correlated significantly less well with the reference than PM2.5 LCS did, the ability of each LCS to discriminate between PM2.5 and PM10 was evaluated before the calibration model was constructed. Fig. 9 shows the correlation between the PM10/PM2.5 ratio of the GRIMM 11-D, (PM10/PM2.5)Ref, and that of the LCS, (PM10/PM2.5)LCS. As (PM10/PM2.5)Ref increased, (PM10/PM2.5)LCS also tended to increase, and PM10 LCS had a non-linear relationship with PM2.5 LCS; that is, PM10 LCS was not simply a fixed multiple of PM2.5 LCS. Therefore, PM10 LCS was considered reliable.
The slope for PMS3003 was higher and closer to unity than that for PMS7003. However, no difference in slope was observed in Fig. 10. Fig. 10(e) shows the relationship between PM2.5 Cal.LCS (obtained with CF1) and PM2.5 Ref, and Fig. 10(f) shows the relationship between PM10 Cal.LCS (obtained with CF1) and PM10 Ref. Table 4 lists the R², RMSE, and MR values obtained with CF1 for PM2.5 LCS and PM10 LCS for both PMS3003 and PMS7003. For PMS3003, PM2.5 Cal.LCS had a significantly lower RMSE and a significantly higher MR than PM2.5 LCS, indicating improved performance after calibration. In contrast, the RMSE and MR of PM10 Cal.LCS differed little from those of PM10 LCS, indicating that calibration with CF1 did not significantly improve performance.

Simple calibration model
When RMSE and MR were used as performance indicators, calibration with CF1 appeared to improve PM2.5 LCS. However, R² showed no improvement for either PM2.5 LCS or PM10 LCS (Table 4 shows almost the same R² values for PM LCS and PM Cal.LCS for both PM2.5 and PM10). We attribute this lack of improvement to the data points marked by the black and red circles (Figs. 10(c) and 10(d)).
In addition, using CF1 for PMS3003 increased the minimum detectable concentration (i.e., the LOD) from 1.33 to 3.98 µg m⁻³ for PM2.5 and from 1.65 to 10.48 µg m⁻³ for PM10 (Figs. 10(e) and 10(f)). Similar trends were obtained for PMS7003.
To better understand the relationship between CF ref and (PM10/PM2.5)API, the results were re-plotted in Figs. 11(c) and 11(d). Fig. 11(c) shows a linear correlation between CF ref for PM2.5 and (PM10/PM2.5)API, whereas Fig. 11(d) shows a second-order polynomial correlation between CF ref for PM10 and (PM10/PM2.5)API. Using these correlations, CF2 in Eq. (10) was determined, and PM Cal.LCS values were then obtained for each case (Eq. (9)). Fig. 11(e) shows the relationship between PM2.5 Cal.LCS (obtained with CF2) and PM2.5 Ref for both LCSs, and Fig. 11(f) shows the corresponding relationship between PM10 Cal.LCS and PM10 Ref. Table 4 lists the R², RMSE, and MR values obtained with CF2 for PM2.5 LCS and PM10 LCS for both PMS3003 and PMS7003. PM2.5 Cal.LCS had a significantly lower RMSE and significantly higher R² and MR than PM2.5 LCS, indicating that CF2 improved the performance for PM2.5 LCS. The performance also improved significantly for PM10 LCS. Moreover, CF2 outperformed CF1 (Table 4). In particular, the data points marked by the black and red circles (Figs. 10(c) and 10(d)) were corrected for both PM2.5 and PM10. In addition, CF2 did not increase the minimum detectable concentration. However, some data points of PM2.5 Cal.LCS and PM10 Cal.LCS still deviated from the trend line in the 30-90 µg m⁻³ range (Figs. 10(e) and 10(f)).

Calibration model with (PM10/PM2.5)API by dividing concentration sections
For both PMS3003 and PMS7003, the data shown in Fig. 10 [...] where x and y denote PM1.0 LCS and CF1.0 ref, respectively. As x increased, y decreased rapidly until x reached approximately 6-7 and converged thereafter. This rapid decrease suggests that the LCS is less sensitive than the GRIMM 11-D at concentrations below x = 6-7, likely because of the difference in LOD between the LCS and the GRIMM 11-D. This changing point (x = 6-7) was used as the threshold in Eqs. (12a) and (12b). When the changing point was defined as the x value at which y″ equals its average ȳ″ (ȳ″ = 0.0036 for PMS3003 and 0.0026 for PMS7003), the changing points were 7.3355 µg m⁻³ for PMS3003 and 6.8852 µg m⁻³ for PMS7003.
[...] and RH for PM2.5 and PM10, respectively. When the RH was higher than 40%, CF ref decreased slightly with increasing RH, indicating that the hygroscopic effect was not significant. However, when the RH was lower than 40%, CF ref showed no correlation with RH, and data points deviated significantly from the trend line (y = ax + b). Nonetheless, CF4 in Eqs. (13a) and (13b) was determined using these correlations (Table 5), and PM Cal.LCS values were then obtained (Eq. (13)). Consequently, Figs. 13(c) and 13(d) show that PM Ref values were well correlated with PM Cal.LCS values (obtained with CF4) for PM2.5 and PM10, respectively. Because the data points marked by the red and black circles in Figs. 10(c) and 10(d) were obtained at RH below 40%, we infer that their deviation was resolved by calibration models that include (PM10/PM2.5)API. Table 4 lists the R², RMSE, and MR values obtained with CF4 for PM2.5 LCS and PM10 LCS for both PMS3003 and PMS7003. CF4 yielded only a marginal improvement over CF3.
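The changing-point rule (the x value at which the second derivative y″ of the fitted curve equals its average ȳ″) can be sketched numerically with central differences on a uniform grid. The curve actually fitted to the CF1.0 ref versus PM1.0 LCS data is not reproduced in this section, so `example_curve` below is purely hypothetical (a steep decay that levels off), and its result will not match the reported 7.3355 and 6.8852 µg m⁻³.

```python
def changing_point(f, x_min, x_max, steps=1000):
    """First x where the numerical second derivative of f falls to its mean.

    y'' is estimated with central differences on a uniform grid, and the mean
    of those values plays the role of the average y'' in the text.
    Returns (x, mean_y2); x is None if no crossing is found.
    """
    h = (x_max - x_min) / steps
    xs = [x_min + i * h for i in range(1, steps)]  # interior grid points
    y2 = [(f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2 for x in xs]
    mean_y2 = sum(y2) / len(y2)
    for x, v in zip(xs, y2):
        if v <= mean_y2:
            return x, mean_y2
    return None, mean_y2


def example_curve(x):
    """HYPOTHETICAL CF-versus-PM1.0 curve: decays steeply, then levels off."""
    return 1.0 + 1.0 / x


x_star, mean_y2 = changing_point(example_curve, 1.0, 20.0)
print(f"changing point = {x_star:.2f}, mean y'' = {mean_y2:.4f}")
```

Because the average ȳ″ depends on the x range over which it is taken, the resulting threshold also depends on that range; with a monotonically decreasing y″, as here, the first crossing is the only one.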
However, at RH above 70%, the effect of RH could not be neglected (Magi et al., 2020; Jayaratne et al., 2018; Badura et al., 2019; Zheng et al., 2018). Compared with the uncalibrated time series in Fig. 8, a remarkable performance improvement was observed. Considering the performance with CF4 and the lower susceptibility to hygroscopic effects, we determined that the PMS7003 performed better than the PMS3003 (Table 4). Fig. 15 shows the MR between PMS3003 and PMS7003 before and after calibration (Raw, CF1, CF2, CF3, and CF4); the MR between the calibrated outputs (PM Cal.LCS) of the two sensors was higher than that between their raw outputs (PM LCS). With CF4, the MR was highest for both PM2.5 Cal.LCS (96.1%) and PM10 Cal.LCS (96.1%).

CONCLUSIONS
In the field evaluation, the LCSs (PMS3003 and PMS7003) generally showed higher PM mass concentrations than the GRIMM 11-D; however, some LCS data points showed different trends. Outdoor PM10/PM2.5 and relative humidity had notable impacts on the LCS data, and LCS sensitivity depended on the PM concentration level. Based on these observations, regression-based calibration models were constructed using the selected independent variables (outdoor PM10/PM2.5 and relative humidity) after dividing the PM concentration into low and high sections. As a result, PMS7003 showed better performance than PMS3003 (RMSE: 2.03 and 4.89 µg m⁻³ for PM2.5 and PM10, respectively; MR: 91.43% and 86.39% for PM2.5 and PM10, respectively).