The Potential of Commercial Sensors for Spatially Dense Short-term Air Quality Monitoring Based on Multiple Short-term Evaluations of 30 Sensor Nodes in Urban Areas in Korea

Recently, highly spatially dense air quality monitoring networks using low-cost sensors have been attempted worldwide. However, the quality of data from these sensor networks remains to be validated. This study assessed the potential of low-cost sensors for spatially dense air quality monitoring. Thirty sets of air quality sensor nodes for CO, NO2, O3, PM2.5, and PM10 were custom-built to evaluate their consistency in measurement, both among the sensor nodes and between the sensor node and instruments that use Federal Reference/Equivalent Methods (FRMs/FEMs) in the real atmosphere under two distinctly differing meteorological conditions (summer and winter) in Seoul and Busan, Korea. We found that commercially available low-cost sensors possess great potential as monitors for short-term air quality studies in urban areas, at least for one-month periods, given that (1) the self-consistency among the 30 sensors was high (R2 > 0.93), (2) the consistency between the sensors and the FRM/FEM instruments was reasonably high (R2 > 0.87 overall for the periods of comparison), and (3) the consistency both among the sensors and between the sensors and the FRM/FEM instruments remained stable throughout the summer and the winter. However, vigorous data post-processing is needed to obtain reliable air quality data. For longer-term or temporally discontinuous monitoring, several issues must be addressed, including the limited lifetime of sensors, the degradation in sensor performance over time, and the long warm-up times for gaseous pollutant sensors. The O3 sensors required minimal post-processing correction, and the particulate matter and CO sensors agreed well with the FEM instruments after appropriate scale correction, but the NO2 sensors required additional efforts to correct for the effects of meteorological conditions and interfering materials.
Overall, our results suggest that when investigating spatiotemporally heterogeneous distributions of air pollutants in various urban environments, a three-dimensional sensor network can be a useful tool for short-term monitoring, as long as data are corrected properly.


INTRODUCTION
Air pollution has become one of the major global threats to public health, particularly in low- and middle-income countries (WHO, 2018). About 4.2 million annual deaths could be attributed to ambient air pollution based on the updated global burden of disease in 2015 (Ostro et al., 2018). In addition, about 90% of the global population is exposed to high levels of air pollutants (WHO, 2018). In this respect, public concerns about air pollution have increased rapidly worldwide, leading to changing paradigms in air quality monitoring, from official monitoring stations operated by governments to low-cost air quality sensors carried by citizen scientists to monitor the air quality at the location where they are breathing (Kumar et al., 2015).
In developed countries, 1 air quality monitoring station (AQMS) operated by a government generally represents 100,000 people (Lewis and Edwards, 2016). Seoul, South Korea, which is one of the most populous cities in the world, operates 25 urban and 18 roadside AQMSs, each of which covers an area of roughly 4 × 4 km and represents 227,442 people (Bae et al., 2013; MOIS, 2019). However, air pollutants, particularly those emitted from vehicles in cities, are highly spatially heterogeneous due to the rapid dilution of vehicle emissions and the complex wind fields and turbulence created by built environments (Morawska et al., 2008; Karner et al., 2010; Choi et al., 2016). For example, elevated concentrations of pollutants from vehicular emissions are diluted to ambient levels within a few hundred meters in daytime (Karner et al., 2010). At night when the atmosphere is stable, roadway plumes can reach up to 2 km, but elevated concentrations decrease sharply within the first several hundred meters (Hu et al., 2009; Choi et al., 2012). Wind fields and turbulence intensities are modified by the surrounding built environment (Kim and Baik, 2004), making ventilation more (or less) effective below urban canopy levels, even on a block scale (Pirjola et al., 2012; Choi et al., 2016; Choi et al., 2018).
The high spatiotemporal heterogeneity of pollutant distributions in urban areas requires spatially dense air quality monitoring in near real time to control exposure (Kumar et al., 2015). In addition, as public concern over air pollution has grown recently, the market for low-cost air quality sensors has also grown rapidly. The reasons for using low-cost air pollution sensors are diverse, from warning systems for gas leakage and occurrence of high-pollution events to high-density air quality monitoring in cities (Lewis and Edwards, 2016). This study focused on assessing the potential of low-cost sensors for spatially dense air quality monitoring.
Numerous recent studies have evaluated the performance of low-cost sensors for their specific purposes under both laboratory and field conditions (Morawska et al., 2018). Bart et al. (2014) compared semiconductor O3 sensors with a ratified reference instrument over 48 h and showed strong agreement between them, with a standard error of 3 ppb in the ambient O3 range (0-50 ppb). Moltchanov et al. (2015) compared 2 pairs of electrochemical NO2 and O3 sensors at four different sites, including an AQMS. They reported good correlations between the 2 sensors at all sites, with r > 0.92 and 0.78 for O3 and NO2, respectively. They also found good correlations between the sensors and reference instruments at the AQMS site. However, the relationship between the sensor and reference instrument differed among individual sensors. Masson et al. (2015) conducted a long-term comparison between an electrochemical NO sensor and the Federal Reference Method (FRM) instrument and reported good agreement, with a root mean square error of 14.6 ppb after correcting for sensitivity and offset, which was improved to 13.6 ppb with additional corrections using multivariate analysis, although they did not discuss signal attenuation effects over time. Mead et al. (2013) carried out comprehensive testing of the performance of NO, NO2, and CO sensors and quantified the effects of meteorological conditions and interfering materials on sensor outputs, suggesting the necessity for sophisticated data correction. They deployed 46 sensor nodes in Cambridge, United Kingdom, and suggested that low-cost sensors have great potential for high-density air quality monitoring. In addition, they reported that the degradation of NO sensor performance was negligible over 1 year. Nonetheless, their sensor reproducibility tests were limited to 2 pairs of sensor nodes over short time periods. Zicoba et al. (2017) carried out intercomparison tests of 66 infrared LED-based dust sensors for PM2.5 and the Grimm instrument in both indoor and outdoor settings. The correlations among the sensors were R2 > 0.9, showing excellent sensor reproducibility. However, the correlations between the sensors and the Grimm instrument were not satisfactory, with R2 = 0.53 and 0.22 in indoor and outdoor environments, respectively.
Although many studies have evaluated sensor performance, most have used a limited number of sensors for comparison with a ratified reference instrument, and results from intercomparison tests among large numbers of sensors, which are essential for quantifying the spatial heterogeneity of air pollutants on a fine scale, are lacking. In addition, despite their promising applications, the quality of data obtained from low-cost sensors remains in doubt (Cross et al., 2017). This data quality issue should be carefully addressed before low-cost sensor networks are used in practice, as the spread of unreliable information may cause further confusion among the public, irrational anxiety or relief about air quality, and the establishment of inappropriate strategies for exposure reduction (Masiol et al., 2018).
In this study, we investigated the performance of low-cost sensors under different meteorological conditions, comparing ambient concentrations of air pollutants among a large number of sensors (30 sets of sensor assemblies to examine inter-sensor variations within a sensor network) and between the sensors and the instruments used in FRMs (to determine the accuracy and sensitivity of the sensor as well as meteorological or temporal effects on sensor characteristics). Piedrahita et al. (2014) suggested that correcting sensor data from collocation calibrations produced more reliable concentrations than laboratory calibrations. Based on experimental results, we discuss the potential of sensor networks for actual air quality monitoring as well as the questions and problems that should be addressed for practical applications of air quality monitoring sensor networks.
During the intercomparison tests between the sensor assembly and FRM or Federal Equivalent Method (FEM) instruments, 29 sensor assemblies were deployed in the center of Seoul, South Korea, in an area of about 800 × 800 m to investigate the heterogeneous distributions of air pollutants in the city. The detailed results of this sensor network study will be discussed in a subsequent paper.

Air Quality Sensor Node
In this study, the focus was on five criteria air pollutants, CO, NO2, O3, PM2.5, and PM10; temperature; and relative humidity. Temperature and humidity data were used for further correction of meteorological effects on sensor performance. Because the goal of the study was not to evaluate the performance of numerous commercially available sensors but rather to apply these sensors to the construction of a highly spatially resolved air quality monitoring network, we chose a sensor for each pollutant from the literature based mainly on evaluation results published by the United States Environmental Protection Agency (U.S. EPA) and the South Coast Air Quality Management District (SCAQMD; California, USA). The U.S. EPA and SCAQMD (Air Quality Sensor Performance Evaluation Center) have evaluated commercially available low-cost air pollutant sensors, providing comparisons with FRMs and FEMs in the real atmosphere and laboratory since 2013 (SCAQMD, 2017; U.S. EPA, 2017). Additionally, we reviewed published articles containing sensor evaluations (e.g., Williams et al., 2013; Jiao et al., 2016; Cross et al., 2017; Ly et al., 2018). Thus, although we cannot say that the selected sensors are the best performing sensors available, they represent sensors with high sensitivity and accuracy on the market. The sensors selected are listed in Table 1. A particulate matter (PM) sensor (PMS5003; Plantower) employs a laser scattering technique with a laser diode for a light source and a photodiode detector for a receptor (Zheng et al., 2018). CO (CO-B4; Alphasense) and O3 (SM50; Aeroqual) sensors use an electrochemical sensing technique, which measures gas concentrations by measuring the current generated by the potential difference between working and counter electrodes when the target gas undergoes an oxidation or reduction reaction on the working electrode (Mead et al., 2013). NO2 concentrations were measured with a metal oxide sensor (MiCS-2714; SGX), which measures the change in the electrical conductivity occurring when the target gas contacts the semiconductor surface (Urasinska-Wojcik et al., 2017).
Among these sensors, the CO and O3 sensors were factory calibrated, but the NO2 sensor was not calibrated initially. Although we can obtain PM concentrations directly from sensor outputs, the manufacturer did not provide information on a calibration algorithm (Zheng et al., 2018). However, SCAQMD mentioned that the PMS5003 was factory calibrated (SCAQMD, 2019). We did not apply any additional pre-adjustment to sensor readings before storing the data. All the adjustments were made in the data post-processing stage based on intercomparison test results in real atmospheric conditions. The air quality data were averaged for 10 s and stored on a microSD card at a 0.1-Hz temporal resolution in a data logger (ATmega2560; Atmel), and could be simultaneously transferred via a wireless network to a smartphone (with a PHPoC Shield for Arduino). However, due to a limitation of wireless network access for 30 sensor nodes, we used only the data stored on the SD card for data analyses, and the wireless connection was used only to check for sensor node malfunctions through a direct connection between the sensor node and smartphone.
The sensors were assembled in a platform made of acrylonitrile butadiene styrene plastic (dimensions: 200 × 100 × 300 mm; Fig. 1). An intake for ambient air and an exhaust for in-box air were located at the top and bottom of the front side of the platform, with a fan in the exhaust portion. The calculated ventilation rate was more than 20 times the platform volume per minute. We acknowledge that there could be a wall-loss issue for some pollutants (e.g., ozone). In this study, we could not quantitatively determine the effects of wall loss on ozone concentrations in a box. However, Ainsworth et al. (1981) suggested that ozone wall loss can lead to 2% uncertainty at the surface pressure level, and Itoh et al. (2012) suggested that the effective ozone lifetime due to wall loss in an acrylic resin tube is about 10^4 s at the surface pressure level. Considering the ventilation rate in our sensor node box, we expect the wall-loss effect to be minor. Even if the ozone wall-loss effect cannot be ignored, it would be implicitly corrected when the pollutant concentrations from the sensor nodes are adjusted to those obtained from the reference instruments, assuming constant wall-loss rates.
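The wall-loss argument above can be checked with a rough order-of-magnitude estimate. The sketch below is not part of the original analysis: it assumes a well-mixed box with first-order wall loss, and it borrows the 10^4 s lifetime reported for an acrylic resin tube, which may not hold for an ABS box of different geometry. All variable names are illustrative.

```python
import math

# Values taken from the text
box_volume_l = 200 * 100 * 300 / 1e6   # platform volume: mm^3 converted to litres
air_changes_per_min = 20               # lower bound on the ventilation rate
wall_loss_lifetime_s = 1e4             # Itoh et al. (2012), acrylic resin tube

# Assumption: well-mixed box, so the mean residence time is the inverse
# of the air-change rate (an upper bound, since the rate is a lower bound).
residence_time_s = 60.0 / air_changes_per_min

# Assumption: first-order loss to the walls during the residence time.
loss_fraction = 1.0 - math.exp(-residence_time_s / wall_loss_lifetime_s)

print(f"platform volume      : {box_volume_l:.1f} L")
print(f"residence time (max) : {residence_time_s:.1f} s")
print(f"ozone lost to walls  : {100 * loss_fraction:.3f} %")
```

Under these assumptions the residence time is a few seconds against a wall-loss lifetime of hours, which is consistent with the expectation that the effect is minor.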

Comparison of Air Quality Data among Sensors by Sampling Site, Period, and Meteorological Conditions
To obtain reliable spatial distributions of air pollutants, the comparability of data among sensors should be ensured, so that a value obtained at one site can be compared directly with values from other sites. To date, most sensor evaluation studies have focused on the accuracy of sensor performance, testing a small number of sensors experimentally (e.g., SCAQMD, 2017; U.S. EPA, 2017). To investigate the consistency of data among numerous sensors, we placed 30 sensor platforms together (10 sensors for O3) on the rooftop of a three-story building on the Pukyong National University campus. Pukyong National University is located 650 m west of the coast and is surrounded by heavily trafficked roads in all directions (about 400 m from all streets). Thus, the campus represents the ambient urban air without direct influences from emission sources, considering that daytime roadway plumes dissipate within about 300 m (Karner et al., 2010).
The comparison tests were carried out four times, twice in summer and twice in winter, under distinct meteorological conditions. Within each season, the temporal gap between the two experiments was about four weeks. Each experiment was conducted for 3-5 days, depending on weather conditions. During this temporal gap, experiments using a highly spatially resolved sensor network were conducted in Seoul, the most populated megacity in Korea. The results from this air quality sensor network will be presented separately and are beyond the scope of the present study.

Intercomparison between the Sensor Platform and Ratified Reference Instruments by Sampling Site, AQMS, and Meteorological Conditions
A sensor platform was co-located (< 1 m) with the sample inlet of the AQMS operated by Seoul Institute of Health Environment for 7 days in summer (August 25-31, 2017) and 10 days in winter (January 11-20, 2018). The sample inlet was located at 16 m a.g.l. The instruments used for FRM/FEMs at the AQMS are CA-751 (non-dispersive infrared absorption) for CO, NA-721 (chemiluminescence) for NO2, OA-781 (non-dispersive ultraviolet absorption) for O3, and PM711 (beta ray attenuation) for particulate matter (PM) (Ambient Air Monitor 700 Series; Kimoto Electric Co., Ltd., Japan).
The AQMS is located in the center of Seoul (Jongno-gu, 37.572°N, 127.005°E) and is densely surrounded by small buildings (mostly three- to four-story buildings). In this area, vehicular emissions are the major source of air pollutants, but there are no prominent emission sources in close proximity to the AQMS. Possible emission sources that may affect the AQMS are the eight-lane road (Jong-ro) located about 100 m south, the six-lane road (Yulgok-ro) about 300 m east, and the four-lane road (Dongho-ro) about 270 m west of the AQMS. The purpose of this intercomparison study was not to investigate air quality affected by vehicular emission sources; thus, we did not analyze the traffic rates on the surrounding roads. To investigate meteorological effects on variations in the sensor characteristics, we obtained meteorological data from a nearby automatic weather station operated by the Korea Meteorological Administration, as well as temperature and relative humidity measured inside the sensor platform.

Meteorological Conditions
Many sensor evaluation studies have been conducted under specific meteorological conditions (e.g., Williams et al., 2013;Masson et al., 2015;Jiao et al., 2016). The meteorological conditions in Korea, however, vary significantly from season to season (roughly from -20°C to 40°C and from dry to humid). In this study, an intercomparison among sensors and between sensors and FRM instruments was conducted under two distinct weather conditions (hot and humid in summer and cold and dry in winter). In addition, for each season, two intercomparison experiments (among sensors and between each sensor and the corresponding FRM instrument) were conducted in close temporal proximity, within one month. Thus, the meteorological conditions of the two intercomparison experiments were similar in both summer and winter (Table 2).

Intercomparison among Sensors
To investigate the consistency of sensor readings among a large number of sensors, the comparisons of data were made with unadjusted raw signals from sensor nodes. In these comparisons, we used 15-min-averaged data to examine if the sensors have the potential to capture high temporal variations in pollutant concentrations in cities. All 30 sensors agreed well with each other for all tested pollutants in both summer and winter. Here, we present the statistical results obtained from the comparison of 29 sensor platforms with 1 reference sensor platform (the reference sensor platform was installed at AQMS to conduct further intercomparison tests with reference instruments). The mean R2 values of the 29 sensors were greater than 0.933 ± 0.012 in summer and 0.965 ± 0.020 in winter for all pollutants (Table 3). Slightly poorer linearity in summer for CO, PM2.5, and PM10 was likely caused by relatively low ambient concentrations in summer compared to those in winter (O3 showed the opposite trend, with higher summer concentrations). Thus, the consistency among sensors appears to be sufficient to determine the relative spatial distributions of pollutants obtained by the sensor network, at least under similar weather conditions. Accuracy issues are discussed in the next section.
Although the linearity among sensors was excellent for all pollutants, the slopes of 1:1 plots between the 29 sensors and the reference sensor were more widely scattered for NO2 and CO compared with PM and O3 sensors. Assuming the zero values of all sensors are identical, if no slope correction is carried out, the concentrations among sensors may differ by up to 40% for CO (slopes of 0.74-1.13 in summer), 73% for NO2 (0.93-1.66 in winter), 17% for O3 (0.87-1.05 in winter), and 28% for PM2.5 (0.84-1.12 in winter). Because the PM sensor is based on a light scattering technique, PM2.5 and PM10 showed almost the same temporal variations (discussed in the next section in more detail). Thus, PM10 is not discussed here.
Intercomparison tests among sensors were conducted twice during each season (before and after two-week intercomparison tests with the FRM instruments). The two datasets obtained from two separate experiments agreed well (falling onto one regression line) for all pollutants in both summer and winter (Figs. S1-S5 in Supplementary Information). During the two experimental periods in each season, meteorological conditions were similar (Table 2). Thus, we conclude that sensor characteristics do not change over time, at least for a one-month period under similar weather conditions. The consistency among sensors in different seasons (thus, under different meteorological conditions and with a six-month time interval) agreed well, except for that of the CO sensors (Fig. S6), which showed variable consistency. Most CO sensors showed good agreement between seasons (Fig. S6(a)), but some showed different relationships (Fig. S6(e)). Even when the sensors showed inconsistent seasonal relationships, the slopes between the sensors in the two seasons were similar. Thus, these differences were likely caused by zero drift due to differing meteorological conditions or the time interval. Only 3 sensors out of 29 showed seasonal inconsistency, with a maximum zero-shift of 137 ppb (Fig. S6(e)). Nonetheless, most sensors (26 out of 29) agreed with each other reasonably well (R2 > 0.9). Thus, we conclude that the consistency among sensors is reasonably good regardless of meteorological conditions and that this consistency is maintained for a period of at least six months (sensors were stored in plastic bags filled with N2 gas when they were not operated). Although the consistency among sensors was excellent, the correspondence with air pollution data obtained from FRM/FEM instruments under different meteorological conditions should also be examined and is discussed in the next section.
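The inter-sensor comparison described in this section (15-min averaging of 10-s records, then a linear fit of each node against the reference node) can be illustrated with a minimal sketch. The data below are synthetic, the gains are chosen only to mimic the CO slope range quoted above, and the function names are hypothetical; this is not the authors' MATLAB code.

```python
import numpy as np

def block_average(values, samples_per_block):
    """Average consecutive samples; a trailing partial block is dropped."""
    values = np.asarray(values, dtype=float)
    n_blocks = values.size // samples_per_block
    trimmed = values[: n_blocks * samples_per_block]
    return trimmed.reshape(n_blocks, samples_per_block).mean(axis=1)

def compare_to_reference(node, reference):
    """Fit node = slope * reference + intercept and return (slope, intercept, R^2)."""
    slope, intercept = np.polyfit(reference, node, 1)
    r = np.corrcoef(reference, node)[0, 1]
    return slope, intercept, r ** 2

# Synthetic 10-s records: 48 blocks of 15 min, 90 samples per block.
rng = np.random.default_rng(0)
samples_per_block = 90
n_samples = samples_per_block * 48
truth = 500 + 200 * np.sin(np.linspace(0, 6 * np.pi, n_samples))

reference_raw = truth + rng.normal(0, 20, n_samples)
ref_15 = block_average(reference_raw, samples_per_block)

gains = [0.74, 1.00, 1.13]  # illustrative, spanning the quoted CO slope range
results = []
for gain in gains:
    node_raw = gain * truth + rng.normal(0, 20, n_samples)
    node_15 = block_average(node_raw, samples_per_block)
    results.append(compare_to_reference(node_15, ref_15))

slopes = np.array([res[0] for res in results])
slope_spread = slopes.max() - slopes.min()
for gain, (slope, intercept, r2) in zip(gains, results):
    print(f"gain {gain:.2f}: slope {slope:.3f}, intercept {intercept:.1f}, R2 {r2:.3f}")
print(f"slope spread: {slope_spread:.2f}")
```

High R2 with a wide slope spread is the situation described above: the nodes track each other closely, yet uncorrected readings can still differ substantially in magnitude.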

Intercomparison between Sensors and Reference Instruments, and Meteorological Effects
The reference sensor platform described in "Intercomparison among sensors" was co-located with the AQMS inlet to determine the accuracy and consistency of the sensors compared to FRM/FEM instruments for summer and winter. In these comparisons, we used 1-h-averaged sensor readings because the AQMS provides 1-h-averaged air quality data. Generally, the consistency between sensor readings and the concentrations recorded with FRM instruments was reasonably good for CO, O3, and PM2.5 (Table 4 and Fig. 2).

Gas Sensors (CO, O3, and NO2)
The CO sensor agreed well with the corresponding FRM instrument in both summer (R2 = 0.814) and winter (R2 = 0.930). Although the slopes differed significantly (summer: 1.03; winter: 1.63) (Fig. 2(a)), the data ranged much more widely in winter (0.4-2 ppm) than in summer (0.3-0.8 ppm). Thus, the winter slope likely represents the actual sensor characteristics. Indeed, both the summer and winter data appear to lie along the line fitted for the entire seasons with the slope similar to that for only winter data (Fig. 2(a)). However, considering the possibility of zero shift in the CO sensor, as discussed in "Intercomparison among sensors" (although this shift was observed in a minority of CO sensors), we cannot exclude the possibility that the slope is affected by meteorological conditions. Overall, we conclude that CO sensors have the potential for application in high-density stationary and mobile sensor networks, but their data should be properly calibrated before being used or shared.
The O3 sensor showed great consistency with the FRM instrument in both summer (hot and humid conditions) and winter (cold and dry conditions) (R2 = 0.894 and 0.944 in summer and winter, respectively). In addition, the data from the two seasons showed almost identical relationships with the FRM instrument (summer and winter relationships agreed with each other within 4% over the 20-100 ppb range). Thus, meteorological factors (temperature and relative humidity) in two different seasons did not affect sensor responses within this ambient concentration range. This finding indicates that O3 sensors have great potential to be employed in high-resolution sensor networks with minimal post-processing or correction of data. When O3 concentrations were below 20 ppb, sensor readings were lower than the actual concentrations. In particular, at levels less than 10 ppb, the O3 sensor did not detect O3. However, these low concentrations are not of interest from an air quality perspective; therefore, we excluded these data from our analysis (indicated by the brown rectangle in Fig. 2(c)).
Table 4. Summary of the intercomparison results between sensors and FRM instruments (using 1-h-averaged unadjusted sensor data). Because the NO2 readings were raw signals from the sensor (not factory-calibrated signals), the slope and y-intercept are not very meaningful.
Although the NO2 sensor and FRM instrument have positive correlations in each season, the 1:1 plots were inconsistent between summer and winter, with more widely scattered data compared to other pollutant sensors (R2 = 0.41 in summer and 0.79 in winter). Because the signals from all NO2 sensors were consistent with each other (Figs. S1 and S2), their seasonal inconsistency with the FRM instrument and wider scatter appear to have been caused by meteorological conditions and other interfering factors (e.g., O3; Mead et al., 2013). However, after correcting for these factors, NO2 concentrations from the sensor agreed much better with FRM concentrations, as discussed in "Post-processing of sensor data and uncertainties." Thus, to employ this NO2 sensor in a sensor network, intensive post-processing of data is required.

PM Sensor
The PM sensor showed excellent consistency with the FEM instrument (R2 = 0.955 for PM2.5 and 0.917 for PM10). In addition, the 1:1 relationship in summer and winter had the same trend (Figs. 2(d) and 2(e)). The summer and winter relationships with the FEM instrument differed slightly, but this was likely caused by differing ambient levels of PM2.5 between the two seasons (summer range: < 10-23 µg m^-3; winter range: 10-100 µg m^-3). However, the sensor read concentrations 2.6 and 2.2 times those measured with the FEM instrument in summer and winter, respectively. PM10 data were scattered more widely compared to PM2.5, particularly in summer (R2 = 0.34; Table 4). This scatter was prominent when PM2.5 was low and the PM10/PM2.5 ratio was high (see the green rectangle in Fig. 2(e)). The same trend was also present in the time-series figure of the SCAQMD report (SCAQMD, 2017; Fig. S7), although the report does not discuss this trend and its possible causes. Thus, we conclude that although PM sensors can be applied to high-resolution sensor networks, the data should be calibrated carefully before deployment, and PM10 is less reliable under conditions of low PM2.5 and high PM10/PM2.5 ratios. We do not discuss this issue in more detail because it is beyond the scope of this study to elucidate the causes of this feature.

SCAQMD (2017) showed that commercially available PM sensors have a wide range of performance, and Morawska et al. (2018) noted that pre-testing or calibration under realistic use conditions is necessary. Our results also suggest that the characteristics and performance of the selected sensor should be carefully determined and the data should be corrected properly before deployment.

Post-processing of Sensor Data and Uncertainties
Data post-processing was conducted separately for each season to avoid additional complexities, such as sensor aging and long-term zero drift. All data post-processing was conducted with MATLAB 2017b (MathWorks®) due to the massive dataset from 30 sensor nodes. For practical application in massive mobile or stationary sensor networks, only zero and scale corrections were used based on linear regression results for CO, O3, and PM sensors. For NO2 sensors, linear fitting did not produce satisfactory results (Table 4). Thus, an additional multivariate regression method was applied to improve the reliability and quality of sensor data. Because semiconductor gas sensors (e.g., NO2 sensor) are known to be influenced by temperature, humidity, and interfering gases (Barsan et al., 2007; Kleffman et al., 2013; Mead et al., 2013), we used temperature, humidity, and O3 as explanatory variables (NO2 concentrations could be grouped by O3 level; data not shown). The linearity with AQMS NO2 improved significantly after correction with the multivariate regression method (equations are shown in Fig. S8), especially for summer data (R2 = 0.37-0.80 for summer and 0.77-0.82 for winter; Fig. S8). The data post-processing (for both normalization to a reference and multivariate regression) was based on the entire observation data because these corrections were for diagnostic evaluation, not for prognostic purposes.
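The two correction approaches described above can be sketched as follows. This is an illustration in Python rather than the authors' MATLAB code, and it assumes a purely additive linear model in temperature, humidity, and O3; the actual equations are given in Fig. S8 and may differ. The data and coefficients are synthetic, and the function names are hypothetical.

```python
import numpy as np

def fit_linear(sensor, reference):
    """Zero and scale correction: reference ~ scale * sensor + zero."""
    scale, zero = np.polyfit(sensor, reference, 1)
    return scale, zero

def fit_multivariate(raw, temperature, humidity, ozone, reference):
    """Ordinary least squares: reference ~ b0 + b1*raw + b2*T + b3*RH + b4*O3."""
    X = np.column_stack([np.ones_like(raw), raw, temperature, humidity, ozone])
    coef, *_ = np.linalg.lstsq(X, reference, rcond=None)
    return coef, X @ coef

def r_squared(observed, predicted):
    residual = np.sum((observed - predicted) ** 2)
    total = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / total

# Synthetic hourly data (assumed values, for illustration only).
rng = np.random.default_rng(1)
n = 240
no2_ref = rng.uniform(10, 60, n)   # reference NO2, ppb
temp = rng.uniform(20, 32, n)      # deg C
rh = rng.uniform(40, 90, n)        # %
o3 = rng.uniform(5, 80, n)         # ppb
# Raw sensor signal contaminated by temperature, humidity and O3.
raw = 2.0 * no2_ref + 3.0 * temp - 0.5 * rh + 1.0 * o3 + rng.normal(0, 2, n)

scale, zero = fit_linear(raw, no2_ref)
r2_linear = r_squared(no2_ref, scale * raw + zero)

coef, no2_corrected = fit_multivariate(raw, temp, rh, o3, no2_ref)
r2_multi = r_squared(no2_ref, no2_corrected)

print(f"R2, zero/scale correction only: {r2_linear:.2f}")
print(f"R2, multivariate correction   : {r2_multi:.2f}")
```

Because both fits here use the full dataset, the R2 values are in-sample, which matches the diagnostic (not prognostic) use stated above; a predictive application would need a held-out evaluation period.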
The time-series of post-adjusted air quality data from the sensors corresponded to those from the FRM/FEM instruments in both the magnitude and temporal variation (Fig. S9). The results of several statistical tests to evaluate the uncertainty of post-adjusted sensor data compared to AQMS data are also summarized in Table 5. The larger mean absolute error (MAE) of the CO sensor was caused by higher ambient CO concentrations (the mean relative error (MRE) of the CO
Table 5. Summary of several statistical tests for uncertainty in corrected (post-adjusted) sensor data (numbers of data used in the tests are identical with those in Table 4).

Application of Post-processing to the Other 29 Sensor Nodes' Data in a Sensor Network
During the periods for which 1 representative sensor node was co-located with reference instruments for intercomparison tests, the other 29 sensor nodes were deployed in a nearby urban area with a spatial domain size of 800 m × 800 m to investigate spatiotemporal heterogeneity of air pollutant distributions in complex urban environments (Fig. S10). To directly compare the air pollutant concentrations read by each sensor node deployed in a sensor network, we applied a two-stage post-adjustment to the other 29 sensor nodes. At the first stage, we corrected the readings from the 29 sensor nodes to correspond to the representative sensor node that was co-located with the reference instruments, based on the comparison results among sensors (Table 3; see "Intercomparison among sensors"). At the second stage, because all 30 sensor nodes agreed well with each other (Table 3), we applied the post-processing of sensor data that was derived from the intercomparison results between the sensor node and reference instruments to the other 29 sensor nodes (corrected at the first stage) to adjust sensor readings to those corresponding to the reference instruments. Fig. 3 shows the time-series of pollutant concentrations from the sensor node after being corrected with the two-stage post-processing method mentioned above as well as the concentrations measured at the nearby AQMS (37.565°N, 126.976°E). This AQMS is at a different location from the site where the intercomparison tests were conducted between the sensor node and reference instruments. Thus, these AQMS data were independent of the post-processing of sensor readings. The sensor was located at ground level (about 3 m a.g.l.), and the AQMS is located about 40 m from the sensor horizontally, at 15 m a.g.l. (Fig. S10). Considering the different positions of the sensor node and AQMS (horizontally and vertically), the time-series of the sensor node and reference instruments agreed well, demonstrating the potential of sensor measurements for air quality monitoring. The sensor node recorded high CO concentrations at midnight in the early period of monitoring in summer (tinted areas in Fig. 3(a)). At that time, construction equipment was operated beside the sensor.
The detailed results of the actual deployment of sensors in a network in complex urban micro-built environments will be discussed in a separate paper.
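The two-stage post-adjustment described in this section can be sketched as two chained linear corrections. This is only an illustrative sketch: the use of an ordinary least-squares line for both stages, the function names, and all numerical values are assumptions made for the example and are not taken from the study's data or code.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apply_line(coeffs, values):
    """Apply a fitted (slope, intercept) pair to a list of readings."""
    slope, intercept = coeffs
    return [slope * v + intercept for v in values]


# Hypothetical readings, illustrative only (not data from the study).
node_raw = [1.0, 2.0, 3.0, 4.0]        # node k during the sensor intercomparison
rep_raw = [1.2, 2.4, 3.6, 4.8]         # representative node, same period
rep_colocated = [1.2, 2.4, 3.6, 4.8]   # representative node beside the reference
reference = [0.5, 1.0, 1.5, 2.0]       # reference instrument, same period

# Stage 1: map node k onto the representative node.
stage1 = fit_line(node_raw, rep_raw)
# Stage 2: map the representative node onto the reference instrument.
stage2 = fit_line(rep_colocated, reference)

# Field readings from node k pass through both stages in order.
field_raw = [2.5, 3.5]
adjusted = apply_line(stage2, apply_line(stage1, field_raw))
print(adjusted)
```

Because stage 2 is fitted only once, on the representative node, its validity for the other nodes rests on stage 1 bringing every node onto the representative node's scale first.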

Temporal Variations in Sensor Characteristics and Stabilization Time
Electrochemical gas sensors and semiconductor sensors require time for stabilization prior to reading ambient pollution levels (Futata and Ogino, 1998; Burgues and Marco, 2018). For example, electrochemical sensors measure the current generated by reduction-oxidation (redox) reactions that occur on the electrode of the sensor and then convert these currents into pollutant concentrations. These redox reactions continuously occur, even when the sensor is not powered, and currents accumulate on the electrode. When the electrochemical sensor is powered, release of these currents to ambient levels takes time (Futata and Ogino, 1998). This means that when most cost-effective gas sensors (excluding non-dispersive infrared gas sensors; Dinh et al., 2016) are powered on, they must be allowed to stabilize before real signals are recorded (Fig. S11). However, we found that when power was only momentarily disconnected (< 1 min), all sensors stabilized within 1 min (data not shown).
As expected, the stabilization time for CO and NO2 sensors was significant and increased with the period during which sensors were not operated (resting time) (Fig. 4). When the resting time was less than 5 days, the stabilization time increased sharply, up to 1.7 h for the CO sensor and ~3 h for the NO2 sensor. With more than 5 days of resting time, the stabilization time did not increase significantly (up to ~3 h for the CO sensor and no change in the NO2 sensor). However, when the resting time was longer than 3 months, the stabilization time again increased sharply, up to ~5 h for CO and > 13 h for the NO2 sensor. Of note, the stabilization time is a characteristic of the individual sensor, and each sensor may have a different stabilization time (Fig. S11(b) and S11(c)). Thus, we consider the stabilization times presented here to be lower limits.
Unlike the CO and NO2 sensors, the unstabilized O3 sensor gave zero values (Fig. S11(a)). The O3 sensor also required time for stabilization but not as much as the CO and NO2 sensors. Regardless of resting time, the stabilization time of the O3 sensor appeared to be no longer than 10 min (based on limited experiments; data not shown). The PM sensor is based on a light scattering technique and did not require any stabilization time.
Although we do not need to consider the stabilization time once the sensor network has been operated continuously for a long time, it is important to be aware of this characteristic for discontinuous monitoring of air pollutants (e.g., episodic monitoring of air pollutants by smartphone users) to avoid recording and sharing inaccurate air pollution data.
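For discontinuous monitoring, one way to act on these findings is to discard readings taken during the expected warm-up window. The sketch below is illustrative only: the hour values are the approximate figures quoted in this section (which the authors regard as lower limits), while the function names, the 90-day stand-in for "3 months", and the sample data are assumptions.

```python
# Approximate stabilization times (hours) quoted in the text, per resting-time
# band: (rest < 5 days, 5 days to ~3 months, rest > ~3 months).
# The NO2 value for long rests is reported as "> 13 h", so 13.0 is a floor.
WARMUP_HOURS = {
    "CO": (1.7, 3.0, 5.0),
    "NO2": (3.0, 3.0, 13.0),
}
O3_WARMUP_HOURS = 10 / 60   # "no longer than 10 min", based on limited tests
THREE_MONTHS_DAYS = 90      # assumption: "3 months" taken as 90 days


def warmup_hours(pollutant, resting_days):
    """Return a minimum warm-up time in hours for a pollutant sensor."""
    if pollutant in ("PM2.5", "PM10"):
        return 0.0          # light-scattering PM sensor needed no warm-up
    if pollutant == "O3":
        return O3_WARMUP_HOURS
    short, medium, long_rest = WARMUP_HOURS[pollutant]
    if resting_days < 5:
        return short
    if resting_days <= THREE_MONTHS_DAYS:
        return medium
    return long_rest


def drop_warmup(samples, pollutant, resting_days):
    """Keep only (hours_since_power_on, value) pairs past the warm-up time."""
    cutoff = warmup_hours(pollutant, resting_days)
    return [(t, v) for t, v in samples if t >= cutoff]


# Hypothetical CO readings after 2 days without power (illustrative values).
co_samples = [(0.5, 1.9), (1.0, 1.2), (2.0, 0.4), (3.0, 0.4)]
print(drop_warmup(co_samples, "CO", resting_days=2))
```

Since stabilization time varies between individual sensors, a deployed system would more safely determine the cutoff per sensor than rely on a fixed table like this one.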

CONCLUSION
The goals of using low-cost air pollution sensors vary widely, from warning residents of gas leaks or high-pollution events to providing high-density air quality monitoring in cities. This study focused on assessing the suitability of low-cost sensors for spatially dense air quality monitoring networks.
The sensors currently available on the market show great potential as air quality monitors for such networks in large cities containing various built environments, given that (1) the self-consistency among the 30 tested sensors was reasonably high, with R2 values larger than 0.93 for all of the tested pollutants; (2) the consistency between the sensors, excluding that for NO2, and the FRM/FEM instruments was also reasonably high, with R2 values larger than 0.87 (over the entire periods of comparison; Table 4); and (3) the consistency both among the sensors and between the sensors and the FRM/FEM instruments (except the NO2 sensor) remained stable during two distinct meteorological scenarios (hot and humid in summer and cold and dry in winter), which approximate the extremes experienced in most cities.
However, several issues must be resolved prior to using low-cost sensors for spatially dense air quality monitoring networks. First, the differences between the unadjusted concentrations obtained by the sensors were non-negligible, reaching as high as 17%, 28%, 40%, and 73% for O3, PM2.5, CO, and NO2, respectively, indicating that the sensor data must be corrected when investigating the qualitative spatial distributions of air pollutants on a fine scale.
Second, the unadjusted concentrations recorded by the CO and PM sensors differed significantly from those of the FRM instruments (60-70% lower for CO and about 2.5-fold higher for PM), although the O3 sensors agreed quite well with the FRM instrument (within 20% for the 15-70 ppb range). Furthermore, the NO2 sensor required additional correction (in this study, we applied multivariate regression). Thus, to obtain reliable air quality data from sensor networks, the data must be carefully corrected through post-processing. Nonetheless, a simple two-step calibration process (i.e., linearity correction for the discrepancy among the sensors followed by scale correction for the discrepancy between the sensors and the FRM instruments) can produce reliable data from a spatially dense network.
Third, meteorological conditions and interfering materials may significantly affect some sensors (including the NO2 sensor that we tested); hence, we must also correct for these effects, e.g., by applying multivariate regression analysis or neural network statistical analysis (Jianlin Hu at Nanjing University of Information Science and Technology, personal communication, February 2, 2018).
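As a rough illustration of the multivariate regression approach, the sketch below fits a reference concentration as a linear function of the raw sensor reading plus meteorological covariates. The choice of temperature and relative humidity as predictors, the function names, and the synthetic numbers are all assumptions made for the example; they are not the regression variables or data used in this study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(predictors, target):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]      # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, predictor):
    """Corrected concentration for one (sensor, temperature, RH) tuple."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], predictor))


# Synthetic co-location data: (raw sensor NO2, temperature in deg C, RH in %).
data = [(10, 20, 50), (20, 25, 40), (30, 15, 60), (40, 30, 70), (25, 10, 30)]
# Synthetic "reference" values generated from a known linear relation.
reference = [2 + 0.5 * s - 0.1 * t + 0.05 * rh for s, t, rh in data]

coeffs = fit_mlr(data, reference)
corrected = predict(coeffs, (35, 22, 55))
print(coeffs, corrected)
```

With real data the fit would not be exact, and the residuals against the reference instrument would indicate whether additional predictors (for example, an interfering gas) are needed.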
Fourth, some sensors must "warm up" prior to use. The time required to stabilize the CO and NO2 sensors in this study, for example, increased significantly, from a few hours to more than half a day, after unpowered periods, and the sensors produced unrealistic data while warming up. Therefore, warm-up time should be considered, especially for discontinuous monitoring efforts, such as mobile monitoring with handheld sensors by citizen scientists.
Unfortunately, this study could not quantify the degradation in sensor performance caused by long-term operation. However, considering the relatively short lifetimes of low-cost sensors (particularly electrochemical and metal oxide sensors), we should quantify and correct for this degradation. Cross et al. (2017) noted the brief lifetime (24-36 months) as one of the problems with low-cost sensors, and Bai et al. (2019) suggested that the performance of low-cost PM sensors gradually degraded over an 18-month operating period.
Despite the sensor lifetime issue, we conclude that employing a spatially dense, massive air quality monitoring network using currently available low-cost sensors is feasible for short-term experiments (i.e., less than a few months in length, although this time frame could be substantially extended for PM and O3, based on longer-term experimental results reported in the literature, e.g., Jiao et al., 2016; SCAQMD, 2017). To obtain reliable data, we need to design the experiments properly, and for long-term or discontinuous monitoring, we also need to quantify the sensors' lifetime and degradation in performance over time as well as their warm-up time. In addition, more systematic and generally applicable algorithms must be developed to correct for meteorological effects so that near-real-time air quality information on a fine scale can be provided. Statistical and/or machine-learning methods with large datasets may offer a solution to these issues.