Performance of Four Consumer-grade Air Pollution Measurement Devices in Different Residences

There has been a proliferation of inexpensive consumer-grade devices for monitoring air pollutants, including PM2.5 and certain gases. This study compared the performance of four consumer-grade devices, namely the Air Quality Egg 2 (AQE2), BlueAir Aware, Foobot, and Speck, that utilize optical sensors to measure the PM2.5 concentration. The devices were collocated and operated for 7 days in each of three residences, and the PM2.5 mass concentrations were compared with those measured by established optical sensing devices, viz., the personal DataRAM and DustTrak DRX, as well as the filter-based Personal Modular Impactor (PMI). Overall, the Foobot and BlueAir displayed the strongest correlations with the direct-reading reference instruments for both the hourly and daily PM2.5 mass concentrations. Comparing the 1-hour averages obtained with the DustTrak DRX for all of the residences with those obtained with the Foobot, BlueAir, AQE2, and Speck, the Pearson’s correlation coefficients (R’s) were 0.80, 0.88, –0.028, and 0.60, respectively. The strength of the correlation depended on the specific residence, likely due to the differences in aerosol composition. The correlations with the PMI measurements were moderate, with R values of 0.44 and 0.56 for the BlueAir and Foobot, respectively. The correlation coefficients with the PMI for the daily values obtained with the AQE2 and Speck were –0.59 and 0.70, respectively. According to a paired t-test, the average 24-h PM2.5 concentration data obtained using the consumer-grade monitors were statistically different (p < 0.05) from the mass values measured by the gravimetric filters. Overall, this study demonstrates the ability of consumer-grade air pollution monitors to report PM2.5 trends accurately; however, for accurate mass concentration measurements, these monitors must be calibrated for a particular location and application. Further testing is needed to determine their suitability for long-term indoor field studies.


INTRODUCTION
Airborne particulate matter (PM) consisting of airborne particles of ≤ 2.5 µm in aerodynamic diameter is designated as PM 2.5 and also identified as fine particulate matter (Kim et al., 2015). There are numerous negative health effects associated with exposure to fine airborne particles, including respiratory, cardiovascular, neurological, and reproductive effects (Ezzati et al., 2004; Sundell, 2004; Northcross et al., 2013; Kumar and Gupta, 2016).
PM 2.5 outdoors is typically measured using integrated sampling or direct-reading instrument methods. Integrated samplers, such as those using gravimetric filter methods, collect a sample of particulate matter over a set sampling period for later analysis that must be completed in a laboratory (Hinds, 1999). Direct-reading instruments measure particulate matter indirectly, such as through light scattering by the aerosol particles or by using a tapered element oscillating microbalance (TEOM). Direct-reading data can be collected directly in the field and allow the observer to recognize changes in PM 2.5 mass concentrations over time (Hinds, 1999) without the need to wait for laboratory analysis. Among direct-reading instruments, photometers, e.g., the DustTrak DRX (DRX Aerosol Monitor 8534 and others; TSI Inc., Shoreview, MN), measure light scatter from particles at specific angles (90° for the DRX) (TSI Inc.; Hinds, 1999; Zhang et al., 2018); nephelometers, e.g., the personal DataRAM (pDR-1000; Thermo Electron Corp., Franklin, MA), measure light scatter from several angles (Hinds, 1999; Thermo Electron Corp., 2004; Benton-Vitz and Volckens, 2008).
Large PM 2.5 monitoring networks, such as SLAMS (State or Local Air Monitoring Stations) and NAMS (National Air Monitoring Stations), provide spatial and temporal distributions of PM 2.5 pollution levels across a given area, such as a neighborhood, county, state, or country. These and other networks are set up to meet the state and federal legislative mandates for public health and welfare, regulation or identification of pollutant sources, emergency response, and increased public awareness. The instruments used in such networks have to perform according to Federal Reference Methods (FRMs) or Federal Equivalent Methods (FEMs) (Hall et al., 2014; Williams et al., 2014; Jiao et al., 2016). While the location and density of stations within each network depend on its purpose and the distribution of pollution sources, very often the density of monitoring stations is limited by the cost of the monitoring technology and required maintenance (Castell et al., 2017).
The National Ambient Air Quality Standards (NAAQS) regulate the presence of ambient air pollutants, including PM 2.5 , and the regulatory values are based on the outdoor presence of pollutants (Sexton and Wesolowski, 1985; Wagner et al., 2018). The current NAAQS limit for PM 2.5 is 35 µg m -3 for 24-h averages; 12 µg m -3 and 15 µg m -3 are the primary and secondary NAAQS PM 2.5 limits for annual averages, respectively (Esworthy, 2015; U.S. EPA, 2016). The World Health Organization (WHO)'s guidelines for average annual and 24-h PM 2.5 concentrations are 10 µg m -3 and 25 µg m -3 , respectively (WHO, 2005). While there currently are no standards regulating PM 2.5 concentrations indoors, exposure to pollutants indoors is a major concern because individuals spend about 90% of their time indoors (Klepeis et al., 2001), including in homes, offices, and indoor places of business. Fueled by concerns over negative health effects of air pollution and by rapid technological advances, there is a strong and developing trend in consumer electronics to design and sell inexpensive (< $300) devices that could be used for monitoring various air pollutants, including various PM fractions and gaseous pollutants. While the implementation of such monitors outdoors on a large scale allows the creation of large and even global maps of air pollution (AirViz Inc., 2019; IQAir, 2019), most of these consumer-grade devices are designed for indoor use to inform users about air quality in their residences.
Consumer-grade airborne particulate matter monitors utilize light-scattering detection sensors to determine particulate matter concentrations; most of them advertise measurement of PM 2.5 , and some also indicate the capability to measure PM 10 (Jiao et al., 2016; Sousan et al., 2017). Most of these consumer-grade devices also monitor temperature and relative humidity, while some also measure volatile organic compounds (VOCs) or other gases. Still, the majority of the devices seem to be focused on measurement of PM 2.5 . For example, 31 of the 42 low-cost (e.g., consumer-grade) air monitors reviewed by California's South Coast Air Quality Management District (SCAQMD) measure particulate matter (South Coast AQMD, 2019). Thus, the increasing use of these consumer-grade PM 2.5 devices informs consumers about their personal air quality in an effort to minimize adverse health effects or at least increase awareness of the pollutant presence. Therefore, it is important to understand the performance of these monitors and their reliability in determining PM 2.5 mass concentrations in their advertised environments. Furthermore, such devices provide a low-cost alternative to research-grade equipment and could allow air pollution investigations using distributed networks of PM 2.5 monitors at spatial resolutions that previously would have been cost-prohibitive (Austen, 2015). Because consumer-grade PM 2.5 monitors have been introduced into the market only in the last few years, their accuracy in measuring PM 2.5 mass concentrations has not yet been extensively tested in specific environments or for extended time periods.
Some large-scale testing programs for consumer-grade devices include the U.S. Environmental Protection Agency (U.S. EPA)'s Sensor Performance Evaluation and Application Research (SPEAR) and SCAQMD's Air Quality Sensor Performance Evaluation Center (AQ-SPEC). These and similar programs utilize controlled laboratory environments and collocated outdoor studies at air quality monitoring sites to evaluate the performance of consumer-grade air monitoring devices (Holstius et al., 2014; Jiao et al., 2016; Sousan et al., 2017). However, few studies have evaluated the performance of consumer-grade air pollution monitoring devices in actual indoor environments (Curto et al., 2018), which are the primary measurement environments advertised for such products. Thus, the goal of this study was to use four consumer-grade PM 2.5 monitoring devices in several different residential indoor environments and compare their performance with that of established real-time PM 2.5 monitors as well as with gravimetric measurement of PM 2.5 mass concentrations.

Selection of Consumer-grade Monitors
Consumer-grade air pollution monitors for this study were selected based on specific criteria. First, monitors had to cost less than $300. The monitors also had to include the ability to monitor PM 2.5 mass concentrations, a data recording capacity, and a function or software to easily acquire the data (such as downloading from the device directly or retrieval from a cloud-based platform). Ideally, the devices should not have been extensively tested and published at the time of this study in 2016. Furthermore, we looked for monitors that had real-time monitoring capability, were able to measure at least a minimal set of meteorological parameters (such as temperature and humidity), were portable, and were easy to set up. The following consumer-grade monitors were selected: the Air Quality Egg version 2: Particulate Pollution (AQE2; Wicked Device, Ithaca, NY), BlueAir Aware (BlueAir Inc., Chicago, IL), Foobot (Airboxlab US, San Francisco, CA), and Speck (AirViz Inc., Pittsburgh, PA). All selected devices met the primary selection criteria for cost and measurement capabilities as well as ability to record temperature and humidity (Table 1).
At the time of the study, these devices represented a new suite of products that had entered the market and provided a means for consumers to monitor their indoor (or outdoor) air quality. Certain products, such as the BlueAir Aware, can also sync with other devices, such as air purifiers, to optimize the air quality in the home (Adgully, 2016). The selected consumer-grade devices are based on the same measurement principle and use an infrared light emitting diode (IRED) optical sensor and a photodiode detector to detect light scattered by particles passing through the detection chamber. The light scattering data are then converted into PM 2.5 mass concentration using a calibration curve (Li and Biswas, 2017; Curto et al., 2018; Johnson et al., 2018). All four devices also measured temperature and relative humidity, while the BlueAir Aware and Foobot also measured total volatile organic compounds (TVOCs) and carbon dioxide (CO 2 ) levels using a metal oxide semiconductor (MOS) sensor (Hammes, 2016; Moreno-Rangel et al., 2018).

Protocol to Compare the Selected Low-cost, Consumer-grade Monitors in Three Households
The AQE2, BlueAir Aware, Foobot, and Speck devices were collocated with the following reference instruments: two personal DataRAMs, pDR1 and pDR2 (pDR-1000; Thermo Electron Corp.), a DustTrak DRX (DRX Aerosol Monitor 8534; TSI Inc.), and two PM 2.5 Personal Modular Impactors (PMI; SKC, Inc., Eighty Four, PA). The DustTrak DRX is an active sampler, which "simultaneously measure[s] size-segregated mass fraction concentrations corresponding to PM 1 , PM 2.5 , Respirable, PM 10 , and Total PM size fractions" (TSI Inc., 2014). The DRX has been used in previously published studies to measure particulate matter (Wang et al., 2016; Zhang et al., 2018). It should be mentioned that the pDR-1000 is a passive monitor, i.e., it has no dedicated air mover. It was included in the study because two of the consumer sensors, the BlueAir and Foobot, are also passive devices. The PMIs collected PM 2.5 particulate matter onto pre-weighed 37 mm polytetrafluoroethylene (PTFE) filters with a 2.0 µm pore size (SKC, Inc.) using two AirChek XR air pumps (SKC, Inc.) that were set to 3.0 L min -1 using a TSI 4000 Series flow meter (TSI Inc.). The two PMIs were used to collect 24-h PM 2.5 samples every day for the duration of each monitoring period in each residence. The PMIs were included in this study because filter-based methods are the "gold standard" for determining PM 2.5 mass concentrations (Sousan et al., 2017). Because the mass collected on PMI filters represents an integrated mass concentration, 24-h averages of the data from all monitors were also calculated so that the results could be compared with the filter-based method.
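As an illustration of how a filter-based mass concentration follows from the sampling parameters above, the sketch below divides the net mass gain of a filter by the sampled air volume. Only the 3.0 L min -1 flow rate and the 24-h duration are taken from the text; the function name and the filter masses are hypothetical, and any blank correction is omitted.

```python
def gravimetric_concentration(pre_mass_ug, post_mass_ug, flow_l_min, minutes):
    """Return the time-integrated PM mass concentration in ug per m^3."""
    if flow_l_min <= 0 or minutes <= 0:
        raise ValueError("flow rate and duration must be positive")
    collected_ug = post_mass_ug - pre_mass_ug    # net particle mass on the filter
    volume_m3 = flow_l_min * minutes / 1000.0    # 1000 L = 1 m^3
    return collected_ug / volume_m3


# Air volume drawn through one filter at 3.0 L/min over 24 h.
sampled_volume = 3.0 * 24 * 60 / 1000.0

# Hypothetical pre- and post-sampling filter masses (ug), for illustration only.
conc = gravimetric_concentration(75_000.0, 75_043.2, 3.0, 24 * 60)
```

At this flow rate, each 24-h sample corresponds to about 4.32 m 3 of air, so each microgram collected on the filter represents roughly 0.23 µg m -3 .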
Sampling in all residences occurred in 2016 during April and May, and all devices were run for 24-h periods over 7 consecutive days in each residence. Residences were visited over 8 consecutive days, first to set up the reference instruments and then to replace PMI filters after each 24-h interval and verify the PMI sampling flow rate once a new filter was loaded. During each visit to the residences, any notable activities in the past 24 hours, such as cooking and cleaning, were reported to the investigators by the residents.
The sampling conducted in the three residences offered the opportunity to measure real-life concentrations of PM 2.5 in three homes. Recruitment included a convenience sample of volunteers with interest in the project; no personal information was recorded or retained. The residences each had different characteristics and occupancy levels. Residence 1 was a single-occupant, one-bedroom apartment on the first floor of a two-story building located within 200 m of a highway and 100 m from a 24-h restaurant. The living room had a semi-open floor plan and was located near the kitchen with access via the dining room. Residence 2 was a multistory, detached home with eight residents located in a suburban area. Residence 3 was a ranch-style house with three residents located within 100 m of a major roadway. The living rooms of Residences 2 and 3 were both located directly adjacent to the kitchen. None of the residents in the study reported having pets in the home or smoking indoors. The residents did not report any use of incense or candle burning during the study period.
The devices were placed in the living room of each residence near the edge of a stable surface located away from walls and as close to the center of the room as possible (Fig. 1). Other considerations for equipment placement included access to electrical outlets, minimal drafts from exterior doors, and minimal disruption to access and utility of the room where monitoring was conducted.
The Foobot and BlueAir reported 5-min PM 2.5 concentration averages. Because of setting restrictions, the Speck and AQE2 recorded measurements at 1-min intervals, and 5-min interval data were not available. The reference instruments recorded data at 5-min intervals (DRX and pDR-1000), while the PMIs sampled for 24-h periods. For data analysis purposes, the data from all monitors were converted into hourly and daily averages.
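The conversion of 1-min or 5-min readings into hourly and daily averages can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function names and the sample series are hypothetical, and it assumes that averages are taken over clock hours and calendar days, whereas the 24-h periods in the study may instead have been aligned with the filter changes.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def block_means(readings, fmt):
    """Average (timestamp, value) readings within blocks keyed by strftime(fmt)."""
    groups = defaultdict(list)
    for stamp, value in readings:
        if value is not None:                      # skip missing samples
            groups[stamp.strftime(fmt)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(groups.items())}


def hourly_means(readings):
    """Mean of all readings within each clock hour."""
    return block_means(readings, "%Y-%m-%d %H:00")


def daily_means(readings):
    """Mean of all readings within each calendar day."""
    return block_means(readings, "%Y-%m-%d")


# Hypothetical 1-min series: one hour at 10 ug/m^3 followed by one hour at 20 ug/m^3.
start = datetime(2016, 4, 5, 0, 0)
series = [(start + timedelta(minutes=i), 10.0 if i < 60 else 20.0) for i in range(120)]

hourly = hourly_means(series)
daily = daily_means(series)
```

The same functions apply unchanged to 5-min data, because each block mean uses whatever readings fall inside the block.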
A unique aspect of our study compared to other consumer-grade sensor comparison studies is the inclusion of gravimetric filter monitors. Filtration sampling of air allows for direct measurement of PM 2.5 mass concentration, which is only indirectly measured by monitoring instruments. Monitoring instruments instead use optical detection to measure the light scatter produced by the particles in the air and convert that information to a PM 2.5 mass concentration, which allows for real-time measurement (Amaral et al., 2015).
Prior to pre- and post-sampling weighing, all filters were equilibrated for a minimum of 72 hours in a weighing room with controlled temperature and relative humidity of 20-24°C and 30-40%, respectively. As a quality control measure, three PTFE filter calibration blanks were always kept in the weighing room and measured during each weighing session. The variability of their weight was typically 2-3 µg, or less than 0.004% of the mean filter mass. For field samples, 10% of filters were used as field blanks.
Statistical comparisons were conducted between the different consumer-grade and reference monitors using 1-h and 24-h averages. Pearson correlation coefficients (R values) were generated for comparisons between all devices; R values range from -1 to +1 and indicate the strength and direction of a linear relationship between two datasets, with a value of 0 representing no linear correlation, +1 representing a perfect positive correlation, and -1 representing a perfect negative correlation (Sokal and Rohlf, 1995). Because the hourly results were not normally distributed, Spearman's correlation coefficient, a non-parametric, rank-based measure, was also determined for each residence using pooled hourly data. Despite the non-normal distribution of the hourly data, the inclusion of the Pearson correlation was relevant because we focused on estimating the associations between the different devices; however, the significance of these associations cannot be determined (Sokal and Rohlf, 1995).
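The two correlation measures described above can be sketched in a few lines. The example below is a generic textbook implementation rather than the software used in the study; the function names and the data are hypothetical. Spearman's coefficient is computed as the Pearson coefficient of the ranks, with tied values sharing their average rank.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: the sensor responds monotonically but not linearly.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
sensor = [1.0, 4.0, 9.0, 16.0, 25.0]
r = pearson_r(reference, sensor)
rho = spearman_rho(reference, sensor)
```

In this example the rank-based coefficient equals 1 because the relationship is perfectly monotonic, while the Pearson coefficient is slightly lower because the relationship is not linear.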
The Pearson correlation coefficient was calculated to compare each consumer-grade device against the others and against the reference instrumentation using the data pooled from all residences as well as from each individual residence. In order to further compare consumer-grade monitor performance between the measured residences, the 1-h PM 2.5 concentration data from each consumer-grade device were plotted against those from the DRX, and the resulting linear regression equation and the coefficient of determination (R 2 value) were calculated. In the linear regression equation, y = a + bx, a is the y-intercept, and b is the slope. The coefficient of determination is useful in describing the amount of variation in the dependent variable (y) that can be explained by the independent variable (x) (Sokal and Rohlf, 1995). Linear regression was used despite the hourly data not fitting the bivariate normal distribution requirement because the purpose of the regression was to determine how well the consumer-grade devices agreed with the reference devices (Sokal and Rohlf, 1995). When describing the hourly PM 2.5 results in this paper, we treat the DustTrak DRX as the primary reference instrument because this device reports real-time PM 2.5 mass concentrations. The personal DataRAM was also used as a reference instrument to provide a passive sampler for comparison, but it does not measure PM 2.5 specifically. In addition, the 24-h average concentration data from each direct-reading instrument were paired with PM 2.5 concentration data determined by the gravimetric filters for all residences and analyzed using a Student's paired t-test, which can determine whether there is a difference between the means of matched pairs of data (Sokal and Rohlf, 1995). The pooled 24-h average concentrations were found to be normally distributed based on the Kolmogorov-Smirnov test.
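The regression described above can be illustrated with an ordinary least-squares fit. The sketch below assumes the conventional form y = a + bx, with the reference (DRX) readings as x and the consumer-grade readings as y; the function name and the data are hypothetical and are chosen so that the intercept and slope are easy to verify by hand.

```python
def linear_fit(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    b = sxy / sxx                                   # slope
    a = my - b * mx                                 # y-intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical sensor that reads a constant 5 ug/m^3 offset plus half of the reference.
reference = [0.0, 10.0, 20.0, 30.0]
sensor = [5.0, 10.0, 15.0, 20.0]
a, b, r2 = linear_fit(reference, sensor)
```

A slope below 1 indicates that the sensor responds less strongly than the reference to rising concentrations, and a non-zero intercept indicates an offset near zero concentration. The example also shows that R 2 can be high even when both the slope and the intercept deviate from ideal agreement.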

Comparison of PM2.5 Mass Concentrations: Hourly Averages
The time-series of the average hourly PM 2.5 mass concentrations for each residence is shown in Fig. 2. Airborne PM 2.5 mass concentrations in Residence 1 were consistently low during the sampling period with few observed peaks and only one clearly identifiable concentration peak on April 7, 2016, when the reference devices and most consumer-grade monitors reported an hourly average concentration above 20 µg m -3 . The BlueAir Aware consistently reported the highest PM 2.5 mass concentrations, followed by the AQE2. The hourly mean concentrations and standard deviations were 23.31 ± 4.07 µg m -3 for the BlueAir Aware and 13.80 ± 1.17 µg m -3 for the AQE2. The average concentrations for each device and residence are reported in Table S1. Average hourly concentrations measured by all other devices ranged from 4.28 ± 1.80 µg m -3 (Foobot, Residence 1) to 37.43 ± 14.58 µg m -3 (BlueAir Aware, Residence 3). Residence 2 and especially Residence 3 experienced more frequent and more pronounced concentration peaks compared to Residence 1. Similar to our observation in Residence 1, the BlueAir Aware consistently reported the highest concentrations among the devices: a mean concentration of 33.76 ± 20.06 µg m -3 for Residence 2 and 37.43 ± 14.58 µg m -3 for Residence 3. All other devices followed each other much more closely. The DRX also reported low concentrations in Residence 1 (6.41 ± 2.19 µg m -3 ), which were much lower than those observed in Residences 2 and 3 (10.57 ± 20.02 µg m -3 and 21.36 ± 30.05 µg m -3 , respectively; Table S1). There is a data gap in Residence 2 from April 30 through May 1 for data from the DustTrak DRX and PMIs because access to the residence was unavailable. The consumer-grade monitors continued to collect data for Residence 2, which are presented in Fig. 2.
All observed peaks were likely due to particle penetration from outdoors or cooking activities, such as frying fish, stir-frying, and cooking eggs and toast in a toaster oven, as reported by the residents. During the study, there were no reports of other potential indoor sources, such as cleaning, candle or incense burning, or pets. All residences reported having natural gas cooking appliances.
For the pooled hourly data, according to the Pearson correlation coefficient, the BlueAir Aware and Foobot were strongly correlated with the DRX (R = 0.87 and 0.80; Table 2). The Speck was moderately correlated with the DRX (R = 0.60), but the AQE2 exhibited a negative, weak correlation (R = -0.028) that was not statistically significant. Both pDR devices were strongly correlated with the Foobot (R = 0.97 and 0.96), BlueAir Aware (R = 0.94 for both), and Speck (R = 0.85 for both). All mentioned correlations, except that between the AQE2 and DRX, were statistically significant at the 0.05 level (2-tailed). When Spearman's ρ was used, the strongest correlation to the DRX was displayed by the Foobot (ρ = 0.76), while the BlueAir was only moderately correlated (ρ = 0.63). Similar but weaker correlations were indicated by the Spearman's correlations for the comparisons made between the different samplers from the compiled hourly data (Table 2). When compared to the pDR devices, the BlueAir, Foobot, and Speck were all strongly and significantly correlated with both devices. Given these results, the Foobot and BlueAir samplers were the consumer-grade devices that most closely followed the trend behavior of the DRX, despite both working as passive sampling devices. Although both the Speck and AQE2 are active sampling devices, the results from these samplers are only weakly to moderately correlated with both the passive (pDR) and active (DRX) reference instruments.
When the Pearson correlations of the hourly PM 2.5 mass concentrations were calculated separately for each residence, there was a clearly observable residence-to-residence variability. As shown in Table 2, the poorest, yet statistically significant, correlations between the monitors and the reference DRX were identified in Residence 1 (R = 0.31-0.75 for the DRX; ρ = 0.18-0.51). Still, the pDR devices (R = 0.78 for both) and the Foobot (R = 0.75) were strongly correlated with the DRX, while the BlueAir (R = 0.52) and Speck (R = 0.45) were moderately correlated. When using Spearman's correlation, however, the Foobot (ρ = 0.51), AQE2 (ρ = 0.43), and Speck (ρ = 0.40) were all only moderately correlated with the DRX. In comparison, all of the consumer monitors in Residence 2 exhibited strong and positive correlations with the DRX (R ranged from 0.89 to 0.99) and moderate to strong correlations with the DRX in Residence 3 (R ranged from 0.63 to 0.95). Only the Foobot maintained strong Spearman's correlations for both Residences 2 and 3 (ρ = 0.79 and 0.82, respectively), although the BlueAir was the most strongly correlated with the DRX in Residence 2 (ρ = 0.82). As mentioned above, for the pooled data, the BlueAir Aware was the most strongly correlated with the DRX (R = 0.88), with the Foobot trailing behind (R = 0.80); however, the Foobot was the most strongly correlated in each individual residence with the DRX (R = 0.75 in Residence 1, R = 0.99 in Residence 2, and R = 0.95 in Residence 3). Correlations between all consumer-grade samplers were much weaker in Residence 1 than in Residences 2 and 3. However, the Foobot device consistently was among the most strongly correlated with the DRX in each residence.
The variability of Pearson correlation coefficients between residences may be due to the differing ability of the devices to detect and respond to different PM 2.5 compositions (Northcross et al., 2013). Differences in particulate matter composition in the measured residences may be due to differences in PM sources and particular activities of the occupants, such as cleaning, cooking, ventilation, and occupancy level. In addition, indoor particulate matter composition can be influenced by building composition and materials as well as by proximity to traffic and other pollution sources (Huang et al., 2015).
Comparisons of the hourly results from each consumer-grade sensor with those from the DRX are illustrated in Fig. 3. The coefficients of determination are shown in Table S2 in Supplemental Information. The slopes and y-intercepts for each consumer-grade device, regardless of the residence, differed from those of the DRX (Fig. 3). Residence 2 had a strong, single concentration spike, viz., close to 200 µg m -3 , which, due to the nature of R 2 calculation, increased the R 2 value and resulted in the highest R 2 values for all devices in Residence 2. For Residence 2, the coefficient of determination ranged from 0.79 (AQE2) to 0.98 (Foobot). By contrast, the R 2 for Residence 1 ranged from 0.10 (AQE2) to 0.57 (Foobot). Overall, the AQE2 exhibited low correlations with the DRX, with slopes close to 0 for all residences (slope ranged from 0.029 to 0.160) and large offsets (y-intercept ranged from 4.35 to 12.8 µg m -3 ; Fig. 3).
The low slope demonstrates a low response of the AQE2 to increasing PM 2.5 concentrations observed by the DRX.The offset indicates the difference in the response of the AQE2 at concentrations close to 0, which differed for each household.The slopes observed for the BlueAir Aware in Residence 1 (b = 0.961) and Residence 2 (b = 0.914) were close to 1; however, the slope for Residence 3 was lower (b = 0.465), indicating the instrument's much lower response to increasing PM mass concentrations in Residence 3. The y-intercepts for the BlueAir Aware were also very high (ranging from 17.17 to 24.21 µg m -3 ), illustrating its high offset and overestimation of PM 2.5 mass concentrations, especially at low concentration levels.The Foobot had varying performance depending on the residence, with slopes ranging from 0.244 to 0.909; however, its y-intercepts were the smallest, ranging from -2.206 to 2.431 µg m -3 (Fig. 3).The Speck also had mixed performance in the different residences with slopes ranging from 0.078 to We separately compared the consumer-grade devices to the passive pDR, and the observed linear regressions are reported in Fig. S2.The results from the comparison with the passive pDR are similar to those observed by the comparison with the DRX: large offsets by the BlueAir, very low slopes from the AQE2, and mixed performance by the Foobot and Speck in the different residences.Since the monitors' performance relative to the DRX and pDR varied substantially among the residences, it is likely that there were substantial differences in PM 2.5 composition and possibly size distribution between the different residences.
Thus, other sensor comparison studies in actual indoor environments should include analysis of PM composition as well as particle size distribution measurements to better understand the reasons behind the differences in sensor performance. This is because particulate matter monitors, such as the DRX, pDR, and consumer-grade sensors, rely on optical sensors to detect PM 2.5 (Amaral et al., 2015), the composition and size of which affect light scattering (Zhang et al., 2018). Sensor response varies not only due to particle composition but also due to specific algorithms for determining the value measured by each device (Manikonda et al., 2016), both of which may account for the different responses observed by each of the monitors in this study.

Comparison of PM2.5 Concentrations: 24-h Averages
Determining the average 24-h PM 2.5 mass concentrations for each monitor allowed for a comparison of the different instruments' results to the gravimetric filter mass concentrations determined by the PMIs; the average daily mass concentrations are shown in Fig. 4, and the Pearson correlation coefficients among the devices for the pooled data are shown in Table 3. The 24-h average gravimetric filter PM 2.5 mass concentrations ranged from 5.09 to 29.07 µg m -3 for Residence 1, from 7.98 to 24.40 µg m -3 for Residence 2, and from 9.28 to 27.54 µg m -3 for Residence 3 (Fig. 4). Similar to the results reported for the average hourly data, the 24-h average mass concentrations reported by the BlueAir Aware were consistently higher than observations from the gravimetric filters and other devices, with the exception of April 5, 2016, in Residence 1, when the average filter mass concentration was 29.07 µg m -3 , higher than the measurements from all other devices. According to the Pearson correlation data presented in Table 3, the results from the Speck and AQE2 were not significantly correlated with the averaged gravimetric filter results. Both the Foobot and BlueAir data showed strong correlations with the gravimetric filter data, and their R values were 0.94 and 0.91, respectively. Interestingly, similar R values were shown by the reference instruments, the DRX (R = 0.83) and pDR (R = 0.55), when compared with the gravimetric filters. The Pearson correlation results illustrate that the Foobot and BlueAir, similar to the DRX and pDR, strongly follow the daily PM 2.5 trends observed by the gravimetric filters. Initial Student's paired t-tests compared the data from the two PMIs for all residences combined and found that there was no statistically significant difference between the two gravimetric filter datasets (p = 0.12; Table 4). The same test with pooled data was conducted to compare the two pDR devices, and no significant difference was found (p = 0.15). Therefore, the daily results for both the PMI filters and pDR monitors are reported as the average daily PMI ("PMI Average") and average daily pDR ("pDR Average"), respectively, in Fig. 4. The 24-h PM 2.5 concentrations measured by all consumer-grade monitors were found to be significantly different (p < 0.05) from the averaged mass concentration determined by the PMI filters (Table 4). The average daily mass concentrations measured by our reference instruments, the DRX and the pDR, were not significantly different from the average mass concentrations measured by the PMIs (p = 0.45 and p = 0.080, respectively). The DRX and pDR daily averages were also not significantly different from one another (p = 0.36). The paired t-test results validated the use of the DRX and pDR as reference instruments based on their performance in comparison to the gravimetric filters (Table 4). However, we would like to stress that the daily PM 2.5 mass concentrations measured by the consumer-grade sensors were significantly different from the gravimetric filter measurements (Table 4). The direction of the difference depended on the device: the daily PM 2.5 concentrations measured by the BlueAir were higher, while the concentrations measured by the other devices were statistically significantly lower based on Student's paired t-tests (Table 4).
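For readers unfamiliar with the paired t-test used above, the statistic can be sketched as follows. The daily values in the example are hypothetical and do not reproduce any result in Table 4. Converting the statistic into a p value requires the t distribution, which is available in statistical packages (for example, `scipy.stats.ttest_rel` in SciPy); here the statistic is only compared with the tabulated two-tailed critical value for the corresponding degrees of freedom.

```python
import math


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)   # sample variance
    if var_d == 0:
        raise ValueError("differences have zero variance")
    return mean_d / math.sqrt(var_d / n), n - 1


# Hypothetical 24-h averages (ug/m^3) from a sensor and from the filters.
sensor_daily = [12.0, 15.0, 11.0, 14.0]
filter_daily = [10.0, 12.0, 10.0, 10.0]
t, df = paired_t(sensor_daily, filter_daily)

# Two-tailed critical value of Student's t for df = 3 at the 0.05 level.
T_CRIT_DF3 = 3.182
significant = abs(t) > T_CRIT_DF3
```

Because the test works on the day-by-day differences, a device with a consistent offset from the filters can differ significantly from them even when its daily values track the filter values closely.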
To determine whether there were significant PM2.5 mass concentration differences between residences, the average 24-h values for each device were compared between different residences using ANOVA and Bonferroni post hoc analysis (Table 5). The 24-h average means for consumer-grade monitors AQE2 and BlueAir in Residence 1 were significantly different from those in Residences 2 and 3. Furthermore, Residence 1 was also significantly different from Residence 3 for the DRX and pDR devices and different from Residence 2 for pDR1 (Table 5). The daily concentrations in Residences 2 and 3 were not significantly different from one another for any of the sampling devices (Table 5). However, no significant differences among the PM2.5 concentrations in different residences were observed for the Foobot, Speck, and gravimetric filter data from the PMI. The differences observed between Residence 1 and the other residences may be due to higher human occupancy in Residences 2 and 3. Similar to the observations from the hourly reported results, the consumer-grade monitors often performed differently in the different residences (although only the differences involving Residence 1 were statistically significant), which can be attributed to different behaviors of the residents and PM sources.
Manikonda et al. (2016) tested several consumer-grade devices in a laboratory study, including the Speck. The monitors were compared against a Grimm 1.109 (Grimm Technologies, GmbH, Ainring, Germany), TSI APS 3321 (TSI Inc., Shoreview, MN), and TSI FMPS 3019 (TSI Inc.) (Manikonda et al., 2016). The other consumer-grade PM devices included in the study were the Dylos DC1100-PRO-PC (Dylos Corp., Riverside, CA), Dylos DC1700 (Dylos Corp.), TSI AirAssure (TSI Inc.), and Airsense (SUNY, Buffalo, NY). Aerosols used in the study included cigarette smoke and Arizona Test Dust. The PM2.5 mass concentrations reported by the Speck were strongly associated with those from the Grimm for both cigarette smoke (R² = 0.92) and Arizona Test Dust (R² = 0.96). The coefficient of determination in our study was much lower (R² = 0.359) when the Speck was compared with the DRX (Table S2). Some of the differences compared to the results of our study may be due to the controlled conditions of the chamber experiment conducted by Manikonda et al. (2016), in contrast to the environmental setting of the current study, which included various particle sources and likely different particle types.
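Because some studies report Pearson's R and others report R², comparing them requires converting between the two. The sketch below, using invented readings and hypothetical function names, shows that for a simple linear fit with one predictor the coefficient of determination equals the square of Pearson's R; this is why the Speck's hourly R of 0.60 against the DRX corresponds to R² of about 0.36.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def r_squared_of_fit(x, y):
    """R^2 of the least-squares line y = a + b*x, computed from its residuals."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical hourly means (ug/m^3); NOT study data.
reference = [5.0, 10.0, 20.0, 40.0, 80.0]
device = [9.0, 8.0, 25.0, 30.0, 70.0]

r = pearson_r(reference, device)
r2_fit = r_squared_of_fit(reference, device)
print(f"R = {r:.4f}, R squared = {r ** 2:.4f}, R^2 from residuals = {r2_fit:.4f}")

# The same identity applied to the value reported in the text.
speck_r2 = round(0.60 ** 2, 2)
```

Note that squaring discards the sign, so a negative R (such as the one reported for the AQE2) cannot be recovered from R² alone.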

Comparison of Data with Other Studies
The Community Air Sensor (CAIRSENSE) Network Project tested the capability of low-cost air quality sensors at a regulatory monitoring site, followed by a field deployment outdoors over a 2 km² area (Jiao et al., 2016). The CAIRSENSE project compared several PM sensors, including the Air Quality Egg (version 1) (Wicked Device), Shinyei PMS-SYS-1 (Shinyei Technology Co., Kobe, Japan), Dylos DC1100-PRO-PC (Dylos Corp.), Airbeam (HabitatMap, Brooklyn, NY), and Aerocet 831 (MetOne, Grants Pass, OR). The study included collocated deployment of the devices at a regulatory air monitoring site for 30 days followed by an up to 7-month deployment as a wireless sensor network. Initial testing at the monitoring site included collocation with a reference instrument, namely the MetOne BAM-1020 (MetOne). Compared to this federal equivalent method (FEM) instrument, the first-generation Air Quality Egg exhibited very poor correlations (R = -0.06 to 0.40) in the initial CAIRSENSE field testing (Jiao et al., 2016), which was similar to our findings for the comparison of the Air Quality Egg version 2: Particulate Pollution with the DRX (R = -0.028) for 1-h average mass concentrations (Table S2). Both this study and the CAIRSENSE project utilized comparisons with reference instruments in the field, although in different types of environments. Despite the updates made between the two generations of the Air Quality Egg, the observed correlation between this consumer-grade device and reference instrumentation was virtually non-existent.
The performance of several consumer-grade PM monitors was analyzed in a chamber study by Sousan et al. (2017) focusing on occupational aerosol exposures. Their study compared the Foobot, AirBeam, and Speck against the pDR-1500 (Thermo Electron Corp.), scanning mobility particle spectrometer (SMPS-C 5.402; Grimm Technologies, GmbH), and Aerodynamic Particle Sizer (APS 3321; TSI Inc.). Occupational aerosols tested included salt, welding fumes, and Arizona Road Dust. Their study found very strong Pearson correlations for both the Foobot (R = 0.99) and Speck devices (R = 0.92-0.99) for all aerosol types measured. However, they observed that both devices had different slopes based on the type of aerosol measured. In our study, when these devices were compared with the DRX for all residences combined, we observed a strong correlation for the Foobot (R = 0.80) but only a moderate correlation for the Speck (R = 0.60). The study by Sousan et al. was conducted in a chamber containing individual occupational aerosol sources. By contrast, our study measured PM2.5 mass concentrations in residential environments, which comprised a mixture of different particles, sources, and their interactions.
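The observation that a device can correlate almost perfectly with a reference while its slope changes with aerosol type is worth illustrating, since it explains why a high R does not by itself imply accurate mass concentrations. The sketch below uses invented chamber readings and a hypothetical helper, `fit_line`; none of the numbers come from Sousan et al. (2017) or from this study.

```python
import math


def fit_line(x, y):
    """Return (slope, intercept, pearson_r) for a least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)


# Hypothetical chamber data (ug/m^3) for one sensor and two aerosol types.
reference = [10.0, 20.0, 40.0, 80.0, 160.0]
responses = {
    "aerosol A": [4.0, 11.0, 19.0, 42.0, 79.0],     # reads about half the reference
    "aerosol B": [22.0, 38.0, 83.0, 158.0, 322.0],  # reads about double the reference
}

results = {name: fit_line(reference, y) for name, y in responses.items()}
for name, (slope, intercept, r) in results.items():
    print(f"{name}: slope = {slope:.2f}, intercept = {intercept:.2f}, R = {r:.3f}")
```

In this made-up case both aerosols give R above 0.99, yet the slopes differ by roughly a factor of four, so a single calibration factor would be wrong for at least one of them. In a residence, where several particle types are present at once, the effective slope can shift over time, which is one plausible reason field correlations fall below chamber values.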
A study by Zikova et al. (2017) compared 66 Speck devices collocated with a Grimm optical particle counter (OPC Portable Aerosol Spectrometer 1.109; Grimm Technologies, GmbH) and carbon monoxide (CO) monitors in 1 residence and 2 outdoor campaigns. The indoor campaign focused on indoor air sources, including a wood stove, cooking, front porch reconstruction activities, and resuspension of particulate matter. The outdoor samplers were located on the front porch of the same residence ~15 m from the street. Their study distinguished combustion and non-combustion sources of PM using the collocated CO monitors. The data reported in their study included 1-min and 1-h averaged concentrations. The Speck exhibited low coefficients of determination overall when compared with the Grimm OPC (R² = 0.07-0.29 for 1-min and R² = 0.15-0.46 for 1-h data). This was comparable to the weak correlation (R² = 0.36) between the Speck and the DRX observed in our study (Table S2). Both studies examined the use of consumer-grade devices in residential settings, which may account for the similar results obtained by the Speck sensors. These two studies contradict the results on the Speck sensors observed in the occupational aerosol chamber study by Sousan et al. (2017) and suggest that there are limitations to using the Speck in indoor environmental settings.
The poor to moderate correlations of the PM2.5 mass concentrations measured by all devices, including both the consumer-grade monitors and the direct-reading reference instruments, clearly indicate that aerosol monitoring devices that rely on light scattering for particulate matter detection should be calibrated for the specific environment in which they will be used if the goal is to measure mass concentrations accurately rather than only to observe trends in PM presence. This is an important performance aspect because environmental and other air pollution standards are based on gravimetric measurements or, if other methods are used, on equivalency with them.
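One simple form that such an environment-specific calibration could take is a linear correction fitted to collocated data. The sketch below is an assumption for illustration only: the study did not calibrate the devices, the daily values are invented, and the function names `fit_calibration` and `mean_bias` are hypothetical.

```python
def fit_calibration(device, reference):
    """Least-squares line reference = intercept + slope * device."""
    n = len(device)
    md, mr = sum(device) / n, sum(reference) / n
    sdd = sum((d - md) ** 2 for d in device)
    sdr = sum((d - md) * (r - mr) for d, r in zip(device, reference))
    slope = sdr / sdd
    return slope, mr - slope * md


def mean_bias(estimate, reference):
    """Average signed difference between an estimate and the reference."""
    return sum(e - r for e, r in zip(estimate, reference)) / len(reference)


# Hypothetical collocated daily means (ug/m^3); NOT study data.
reference_daily = [6.0, 9.0, 12.0, 15.0, 20.0, 25.0]   # gravimetric filter
device_daily = [14.0, 19.0, 26.0, 30.0, 41.0, 52.0]    # optical monitor reading high

slope, intercept = fit_calibration(device_daily, reference_daily)
corrected = [intercept + slope * d for d in device_daily]

bias_before = mean_bias(device_daily, reference_daily)
bias_after = mean_bias(corrected, reference_daily)
print(f"slope = {slope:.3f}, intercept = {intercept:.2f}")
print(f"mean bias before = {bias_before:.2f}, after = {bias_after:.2f}")
```

The bias after correction is zero by construction here because the same days are used to fit and to evaluate the line. A realistic assessment would fit on one set of days and check the correction on days held out from the fit, and the resulting coefficients would apply only to the residence and aerosol mix in which they were derived.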
Our consumer-grade PM2.5 monitor study was unique because the devices were tested in multiple residential environments. One of the limitations was that a single unit of each consumer-grade monitor model was used. Despite this limitation, we were able to illustrate the different responses of several consumer-grade devices to changes in indoor PM2.5 mass concentrations and how well their measurements follow the selected reference devices. A study incorporating multiple sensors, such as that by Zikova et al. (2017), would provide data on the precision of the same type of device. Furthermore, the use of multiple devices of the same make would offer the ability to detect spatial variability in PM2.5 concentrations.
In this study, the BlueAir and Foobot devices exhibited strong Pearson correlations (R ≥ 0.80) with the DustTrak DRX, the primary reference instrument used for the 1-h average PM2.5 mass concentrations. However, the BlueAir Aware consistently overestimated PM2.5 mass concentrations relative to the DRX for average hourly measurements and also for 24-h average concentrations compared to the gravimetric filter data. The BlueAir's high offset values are clearly illustrated by the linear regression lines in Fig. 3, which compare average hourly data between the consumer-grade monitors and the DRX. Although corrections may be made for the positive offset in mass concentration, this shift to higher PM2.5 concentrations may unnecessarily alarm consumers. By contrast, the Foobot followed the mass concentrations of the reference instrumentation more closely in the monitored residential environments. We have shown that the Foobot was correlated the most strongly with the reference instrumentation: with the DRX for the hourly PM2.5 concentrations in each individual residence and with the PMI for the daily average concentrations (R = 0.56) when data from all residences were combined (Table 3).

CONCLUSIONS
Monitoring the PM2.5 levels in three different residences with different numbers of occupants and various occupant behaviors enabled us to evaluate the accuracy of several low-cost, consumer-grade devices in measuring PM2.5 mass concentrations in different types of environments. We observed clear changes in the mass concentration due to cooking and other residential activities. Although our study assessed the performance of these monitors "out of the box," future studies may include a laboratory calibration to establish a baseline in a controlled environment prior to deploying the instrument in residential or other real-world settings. Further testing is recommended, including using replicate devices in a controlled chamber study and expanding residential sampling to include additional home pollutant sources, such as candles or incense; additional indoor environments, such as offices or public spaces; and long-term measurements to observe potential changes in performance over extended time periods. Although the Foobot exhibited the best performance in our study and may be useful (along with the BlueAir Aware to some extent) for observing trends in the PM2.5 mass concentration for a residential environment, it does not necessarily report the actual concentrations and likely requires calibration prior to deployment in the field. Furthermore, several other low-cost devices, which have not yet been extensively tested in the laboratory or typical indoor environments, have entered the consumer market since this study was conducted.
Overall, due to their low cost, portability, and ease of use, consumer-grade PM devices hold great promise for democratizing air pollution measurements and enabling the creation of high-density monitoring networks. At the same time, users should be aware of these devices' limitations, including, as our study has shown, their variable performance, which depends on the location and, very likely, the particulate matter composition. For more accurate measurements, these monitors should be calibrated for a specific location and, if possible, the specific air pollutant(s) of interest.

Fig. 2. Hourly averaged PM2.5 mass concentrations for each residence. There is a gap in the DRX data in Residence 2, which occurred from April 30 to May 1.

Fig. 3. Comparison of consumer-grade monitors to the DustTrak DRX using hourly averaged PM2.5 mass concentration data for each residence on a logarithmic scale.

Table 1. Consumer-grade sensors used in this study.

Table 2. Pearson correlation coefficients for hourly averaged PM2.5 concentrations by residence. Pearson's R correlation coefficients are listed in the upper diagonal and Spearman's Rho (ρ) correlation coefficients are shown in the lower diagonal section for each residence.

Table 3. Pearson correlation coefficients for daily averaged PM2.5 concentrations for all residences.

Table 5. Mean difference of daily averaged PM2.5 concentrations (µg m⁻³) in each residence as measured by different devices.

Table 4. Student's paired t-test results for daily PM2.5 concentrations for all residences.