Post-processing Method to Reduce Noise while Preserving High Time Resolution in Aethalometer Real-time Black Carbon Data

Real-time aerosol black carbon (BC) data, presented at time resolutions on the order of seconds to minutes, is desirable in field and source characterization studies measuring rapidly varying concentrations of BC. The Optimized Noise-reduction Averaging (ONA) algorithm has been developed to post-process data from the Aethalometer, one of the widely used real-time BC instruments. The ONA program conducts adaptive time-averaging of the BC data, with the incremental light attenuation (∆ATN) through the instrument's internal filter determining the time window of averaging. Analysis of instrument noise and the algorithm performance was conducted using Aethalometer 1-second data from a soot generation experiment, where input BC concentrations were maintained constant and an optimal ∆ATN_min value was defined. The ONA procedure was applied to four additional data sets (1 s to 5 min data), including cookstove emissions tests, mobile monitoring, continuous near-road measurements, and indoor air sampling. For these data, the algorithm reduces the occurrence of negative values to virtually zero while preserving the significant dynamic trends in the time series.


INTRODUCTION
Atmospheric black carbon (BC) is an important indicator of combustion emissions in ambient or indoor environments, may have direct impacts on health (e.g., Driscoll et al., 1996; Stoeger et al., 2006), and has been highlighted as a significant forcing agent for climate change (e.g., Ramanathan and Carmichael, 2008). Field studies around the world quantify atmospheric black carbon through two general approaches: off-line measurements, where a sample is collected onto a filter and then measured in a laboratory setting; and online measurements, where BC is continuously measured and reported on a time base of seconds to minutes. Online measurements are critical to research studies characterizing short-term variability in BC, such as measuring source emissions that change rapidly, quantifying outdoor air pollution levels while moving on a mobile platform or lofted in a balloon, comparing time-varying indoor air pollution levels with health indicators, or observing dynamic trends in ambient air quality.
Among the current approaches available to measure BC in an online fashion, filter-based optical techniques such as the Aethalometer™ (Magee Scientific), the multi-angle absorption photometer (MAAP, Thermo Scientific), and the particle soot absorption photometer (PSAP, Radiance Research) are in widespread use due to their ease of operation, relatively low cost, and established development history. Recent innovations on the Aethalometer have extended the range of BC measurement applications, with new models (AE42, AE51) providing greater portability by reducing the instrument size and operating from internal battery power. Portable BC instruments have been carried by volunteers or placed in homes to conduct personal exposure monitoring, lofted in a balloon to perform vertical-profile sampling (Babu et al., 2011), and used on-board mobile sampling vehicles to measure in-cabin or outdoor air quality (Beckerman et al., 2008; Kozawa et al., 2009; Wang et al., 2009).
One challenge facing BC measurements via filter-based techniques is measurement sensitivity. Filter-based optical techniques all operate by the same general principle: continuously drawing the particle-containing airstream through a filter while simultaneously measuring the attenuation of light (usually monochromatic) transmitted through the filter. The incremental change in light passing through the filter over a sample period (t_i to t_{i+1}) is directly converted to a BC concentration through numerical factors specific to each instrument. The measurement sensitivity is thus directly related to the loading rate of light-absorbing particles on the filter matrix, which is a function of sample BC concentration, air flow rate, and the size of the deposition area ("sample spot size") on the filter. The Aethalometer, which is the specific focus of this study and is widely used by the aerosol research community, bases its measurement on the relationship between the light attenuation value (ATN) and the surface density loading of BC particles on the filter (Hansen et al., 1984). While ATN should always be increasing, when sampling at very high data rates (e.g., 1 second) or in air streams of very low BC concentrations, the presence of instrumental optical and electronic noise can lead to periods when ATN values remain unchanged or even decrease slightly from one time period to the next. Because the calculation is based on successive differences, this noise may create an erroneous low value at one time point followed by an erroneous high value at the next, or vice versa. In situations where the true BC concentrations are relatively low compared to the magnitude of the noise, the instrument can report negative data.
Users of Aethalometer instruments who seek very high time-resolution data are often challenged by the occurrence of negative values in their BC data sets, which can occur at high frequency (e.g., >30%) when sampling at low concentrations and at a high time resolution. The simple removal of negatives is an inappropriate action to take, as this would disregard the corresponding positive fluctuations due to noise and would bias the final data high. While averaging the data over a longer period of time (e.g., hourly) generally reduces the noise in the signal, this can compromise the need for high time resolution data. There are numerous post-processing strategies that could be employed to reduce the noise in BC data, including a moving average (simple, weighted, exponential) or advanced mathematical techniques to separate the noise and reconstruct the time series (Kostelich and Schreiber, 1993 and references therein). However, these approaches do not take advantage of the additional information available in the ancillary data provided by the Aethalometer (namely, the light attenuation (ATN) values that relate to the internal loading rate of the filter), nor of knowledge of the successive-difference nature of the Aethalometer calculation.
The ONA method developed in this study is a simple approach to resolve the noise of real-time data from Aethalometer instruments, while maintaining the highest time resolution possible in the data set by dynamically adjusting the competing factors of averaging time versus noise. The post-processing algorithm was developed and tested using Aethalometer data collected from multiple measurement applications, including a soot generator, cookstove emission tests, indoor air sampling, outdoor near-road sampling, and sampling outdoor air while driving on a mobile platform. The approach presented in this manuscript may generally translate to other filter-based techniques. For example, a study in the remote Arctic applied a similar post-processing procedure to data from a PSAP; however, the noise reduction performance and optimal selection of averaging timeframes were not determined (Hagler et al., 2007). The proposed post-processing algorithm does not correct for filter-loading artifacts, in which the scattering of light by the particles embedded in the filter biases subsequent measurements (Weingartner et al., 2003; Arnott et al., 2005; Schmid et al., 2006; Virkkula et al., 2007; Coen et al., 2010; Park et al., 2010). While this study does not incorporate a filter-loading artifact correction scheme into the post-processing algorithm, the approach presented here could be used in conjunction with a filter-loading correction if desired by the user.

METHODS
This study analyzed BC data collected from multiple measurement projects that utilized various Aethalometer models (AE21, AE42, and AE51, Magee Scientific) operating at time resolutions from 1 second to 5 minutes (Table 1). Key data for the analysis of noise reduction strategies were from a diffusion flame burner emissions experiment (SootGen), which kept a nearly constant BC concentration input to a "micro" Aethalometer (model AE51) and also had condensation particle counter (CPC, TSI) data collected at a high time resolution. A view of this data set reveals the irregular and high-amplitude fluctuations in the 1-second BC data, while the CPC data and the BC data averaged at a 1-minute resolution confirm that the input concentrations were nearly constant (Fig. 1). Additional data sets used for testing purposes included 1-second sampling of biomass-burning cook stove emissions (Stove) for a two-hour time period, 1-second sampling of BC levels on arterial roads and highways using a mobile platform (Mobile) for a two-hour time period, 5-minute long-term monitoring of ambient BC levels adjacent to a major highway (Near-Road) covering a one-year time frame, and 1-minute monitoring of in-home BC concentrations (Indoor) covering a two-day time period. All of the test data were recently collected by researchers in the United States Environmental Protection Agency's Office of Research and Development (Table 1), with publications pending for each study.
The Aethalometer calculates the average BC concentration for time interval i as:

$$BC_i = \frac{A_s}{Q \, E_{atn}} \cdot \frac{\Delta ATN_i}{\Delta t_i} \qquad (1)$$

where A_s is the deposit cross-sectional area [L^2], Q is the sample stream volumetric flow rate [L^3/t], E_atn is the effective mass absorption efficiency [L^2/M] of the deposited particles in the filter matrix, and ∆ATN_i is the change in attenuation (ATN) over the time interval ∆t_i = t_{i+1} - t_i. The Aethalometer reports data at an instrument (intrinsic) timebase ∆t that remains constant for the data set (i.e., ∆t_i = ∆t for all i) unless the operating timebase is changed by the user. BC is proportional to the rate of change of ATN with time, i.e., the pointwise slope of the ATN time series in Fig. 1(b). A challenge presented by high time resolution measurements is that even with high deposition rates of absorbing material, at short timebases ∆ATN can be sufficiently small to be significantly influenced by measurement noise.
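As a worked illustration of the successive-difference calculation described above, the following Python sketch converts a series of ATN readings into interval-average BC values. The function name, argument names, and unit choices are assumptions made for illustration, and any instrument-specific scaling of ATN is assumed to be folded into E_atn; this is not the instrument's firmware calculation.

```python
def bc_from_atn(atn, dt, spot_area, flow, e_atn):
    """Interval-average BC from successive ATN readings (illustrative only).

    Implements BC_i = spot_area / (flow * e_atn) * (ATN[i+1] - ATN[i]) / dt.
    Units must be mutually consistent, e.g. m^2, m^3/s, m^2/g and s give g/m^3.
    All argument names are hypothetical.
    """
    if dt <= 0 or flow <= 0 or e_atn <= 0:
        raise ValueError("dt, flow and e_atn must be positive")
    factor = spot_area / (flow * e_atn)
    # One BC value per interval, so the result has len(atn) - 1 entries.
    return [factor * (atn[i + 1] - atn[i]) / dt for i in range(len(atn) - 1)]


# A noise-driven decrease in ATN produces a negative BC value.
example = bc_from_atn([0.0, 0.5, 0.25], dt=1.0, spot_area=1.0, flow=1.0, e_atn=1.0)
```

In this toy input the second ATN step is negative, so the second BC value is negative even though no BC can have left the filter, which is the noise behavior the post-processing is meant to address.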
The algorithm proposed here smooths the BC time series through a user-specified minimum change in attenuation (∆ATN_min), which for a given BC concentration results in an adjusted timebase ∆t'. For sufficiently high BC and/or a long intrinsic timebase, ∆ATN_i will be greater than ∆ATN_min and the intrinsic time resolution will be preserved. However, for relatively lower BC concentrations and/or short timebases, ∆ATN_i will be less than ∆ATN_min and the time series will be smoothed over the time interval ∆t_i' > ∆t that is needed to reach ∆ATN_min. A second constraint is that the ATN value at the end of the interval ∆t' must be the last occurrence of that value in the remainder of the time series for that sample spot, thus extending ∆t_i' to the final occurrence of that ATN value. This constraint reduces the frequency of negative BC values because a return to the same ATN value later in the time series means ∆ATN < 0 at that time step, which results in a negative BC concentration (Eq. (1)). An implication of the second constraint is that there is some degree of smoothing even if ∆ATN_min is zero.
In principle, the average BC concentration over the time interval ∆t_i' could be calculated from Eq. (1) using ∆ATN_i'/∆t_i'. However, the light transmission measurement duration at high time resolution is very short and thus susceptible to noise. Furthermore, ATN values reported by the Aethalometer are truncated at 0.01 units, which can add uncertainty to the reconstruction of BC using Eq. (1). In light of these issues, the average BC concentration over time interval ∆t_i' is calculated by averaging the set of BC values reported at the intrinsic timebase over that interval.
For many data sets, the ATN values increase to a threshold value (the user-specified maximum attenuation) and then reset to a low value upon the filter tape automatically advancing to provide a fresh filter spot. In order to handle filter changes in the algorithm, the processing window is confined to the region of a single filter spot, and the final averaging period (adjusted timebase ∆t_i') immediately prior to the filter change can be truncated. The above data processing steps have been implemented in an algorithm we term the Aethalometer Optimized Noise-Reduction Averaging (ONA) method. The specific steps of the algorithm include the following:
(1) Time series are imported for three parameters in the Aethalometer raw data: timestamp, black carbon concentration (BC), and attenuation (ATN).
(2) User specifies the minimum change in attenuation (∆ATN_min) for averaging the BC data, with the default value of ∆ATN_min = 0.05 based on the analysis provided below.
(3) Identify the locations of filter tape advances in the time series, if they exist.
(4) For each time series corresponding to a single sample spot, start at the beginning of the time series (t^(0) = t_0, ATN^(0) = ATN_0) and determine the shortest time interval ∆t' that meets the criterion ∆ATN = ATN_{t^(0)+∆t'} - ATN_0 ≥ ∆ATN_min.
(5) Search the remainder of the sample spot time series following t^(0) + ∆t' to check whether there are any occurrences of ATN_i ≤ ATN_{t^(0)+∆t'}. If so, update ∆t' to end at the last such occurrence in the sample spot time series.
(6) Average the raw BC over the window from t^(0) to t^(0) + ∆t' and apply this average BC value to each record at the timestamps within the window. This keeps the output as concentration values reported at the intrinsic timebase. The number of data points used in the averaging for that interval, ∆t'/∆t, is also recorded for each timestamp within the window.
(7) Starting with the next record after t^(0) + ∆t' in the raw data time series, which has timestamp t^(1) = (t^(0) + ∆t') + ∆t, successively repeat steps (4) through (6) until the end of the sample spot time series is reached. If t^(k) + ∆t' for the last (k-th) interval extends beyond the sample spot time series, set ∆t' so that the end of the interval coincides with the last record in the sample spot time series.
(8) Repeat steps (4) through (7) for each sample spot time series in the data set.
The Aethalometer ONA program has been implemented in MATLAB (version R2010b, Mathworks, Inc.) and the m-file code and instructive comments are provided in Supplementary Information (SI). The code for ONA may also be evaluated for conversion to other programming languages. EPA has developed a version of the algorithm as a stand-alone program allowing for batch-processing of data files, which will be publicly available online.
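Steps (3) through (8) can be sketched in Python as follows. This is a minimal, literal reading of the listed steps and is not the authors' MATLAB code. Several choices are assumptions made for illustration: the function names, the rule that a tape advance shows up as an ATN drop larger than a hypothetical reset_drop threshold, and the alignment of each BC record with the ATN value at the same index. Under this alignment a window covers at least two records whenever ∆ATN_min > 0; the supplementary MATLAB code may align records differently.

```python
def split_spots(atn, reset_drop=5.0):
    """Step 3: return (start, stop) index pairs, one per filter spot.

    Assumption: a tape advance appears as a drop in ATN larger than
    `reset_drop` (a hypothetical threshold) between consecutive records.
    """
    starts = [0] + [i for i in range(1, len(atn)) if atn[i - 1] - atn[i] > reset_drop]
    stops = starts[1:] + [len(atn)]
    return list(zip(starts, stops))


def ona(bc, atn, delta_atn_min=0.05, reset_drop=5.0):
    """Adaptive averaging of BC following steps (4)-(8) (illustrative sketch).

    Returns (smoothed_bc, n_averaged), both at the intrinsic timebase.
    """
    if len(bc) != len(atn):
        raise ValueError("bc and atn must have the same length")
    smoothed = [0.0] * len(bc)
    n_averaged = [0] * len(bc)
    for first, stop in split_spots(atn, reset_drop):  # step 8: loop over spots
        start = first
        while start < stop:                           # step 7: successive windows
            # Step 4: shortest window reaching the minimum ATN increase.
            # If it is never reached, truncate at the end of the spot.
            end = stop - 1
            for j in range(start, stop):
                if atn[j] - atn[start] >= delta_atn_min:
                    end = j
                    break
            # Step 5: extend to the last later record with ATN <= ATN[end].
            for j in range(stop - 1, end, -1):
                if atn[j] <= atn[end]:
                    end = j
                    break
            # Step 6: apply the window mean to every record in the window.
            width = end - start + 1
            mean_bc = sum(bc[start:end + 1]) / width
            for j in range(start, end + 1):
                smoothed[j] = mean_bc
                n_averaged[j] = width
            start = end + 1
    return smoothed, n_averaged


# Toy data: noisy ATN with two small decreases and two negative raw BC values.
toy_bc = [10, 30, -10, 40, -20, 70]
toy_atn = [0.00, 0.02, 0.01, 0.06, 0.05, 0.12]
toy_smoothed, toy_counts = ona(toy_bc, toy_atn)
```

On the toy data the first window spans five records (the ATN rise of 0.06 is reached at the fourth record and the window is extended to the later record with ATN = 0.05), so both negative raw values are absorbed into a positive window mean.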
In order to evaluate the results of the noise-reduction algorithm, two metrics were established: a measure of noise in the original and post-processed data, and a measure of time resolution in the post-processed data. Noise was quantified as the average absolute value of the instantaneous change in BC in the data set as follows:

$$\text{Noise (ng/m}^3\text{)} = \frac{1}{n} \sum_{i=0}^{n-1} \left| BC_{i+1} - BC_i \right| \qquad (2)$$

where n + 1 is the number of records in the time series. This measure of noise should characterize instrument-based fluctuations when the sample BC concentration is kept constant, such as the SootGen experiment, or when the sample concentration changes slowly relative to the sampling timebase. For situations where the sample concentration may be changing rapidly, such as the Mobile and Stove cases, the estimate of noise may be biased high, although the relative changes in noise from the original to post-processed data are still informative. Time resolution is evaluated as a histogram or empirical cumulative density function (ECDF) of the weighted timebase or, for simplicity, as the weighted median timebase of the data (equivalent to the 50th percentile of the ECDF).
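The noise metric, the average absolute change in BC between consecutive records, can be computed with a few lines of Python. The function name is an assumption for illustration; the input is taken to be a list of BC values in ng/m^3 at the intrinsic timebase.

```python
def noise_metric(bc):
    """Mean absolute successive difference of a BC series (illustrative).

    For n + 1 records there are n differences, so the sum is divided by n.
    """
    n = len(bc) - 1
    if n < 1:
        raise ValueError("at least two records are required")
    return sum(abs(bc[i + 1] - bc[i]) for i in range(n)) / n


# A series alternating between 0 and 10 changes by 10 at every step.
alternating_noise = noise_metric([0, 10, 0, 10, 0])
```

A perfectly constant series gives zero, and the alternating series above gives 10, which matches the step size of its fluctuations.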

Relationship between Light Attenuation and Noise Reduction
The SootGen experiment provided critical data to evaluate the relationship between the minimum ∆ATN value applied in ONA and the reduction of instrumental noise, as the sample BC level was maintained nearly constant for several hours while data was acquired at a 1-second resolution. The noise level in the original data set was estimated at 12,500 ng/m³, relative to an overall mean concentration of 27,500 ng/m³. Applying ONA with various ∆ATN_min levels ranging from 0 to 0.1, ONA-processed data had final noise levels decreasing from 2,185 to 11 ng/m³, respectively. Even with ∆ATN_min set to zero, ONA performed smoothing over periods of fluctuating ATN levels and reduced the noise level roughly six-fold in the data (Fig. 2). Adding a minimum ∆ATN requirement reduced the noise level by an additional order of magnitude, until an asymptote was reached at approximately ∆ATN_min = 0.05. Therefore, the value of ∆ATN_min = 0.05 was applied to post-process the other test data sets (Stove, Mobile, Near-Road, and Indoor) with the ONA algorithm. It should be noted that the analysis shown in Fig. 2 is appropriate for cases where BC levels are maintained relatively constant and would be more uncertain in conditions with variable input concentrations.
Another conclusion that may be drawn from the noise-reduction versus ∆ATN analysis for the SootGen experiment is that it is possible to use the ∆ATN analysis to select an ideal constant averaging time base to minimize instrument-related noise. For studies desiring a set averaging period, Aethalometer BC data could be analyzed to determine the minimum averaging time window required such that ∆ATN over any time window in the BC data equals or exceeds 0.05.
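One way to carry out such an analysis is a brute-force search for the shortest fixed window over which ATN always rises by at least the chosen threshold. The sketch below assumes the ATN series comes from a single filter spot at a constant timebase; the function name and the decision to return None when no window qualifies are illustrative choices, not part of the published method.

```python
def min_constant_window(atn, delta_atn_min=0.05):
    """Smallest number of intervals k such that ATN[i + k] - ATN[i] is at
    least delta_atn_min for every i in a single filter spot (illustrative).

    Returns None if no fixed window length satisfies the criterion.
    Multiply the result by the intrinsic timebase to get a duration.
    """
    n = len(atn)
    for k in range(1, n):
        if all(atn[i + k] - atn[i] >= delta_atn_min for i in range(n - k)):
            return k
    return None


# ATN rises by 0.02 to 0.03 per record, so three intervals are needed.
k_needed = min_constant_window([0.00, 0.02, 0.04, 0.07, 0.09, 0.12])
```

For 1-second data, a result of three intervals would suggest a fixed 3-second averaging period for that spot; real data with noisy ATN would generally require a longer window.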

ONA Performance for Test Cases
The ONA program results, with ∆ATN_min = 0.05, indicate that the adaptive time-averaging approach based on light-attenuation signals in the instrument successfully resolves noise while retaining the significant trends in the data sets (Fig. 3 and Table 2). For all cases sampling at a rate of 1 minute or faster, ONA reduced noise in the data by approximately an order of magnitude (Table 2). For the Near-Road case, which reported data at a 5-minute interval, the longer sampling period already resolved much of the noise in the data, thus ONA only slightly lowered the noise for that case.
One consequence of instrumental noise while sampling at low concentrations is the appearance of negative values in Aethalometer data sets. The Stove and Indoor data had approximately 30% negative values in the original data, which ONA reduced to zero after processing. The alternative approach of applying running averages to these data sets can reduce the occurrence of negatives. For example, lengthening the averaging window by a factor of 60 for the Stove data (60 s average from the original 1 s) and the Indoor data (60 min from 1 min) leaves 23% negatives for Stove and 6% negatives for Indoor. However, applying a fixed longer averaging period simply to reduce negatives can lead to a loss in the ability to detect real variations and trends in the data. For example, a 60 min running average for the Indoor case drops the central peak at 2:00 (Fig. 3(e)) by over 60%, while the peak is preserved in the ONA adaptive time-averaging approach. For the Stove and Indoor cases, several lengthy averaging windows, up to 8 hrs for the Indoor case and 2 hrs for the Stove case, were generated by the ONA algorithm for time periods with low concentrations and when BC values are seen to oscillate around zero (Fig. 3(b) and (e)). Meanwhile, at least 20% of the time covered by the Indoor data and Stove data had time averaging windows shorter than 20 min or 13 seconds, respectively.
An efficient way to assess the ONA-produced varying timebase of the example data sets is through plotting the empirical cumulative density function for each case (Fig. 4). Among the data sets sampling at a 1-second time resolution (Mobile, Stove, and SootGen), the Mobile data retains the highest time resolution of the group after post-processing with ONA. While the BC levels measured in the Mobile case were much lower than those sampled for Stove and SootGen (Table 2), the Mobile data collection was performed using a model AE42 operating at a flow rate of 4 L/min, while the Stove and SootGen experiments used a model AE51 operating at a flow rate of 0.05 L/min. Thus, the Mobile case generally had a more rapid increase in ATN (i.e., higher filter-loading rate of BC) throughout the time series, relative to Stove and SootGen. For the Near-Road case, ONA only alters about 15% of the 5-minute data, while ONA alters nearly 100% of the data for all of the other cases (Fig. 4).
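The timebase ECDF and its median can be tabulated from the per-record count of averaged points (the ∆t'/∆t value recorded in step (6) of the algorithm). The sketch below is one plausible reading of the "weighted" timebase, in which every record contributes the length of the window it belongs to, so longer windows carry weight in proportion to the records they cover; that interpretation and both function names are assumptions.

```python
def timebase_ecdf(n_averaged, dt):
    """Return (timebases, cumulative_fractions) for per-record window sizes.

    `n_averaged[i]` is the number of points averaged for record i and `dt`
    is the intrinsic timebase, so n_averaged[i] * dt is that record's
    adjusted timebase. Illustrative only.
    """
    total = len(n_averaged)
    if total == 0:
        raise ValueError("n_averaged must not be empty")
    counts = {}
    for n in n_averaged:
        counts[n * dt] = counts.get(n * dt, 0) + 1
    timebases = sorted(counts)
    fractions = []
    running = 0
    for value in timebases:
        running += counts[value]
        fractions.append(running / total)
    return timebases, fractions


def median_timebase(n_averaged, dt):
    """Smallest timebase at which the ECDF reaches 0.5 (50th percentile)."""
    timebases, fractions = timebase_ecdf(n_averaged, dt)
    for value, fraction in zip(timebases, fractions):
        if fraction >= 0.5:
            return value


# Five records share a 5 s window and one record keeps the 1 s timebase.
ecdf_values, ecdf_fractions = timebase_ecdf([5, 5, 5, 5, 5, 1], dt=1)
```

In this toy case one sixth of the records retain the 1 s timebase and the median timebase is 5 s.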

CONCLUSIONS
Black carbon, produced through the incomplete combustion of biomass or fossil fuels, is a key air pollutant measured in combustion emissions studies or field studies evaluating combustion-related air pollution. While BC data is sometimes desired at the highest time rate possible (e.g., 1 second), instrument noise can lead to erroneous data recorded by the commonly used Aethalometer. The ONA algorithm capably reduces noise in Aethalometer data, with the time-averaging window determined by the algorithm based upon the loading rate of light-absorbing particles on the instrument's internal filter. Five unique Aethalometer BC data sets are analyzed using the ONA procedure: a soot generation experiment, a biomass-burning cook stove emissions study, an on-road mobile monitoring study, continuous measurement of a near-road ambient environment, and an indoor air sampling study. Data processed from the soot generation study reveal a significant relationship between noise reduction by ONA and the selection of the required minimum change in light attenuation, reaching a plateau in noise reduction at ∆ATN_min = 0.05. The ONA algorithm leads to significant noise reduction in all cases tested, while the ability to detect temporal variations in the data is preserved. An important added advantage of the ONA algorithm is the reduction of the occurrence of negative data values in lower concentration sampling environments: for all cases tested, the ONA post-processed data have a near-zero occurrence of negative values.

Table 2 notes: (a) Computed as a weighted median and equivalent to the 50th percentile of the ECDF shown in Fig. 4. (b) The noise metric was quantified using Eq. (2).

Fig. 1. Black carbon (raw 1 Hz data and post-processed with a 1-minute running average) and particle count measured for a constant soot generation experiment (a). The corresponding filter attenuation raw signal for the Aethalometer (model AE51) is shown (b). Note that for certain Aethalometer models, the ATN baseline is not set to 0 for a new filter spot; the case shown has an arbitrary negative baseline that does not affect the BC values reported.

Fig. 2. Comparison of estimated noise in the post-processed SootGen BC signal versus the input minimum change in ATN required by the post-processing program. Noise is estimated using Eq. (2).

Fig. 4. Empirical cumulative density function of the timebase for the ONA post-processed example data sets, requiring a minimum ∆ATN = 0.05. The Stove, Mobile, and SootGen data had an original timebase of 1 s, the Near-Road original timebase was 300 s (5 min), and the Indoor original timebase was 60 s (1 min).

Table 2. Metrics for sample data sets.