Source Apportionment of PM2.5 Using Positive Matrix Factorization (PMF) and PMF with Factor Selection

Personal exposure, indoor, residential outdoor, and urban background particulate matter (PM2.5) samples were collected in parallel for 30 participants and analyzed for their chemical composition. Source apportionment was performed for each microenvironment separately using conventional positive matrix factorization (PMF), and for the combined dataset using a new PMF method with factor selection. Regional sources were the largest contributor to the sampled PM2.5 in all microenvironments, accounting for 69% in the urban background; 55% and 54% in the residential outdoor and indoor environments, respectively; and 40% of personal exposure. Personal activities accounted for 21% (2.2 µg/m³) of personal exposure and constituted the main difference in total mass concentration between personal exposure and the other microenvironments. The PMF method with factor selection was found to be a useful tool for PMF analysis of multiple microenvironments, since ambient contributions to indoor air and personal exposure are less likely to be distorted or misinterpreted. Combining the datasets for the different microenvironments into one larger dataset and applying PMF with factor selection improves the possibility of correctly estimating the source contributions.


INTRODUCTION
People are exposed to air pollution of both natural and anthropogenic origin. Air pollutants can travel thousands of miles from their sources (Huneeus et al., 2011), and their concentrations vary spatially and temporally (Ito et al., 2004). Moreover, daily activities (traffic, road dust, domestic wood burning, and industrial activities) play an important role in the concentration and distribution of PM (Tang et al., 2004).
Numerous epidemiological studies have found associations between mass concentrations of fine particulate matter (PM2.5) and adverse health effects (Samet et al., 2000; Pope et al., 2002; Brook et al., 2010; WHO, 2013). Many studies have investigated the relationships between personal exposure and indoor and ambient particle mass concentrations. These studies mostly focused on PM2.5, sometimes also on black smoke, and were performed in Europe and North America. Some of them have shown that ambient concentrations do not always reflect personal exposure (Oglesby et al., 2000; Sarnat et al., 2005; Sorensen et al., 2005; Johannesson et al., 2007). Others have also characterized the chemical composition of PM2.5 in personal exposure (Yakovleva et al., 1999; Oglesby et al., 2000; Kinney et al., 2002; Lai et al., 2004; Larson et al., 2004; Janssen et al., 2005; Molnár et al., 2005; Molnár et al., 2006; Zhao et al., 2006; Johannesson et al., 2011). The infiltration of ambient air indoors can be calculated using elements with no indoor sources (e.g., S or Pb), which allows the influence of ambient air on indoor air to be assessed (Hänninen et al., 2011). The additional information obtained from chemical analysis makes it possible to estimate the contributions from different sources and their influence on personal exposure using statistical methods such as principal component analysis (PCA) (Koistinen et al., 2004) or positive matrix factorization (PMF) (Yakovleva et al., 1999; Larson et al., 2004; Zhao et al., 2006). There are, however, few studies with data from simultaneous collection of personal, indoor, and outdoor samples. Parallel personal, residential indoor, and outdoor sampling is more time-consuming and labor-intensive than a fixed monitoring station study. Personal exposure datasets are therefore often rather small and less suitable for standard PMF methods. Because it is nonetheless important to apply PMF to personal exposure data, it is desirable to develop methods for smaller datasets.
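The infiltration approach mentioned above can be illustrated with a short sketch. It assumes that the infiltration factor is estimated as the mean indoor/outdoor concentration ratio of a tracer without indoor sources, such as S; regression slopes are another common choice, and neither estimator is prescribed by the cited work. The function names and tracer values are hypothetical.

```python
import statistics


def infiltration_factor(indoor_tracer, outdoor_tracer):
    """Estimate the infiltration factor as the mean indoor/outdoor ratio
    of a tracer element assumed to have no indoor sources (e.g., S)."""
    if len(indoor_tracer) != len(outdoor_tracer):
        raise ValueError("paired indoor and outdoor samples are required")
    return statistics.mean(i / o for i, o in zip(indoor_tracer, outdoor_tracer))


def ambient_origin_indoor(outdoor_pm, f_inf):
    """Indoor concentration attributable to infiltrated ambient particles."""
    return f_inf * outdoor_pm


# Hypothetical paired sulfur concentrations (ng/m3) for three homes.
f_inf = infiltration_factor([560.0, 640.0, 480.0], [800.0, 800.0, 800.0])
print(round(f_inf, 2))  # mean of the ratios 0.70, 0.80 and 0.60
# Hypothetical outdoor PM2.5 of 10 ug/m3 scaled to its indoor ambient-origin part.
print(round(ambient_origin_indoor(10.0, f_inf), 2))
```

The ratio-of-means versus mean-of-ratios choice matters little for a sketch, but a real analysis would also need to handle outdoor values near zero.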
In this paper, we test a new factor selection script for the multilinear engine (ME-2) program on a small dataset of 30 parallel samples from each of four microenvironments (personal exposure, residential indoor, residential outdoor, and urban background). The results are discussed and compared with those of separate PMF analyses of the individual microenvironments.
The aims were (1) to compare two approaches to source characterization (separate and combined PMF), and (2) to characterize and quantify the sources of PM2.5 in personal exposure, indoor, residential outdoor, and urban background samples and to investigate differences between the microenvironments.

Study Design, Participants, and Sampling Equipment
The study took place in Gothenburg, Sweden, during April 2 to June 7 and September 26 to November 6, 2002, and March 27 to June 12 and October 7 to 30, 2003, covering the spring and autumn seasons of both years. Personal exposure, indoor, and residential outdoor PM2.5 measurements for 30 participants were performed in parallel with PM2.5 measurements at a stationary outdoor urban background station (UB) (Johannesson et al., 2007). The participants lived 0.8 to 15 km from the urban background station (median distance 3.3 km). Each sample was collected over 24 hours.
Identical sets of equipment were used for personal exposure, indoor, and residential outdoor sampling: a GK2.05 (KTL) cyclone for PM2.5 connected to a BGI 400S pump (BGI Inc., Waltham, MA, USA) operated at a flow rate of 4 L/min. For PM2.5 measurements at the urban background station, an EPA-WINS impactor (PQ100 EPA-WINS Basel PM2.5; BGI Inc., Waltham, MA, USA) with a flow rate of 16.7 L/min was used. We used 37 mm Teflon filters (Pall Teflo, R2PJ037) in the cyclones and 47 mm Teflon filters (Pall Teflo, R2PJ047) in the WINS impactor. Weather information (temperature, wind speed, wind direction, rain, etc.) was provided by the Environmental Agency in Gothenburg. More detailed information about the study design, the participants, and the sampling can be found in Molnár et al. (2006) and Johannesson et al. (2007). The study was approved by the Ethics Committee at the University of Gothenburg.
One participant had a significant workplace exposure, and this personal sample was excluded from the analysis. Another participant had a very high Ni concentration in the personal sample (46 ng/m³ vs. a median personal exposure of 2.6 ng/m³). This outlier is believed to be due to contamination; it was replaced by the median personal exposure value, and its uncertainty was increased accordingly. The total numbers of samples were 29 personal, 30 indoor, 29 residential outdoor, and 28 UB.
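The outlier handling described above can be sketched as follows. The inflation factor applied to the uncertainty is a hypothetical placeholder, because the study only states that the uncertainty was "increased accordingly"; the Ni values below are invented for illustration.

```python
import statistics


def replace_with_median(values, uncertainties, index, inflation=3.0):
    """Replace one suspected contaminated value with the median of the
    series and inflate its uncertainty so that PMF down-weights it
    (the weight of a point in Q scales as 1 / s**2).

    `inflation` is a hypothetical factor, not the value used in the study.
    """
    median = statistics.median(values)  # the median is robust to the outlier itself
    new_values = list(values)
    new_unc = list(uncertainties)
    new_values[index] = median
    new_unc[index] = uncertainties[index] * inflation
    return new_values, new_unc


# Hypothetical personal Ni concentrations (ng/m3) with one contaminated sample.
ni = [2.1, 2.6, 3.0, 46.0, 2.4]
ni_unc = [0.3, 0.3, 0.4, 2.0, 0.3]
clean_ni, clean_unc = replace_with_median(ni, ni_unc, index=3)
print(clean_ni, clean_unc)
```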

Analytical Techniques
All filters were weighed before and after exposure using a CAHN C-30 microbalance placed in a temperature- and humidity-controlled room (Johannesson et al., 2007). An energy-dispersive X-ray fluorescence (EDXRF) spectrometer at the Department of Chemistry, Atmospheric Science, University of Gothenburg, was used to analyze the elemental composition of all filter samples (Molnár et al., 2006). The EDXRF spectra were processed and quantified using the Quantitative X-ray Analysis System (QXAS) and the Analysis of X-ray spectra by Iterative Least-squares fitting (AXIL) (Bernasconi et al., 2000). The uncertainty was calculated for each element individually as three times the standard deviation of the fitted peak area. Information about calibration and quality control has been presented elsewhere (Molnár et al., 2005). The mean analytical precision was 5%, as calculated from repeated analyses (N = 5) of two randomly selected filters, one with a low and the other with a high mass loading.
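As an illustration of how a mean analytical precision of this kind can be computed, the sketch below assumes that precision is expressed as the relative standard deviation (sample standard deviation divided by the mean) of the N = 5 repeated analyses, averaged over filters; the paper does not spell out the definition, and the replicate values are hypothetical. The per-element uncertainty rule of three times the standard deviation of the fitted peak area is included as a one-line helper.

```python
import statistics


def relative_precision(replicates):
    """Relative standard deviation of repeated analyses of one filter/element."""
    return statistics.stdev(replicates) / statistics.mean(replicates)


def mean_precision(replicate_sets):
    """Average relative precision over several filter/element combinations."""
    return statistics.mean(relative_precision(r) for r in replicate_sets)


def peak_uncertainty(peak_area_sd):
    """Per-element uncertainty: three times the SD of the fitted peak area."""
    return 3.0 * peak_area_sd


# Hypothetical N = 5 replicate results (ng/m3) for a low- and a high-loaded filter.
low = [10.0, 10.5, 9.5, 10.0, 10.0]
high = [200.0, 210.0, 190.0, 200.0, 200.0]
print(f"mean precision: {100 * mean_precision([low, high]):.1f}%")
```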
After the elemental analysis, the filters were analyzed for black smoke using an EEL 43 smoke stain reflectometer (Johannesson et al., 2007). The reflectance of each filter was transformed into an absorption coefficient, a, according to the international standard ISO 9835 (1993), and then converted to black carbon concentration using the equation presented by Quincey (2007).
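The reflectance-to-absorption step can be sketched with the form commonly quoted for ISO 9835, a = (A / 2V) ln(R0 / R), where A is the exposed filter area, V the sampled air volume, R0 the reflectance of a blank filter, and R that of the sampled filter. This form is taken from general knowledge of the standard rather than from the paper; the filter area below is a hypothetical value, and the subsequent conversion to black carbon following Quincey (2007) is not reproduced. The sampled volume follows from the 4 L/min flow rate and 24 h sampling time given above.

```python
import math


def absorption_coefficient(area_m2, volume_m3, r_blank, r_sample):
    """Absorption coefficient (m^-1) from filter reflectance, using the
    commonly quoted ISO 9835 form a = A / (2 V) * ln(R0 / R)."""
    if not 0 < r_sample <= r_blank:
        raise ValueError("sample reflectance must be positive and <= blank reflectance")
    return area_m2 / (2.0 * volume_m3) * math.log(r_blank / r_sample)


# Sampled volume for the cyclone samplers: 4 L/min for 24 h, converted to m3.
volume = 4.0 * 60 * 24 / 1000.0  # 5.76 m3
area = 8.0e-4  # hypothetical exposed area of a 37 mm filter, m2
a = absorption_coefficient(area, volume, r_blank=100.0, r_sample=80.0)
print(f"a = {a * 1e5:.2f} x 10^-5 m^-1")
```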

Source Apportionment by Positive Matrix Factorization
Source apportionment analysis was carried out by positive matrix factorization (PMF) applying the multilinear engine (ME) technique using the ME-2 program (Paatero, 1999). PMF is a multivariate receptor model that estimates source profiles and their contributions using a weighted least-squares approach (Paatero and Tapper, 1994; Paatero, 1997). The task of the PMF model, Eq. (1), is to obtain the unknown matrices G and F by iterative weighted least-squares minimization:

$$X = GF + E \tag{1}$$

where X is the data matrix (size m × n) consisting of n chemical components analyzed in m samples, G is the matrix of source contributions to each sample (size m × p) for p factors, F is the matrix of source profiles (size p × n), and E is the matrix of residuals. The iteration aims to minimize the Q-value, defined in Eq. (2) as the sum of squared residuals (e_ij) weighted inversely by the error estimates (s_ij) of the data points:

$$Q = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \frac{e_{ij}}{s_{ij}} \right)^2 \tag{2}$$
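To make Eqs. (1) and (2) concrete, the sketch below fits X ≈ GF under non-negativity constraints by minimizing Q with weighted multiplicative updates (a Lee-Seung-style weighted non-negative matrix factorization). This is a swapped-in, simplified technique chosen for readability: it is not the ME-2 algorithm used in the study and lacks ME-2 features such as rotational control and robust fitting. The function names and data are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def q_value(x, s, g, f):
    """Eq. (2): sum of squared residuals e_ij = x_ij - (GF)_ij scaled by s_ij."""
    gf = matmul(g, f)
    return sum(((xv - yv) / sv) ** 2
               for xr, yr, sr in zip(x, gf, s)
               for xv, yv, sv in zip(xr, yr, sr))


def pmf_sketch(x, s, p, iters=500, seed=0, eps=1e-12):
    """Non-negative G (m x p) and F (p x n) that reduce Q via weighted
    multiplicative updates with weights w_ij = 1 / s_ij**2."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    g = [[rng.random() + 0.1 for _ in range(p)] for _ in range(m)]
    f = [[rng.random() + 0.1 for _ in range(n)] for _ in range(p)]
    w = [[1.0 / sv ** 2 for sv in row] for row in s]
    wx = [[wv * xv for wv, xv in zip(wr, xr)] for wr, xr in zip(w, x)]

    def weighted_fit():
        # Element-wise W * (GF) for the current G and F.
        gf = matmul(g, f)
        return [[wv * yv for wv, yv in zip(wr, yr)] for wr, yr in zip(w, gf)]

    for _ in range(iters):
        # Contributions: G <- G * (WX F^T) / ((W * GF) F^T)
        ft = transpose(f)
        num, den = matmul(wx, ft), matmul(weighted_fit(), ft)
        g = [[gv * nv / (dv + eps) for gv, nv, dv in zip(gr, nr, dr)]
             for gr, nr, dr in zip(g, num, den)]
        # Profiles: F <- F * (G^T WX) / (G^T (W * GF))
        gt = transpose(g)
        num, den = matmul(gt, wx), matmul(gt, weighted_fit())
        f = [[fv * nv / (dv + eps) for fv, nv, dv in zip(fr, nr, dr)]
             for fr, nr, dr in zip(f, num, den)]
    return g, f


# Hypothetical data: 6 samples x 4 species generated from 2 known sources.
g_true = [[1.0, 0.2], [0.5, 1.0], [2.0, 0.1], [0.3, 0.8], [1.2, 1.2], [0.7, 0.4]]
f_true = [[0.6, 0.1, 0.3, 0.0], [0.1, 0.5, 0.0, 0.4]]
x = matmul(g_true, f_true)
s = [[0.05 + 0.1 * v for v in row] for row in x]  # hypothetical error estimates
g, f = pmf_sketch(x, s, p=2, iters=300)
print(f"Q = {q_value(x, s, g, f):.4f}")
```

These updates never increase Q and keep G and F non-negative, which is the essential PMF constraint; ME-2 additionally handles rotational ambiguity and outliers, which this sketch does not.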
In the present study, black carbon (BC) and the following elements were included in the PMF analysis: S, Cl, K, Ca, Ti, V, Mn, Fe, Ni, Cu, Zn, Br, and Pb, as well as PM_rest. In all analyses, the elemental mass concentrations were recalculated to their mean oxidized mass concentrations (i.e., expressed as oxides), where applicable. PM_rest is the unidentified part of the sampled PM2.5 mass and includes the major ionic species nitrate and ammonium, as well as silicon and organic species. PM_rest was calculated by subtracting the identified concentrations (BC and the elements listed above) from the total mass concentration.
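The PM_rest bookkeeping described above amounts to a simple mass balance. In the sketch below, the oxide conversion factors are illustrative values computed from molar masses for assumed oxide forms (CaO, Fe2O3, ZnO); the study does not list the factors it used, and species without an entry are kept at their measured mass. The function name and sample values are hypothetical.

```python
# Illustrative element-to-oxide mass factors for assumed oxide forms, e.g.
# Fe -> Fe2O3: 159.69 / (2 * 55.845) ~ 1.43. Not the study's values.
OXIDE_FACTOR = {"Ca": 1.40, "Fe": 1.43, "Zn": 1.24}


def pm_rest(total_mass, identified):
    """Unidentified mass: total PM2.5 minus BC and the (oxide-converted)
    element concentrations. All inputs in the same unit (e.g., ug/m3)."""
    identified_mass = sum(OXIDE_FACTOR.get(species, 1.0) * conc
                          for species, conc in identified.items())
    return total_mass - identified_mass


# Hypothetical sample (ug/m3).
sample = {"BC": 1.0, "Fe": 0.10, "Zn": 0.05, "Cl": 0.20}
print(f"PM_rest = {pm_rest(10.0, sample):.3f} ug/m3")
```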
The different microenvironment types, ambient (i.e., urban background (UB) and residential outdoor), indoor, and personal exposure, were analyzed separately with regular ME-2 to investigate source contributions without taking the other microenvironments into account.

PMF with Factor Selection
In the analysis combining the four microenvironments (UB, residential outdoor, indoor, and personal exposure) into one dataset, a new script for ME-2 developed by Pentti Paatero (personal communication), called "PMF with factor selection," was used. In PMF with factor selection, henceforth termed FS, the dataset is complemented with a selection variable, in addition to the 15 base variables, that takes the value 1 for ambient samples (UB and residential outdoor), 2 for indoor samples, and 3 for personal exposure samples. The factor selection script uses a factor selection matrix with three rows, one for each type of microenvironment (ambient, indoor, and personal), and one column for each factor in the model. In the matrix, a "1" denotes that the factor should be fitted for that type of sample and a "0" that it should be ignored. Table 1 presents an example matrix for a model with four factors for the ambient samples (first row, F1 to F4), six factors for the indoor samples (second row, F5 and F6 being indoor-specific factors), and seven factors for the personal samples (third row, F7 being the personal activities factor). With this matrix, the whole dataset is used to determine the first four factors, both indoor and personal samples are used to determine the two indoor factors, and only the personal samples are used to determine the personal factor.
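The factor selection matrix of Table 1 can be written out directly, as in the sketch below. How the ME-2 script applies the matrix internally is not described in the paper; the sketch assumes that a "0" is equivalent to fixing the corresponding contribution g_ik to zero, so that a deselected factor neither contributes to a sample's modeled concentrations nor is informed by that sample. All names are hypothetical.

```python
# Rows: sample types coded by the selection variable; columns: factors F1 to F7.
SELECTION = {
    1: [1, 1, 1, 1, 0, 0, 0],  # ambient (UB and residential outdoor): F1 to F4
    2: [1, 1, 1, 1, 1, 1, 0],  # indoor: adds indoor-specific F5 and F6
    3: [1, 1, 1, 1, 1, 1, 1],  # personal: adds the personal activities factor F7
}


def modeled_sample(sample_type, g_row, f):
    """Modeled concentrations for one sample, sum_k sel_k * g_k * f_kj,
    assuming deselected factors are forced to zero contribution."""
    mask = SELECTION[sample_type]
    n_species = len(f[0])
    return [sum(mask[k] * g_row[k] * f[k][j] for k in range(len(f)))
            for j in range(n_species)]


def sample_types_informing(factor_index):
    """Sample types whose data can constrain a given factor (0-based index)."""
    return [t for t, row in SELECTION.items() if row[factor_index] == 1]


print(sample_types_informing(0))  # F1 is fitted to all sample types
print(sample_types_informing(4))  # F5 is fitted to indoor and personal samples
print(sample_types_informing(6))  # F7 is fitted to personal samples only
```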

Error Estimates and Error Models Used in PMF
The combined dataset contained a number of values below the limit of detection (LoD). For each of these values, the LoD was reported instead of an actual concentration, and the following modeling procedure was used. The LoD value was used as the data value x_ij to be fitted, with its uncertainty set to 0.5 LoD. In addition, the error model for values below the LoD was set so that the PMF model accepts any fitted value between 0 and x_ij = LoD as a perfect fit, giving a zero contribution to Q. This corresponds to the spirit of a "detection limit": the true value lies somewhere between 0 and the LoD, with no preference for any particular value within that interval.
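The below-LoD error model can be expressed as a per-value contribution to Q, sketched below. The zero penalty for fitted values between 0 and the LoD follows the text; the penalty outside that interval (squared distance to the nearest bound, scaled by the 0.5 LoD uncertainty) is an assumption, since the exact ME-2 implementation is not given. The function name is hypothetical.

```python
def q_contribution(x, fitted, s, below_lod=False):
    """Contribution of one data point to Q.

    Ordinary values: ((x - fitted) / s) ** 2.
    Values reported as below the LoD: x holds the LoD and s = 0.5 * LoD;
    any fitted value in [0, LoD] counts as a perfect fit (zero contribution).
    Outside that interval the squared distance to the nearest bound is used
    (an assumption, not documented in the study).
    """
    if below_lod:
        lod = x
        if 0.0 <= fitted <= lod:
            return 0.0
        nearest = lod if fitted > lod else 0.0
        return ((fitted - nearest) / s) ** 2
    return ((x - fitted) / s) ** 2


lod = 2.0
print(q_contribution(lod, 1.5, 0.5 * lod, below_lod=True))  # inside [0, LoD]
print(q_contribution(lod, 3.0, 0.5 * lod, below_lod=True))  # above the LoD
print(q_contribution(2.0, 1.5, 1.0))                        # ordinary value
```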

RESULTS AND DISCUSSION
Mean PM2.5 concentrations for personal exposure, indoor, residential outdoor, and urban background samples were 11.0, 9.7, 7.8, and 10.1 µg/m³, respectively (Johannesson et al., 2007). The elemental concentration data can be found in Molnár et al. (2006). The chemically undetermined part of PM2.5, PM_rest, accounted for 59% of the total PM2.5 on average.

Conventional PMF of the Different Microenvironments
Urban Background and Residential Outdoors
A four-factor model (long-range transport + ship emissions (LRT + Ships), local combustion, traffic, and sea salt + resuspension) identified the major sources of PM2.5 at the UB and residential outdoor sampling sites (see Fig. 1, and Table A1 in the supplementary material for the factor profiles).
LRT + Ships dominated the contributions at both the UB and residential outdoor locations (69% and 55%, respectively). The high contribution from LRT pollution is in accordance with the results of Forsberg et al. (2005), who estimated that on average only 29% of ambient PM10 in Sweden was of local origin, with the majority coming from regional sources. The local combustion factor accounted for 20-23% of UB and residential outdoor PM2.5. Sources in this factor included fossil fuel and biomass burning for residential heating, as well as emissions from industries and the refinery in the harbor area. The other two sources made only minor contributions (less than 1 µg/m³ on average). The small traffic contribution was likely partly due to the UB station being at an elevated position ("the medical hill" on the campus of the faculty of medicine), away from major traffic routes. Similarly, most of the participants' homes were not close to major roads. Sea salt + resuspension formed the remaining ambient factor. Since westerly winds (from the ocean) are common and wind speeds are usually higher from this direction, this factor includes both transported sea spray and windblown dust.
Similar average levels at the UB and residential outdoor sites were found for the LRT + Ships and traffic factors, as expected, since both affect the whole city (and major traffic routes tend to be located away from residential areas). In contrast, the average levels of the other two factors, sea salt + resuspension and local combustion, differed considerably between the sites. This is probably because these sources are more local and, owing to dispersion, affect different locations (i.e., the participants' homes) differently.

Indoor
For the indoor measurements, six different sources/factors could be identified (Fig. 2(a)). Indoor resuspension had the highest contribution to indoor levels, followed by traffic. The remaining four factors (marine, indoor Cu, soil, and LRT) contributed less.

Table 1. The factor selection matrix with seven factors in total: four ambient factors, two additional indoor factors, and one additional factor for personal exposure.
Fig. 1. The resulting factor contributions to PM2.5 for the combined urban background and residential outdoor datasets.

Personal Exposure
Personal exposure was best represented by seven factors (Fig. 2(b)), with a strong influence from indoor activities (mainly indoor resuspension), as for the indoor measurements. Indoor burning, indoor Cu, LRT, soil, and the marine factor, together with the personal activities factor, were minor contributors. The factors that contributed most were associated with activities such as movement indoors that resuspends settled dust, burning candles, cooking/frying, and use of household appliances (the Cu factor).
The resulting source factors from the PMF modeling of the different microenvironments, ambient air (UB and residential outdoor), indoor, and personal exposure, are given in Tables A1-A3 in the supplementary material.

Time Series Comparison
Strong correlations were found between the UB and residential outdoor sites for the LRT + Ships and traffic factors (r_p = 0.80 and 0.75, respectively). For the other two factors, sea salt + resuspension and local combustion, the correlations between UB and residential outdoor were weaker (0.43 and 0.23, respectively). The correlations between the factors in the indoor model and those in the UB and residential outdoor models were weak for all combinations of factors (r_p < 0.5). The elemental source profile of the indoor resuspension factor indicated that it consisted of a combination of outdoor sources (LRT, oil combustion, traffic, and soil), but it was not correlated with these factors in the ambient models. This leads to the conclusion that although the indoor resuspension factor consisted of a considerable quantity of particles of outdoor origin, its day-to-day variation indoors was mainly governed by indoor activities rather than by fluctuations in outdoor concentrations.
For personal exposure, there were strong correlations with the corresponding indoor factors indoor/personal Cu, marine, and indoor resuspension (r_p = 0.93, 0.81, and 0.78, respectively), and a moderate correlation with the LRT factor (r_p = 0.57). The day-to-day variations were therefore most likely directly connected to the corresponding indoor activities.
When the time series of contributions for personal exposure are compared with those for UB and residential outdoor, two groups of factors stand out. First, the personal LRT factor was moderately to strongly correlated with the UB and residential outdoor LRT + Ships and traffic factors (r_p = 0.66 and 0.83 with the UB time series, and r_p = 0.44 and 0.55 with the residential outdoor time series). Second, there was a moderate correlation between the marine factors (r_p = 0.35-0.44). For all other factors, no or very weak correlations were found. The time series of factor contributions for the different microenvironments are shown in Fig. A1 in the supplementary material.
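The time-series comparisons above rely on correlation coefficients between daily factor contributions. Assuming r_p denotes the Pearson correlation coefficient, it can be computed as in the sketch below, together with the t statistic t = r sqrt((n - 2) / (1 - r^2)) that is commonly compared with a t distribution with n - 2 degrees of freedom to judge significance. The daily contribution values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long time series."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two series of equal length (n >= 3)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); undefined for |r| = 1."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical daily LRT contributions (ug/m3) at UB and a residential site.
ub = [4.1, 6.3, 2.2, 8.0, 5.5, 3.9, 7.1]
home = [3.5, 5.9, 2.8, 6.6, 4.2, 4.0, 6.0]
r = pearson_r(ub, home)
print(f"r_p = {r:.2f}, t = {t_statistic(r, len(ub)):.2f}")
```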

Combined Analysis, PMF with Factor Selection
Analysis of the full dataset in one combined model using the factor selection script for ME-2 resulted in a model with four outdoor factors (LRT(FS), local combustion(FS), traffic(FS), and sea salt + resuspension(FS)), two additional indoor factors (indoor resuspension(FS) and indoor heated sources(FS)), and a personal activities(FS) factor. Factors from the combined model are marked with the suffix (FS) to distinguish them from the factors obtained in the separate PMF models. The source profiles are presented in Fig. 3.
The proportion of each factor in the different microenvironments is presented in Fig. 4. LRT(FS) was the main source: it dominated the ambient environments, UB (69%) and residential outdoor (55%), and was also the largest single source indoors (54%) and for personal exposure (40%). The local combustion(FS) factor was stronger at the residential outdoor sites than at UB, and contributed to a lesser extent indoors and to personal exposure. The contribution from the traffic(FS) factor was small in all environments. The contribution of the sea salt + resuspension(FS) factor to UB was small, probably owing to the elevated position of the sampling site; like the traffic(FS) factor, it contributed more to residential outdoor air and personal exposure than to UB. Of the two indoor factors, indoor heated sources(FS) was stronger than indoor resuspension(FS), and each of these factors contributed nearly equal mass indoors and to personal exposure. The factor contributions for indoor air and personal exposure were similar (excluding the personal activities(FS) factor). In personal exposure, the personal activities(FS) factor accounted for 2.2 µg/m³ (21% of the total personal exposure). The estimated mean PM2.5 mass for personal exposure was 10.6 µg/m³, compared with 7.6, 8.0, and 8.6 µg/m³ for UB, residential outdoor, and indoor, respectively. Apart from personal activities(FS), all factors contributed fairly similarly to personal exposure as to indoor and residential outdoor air. Consequently, the personal activities(FS) factor adds an extra mass contribution, resulting in higher personal exposure than in the other microenvironments. Similar results have been shown in previous studies (Wallace, 2000; Williams et al., 2003; Ito et al., 2004; Wallace et al., 2006).
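The percentages above follow from dividing each factor's mean contribution by the total for that microenvironment. As a check using the reported personal exposure figures, 2.2 µg/m³ out of the modeled mean of 10.6 µg/m³ gives about 21%, whereas dividing by the measured mean of 11.0 µg/m³ would give 20%; this suggests, although the text does not state it, that the shares refer to the modeled total. The function name is hypothetical.

```python
def factor_shares(contributions):
    """Percentage share of each factor in the modeled total mass."""
    total = sum(contributions.values())
    return {name: 100.0 * value / total for name, value in contributions.items()}


# Personal exposure, using the reported means (ug/m3); all factors other than
# personal activities(FS) are lumped together as the remainder of 10.6.
personal = {"personal activities(FS)": 2.2, "all other factors": 10.6 - 2.2}
shares = factor_shares(personal)
print({name: round(pct) for name, pct in shares.items()})
```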
The correlations between microenvironments for the different factors in the FS model were generally high (Table 2). For the traffic(FS) factor, however, significant correlations were found only between residential outdoor and indoor and between residential outdoor and personal exposure, and no correlations were found between UB and any of the other microenvironments. This is likely because traffic is a local source that varies across the city. The two indoor factors, indoor resuspension(FS) and indoor heated sources(FS), were strongly correlated with the corresponding personal exposure contributions (r_p = 0.94 and 0.65, respectively).

The Factor Selection Model versus Separate PMF Models
The four outdoor factors resolved in the FS model were strongly correlated with the corresponding factors in the separate four-factor models for both UB and residential outdoor (r = 0.88-0.98).This suggests that the chosen FS model does not introduce additional ambiguity, at least not regarding the ambient sources.
The separate indoor and personal exposure models, however, deviated more from the FS model for several factors. Since the separate indoor model does not take into account that part of the sampled concentrations originates from outdoor sources, sources with partly the same composition, for example the factors connected to combustion, LRT, and traffic, showed different correlation patterns. The local combustion(FS) factor was correlated with the indoor model factors LRT and traffic (r_p = 0.69 and 0.62, respectively), while the LRT(FS) factor was not correlated with any factor in the indoor model. The factors indoor heated sources(FS) and indoor resuspension(FS) were correlated with the indoor Cu and indoor resuspension factors in the separate models (r_p = 0.90 and 0.86, respectively). Thus, the separate indoor and personal exposure models may not correctly identify ambient sources. For example, the day-to-day variation of the ambient sources and the later resuspension of these particles due to indoor activities will not be separated into different factors in the separate indoor and personal exposure models.

Comparison with Other Studies
There are only a limited number of studies that have measured personal, indoor and residential outdoor PM simultaneously and then performed source apportionment using the PMF technique on the datasets (Yakovleva et al., 1999;Larson et al., 2004;Zhao et al., 2006).
In the Riverside, CA, study involving 178 participants, Yakovleva et al. (1999) measured PM10 outdoors (O), indoors (I), and in personal exposure (P), as well as PM2.5 outdoors (O2.5) and indoors (I2.5). The following seven factors were found (with the environments in which they contributed in parentheses): ambient soil (O), resuspended indoor soil (P), indoor soil (I, P), personal activities (P, I), sea salt (O), nonferrous operations and motor vehicle exhaust (O, O2.5, I, I2.5, and P), and secondary SO4 (O, O2.5, I, I2.5, and P).
In a study involving 20 participants (83 personal samples) in Seattle, WA, Larson et al. (2004) found the following outdoor PM2.5 sources: vegetative burning, secondary sulfate, mobile emissions, fuel oil, crustal, and Cl-rich. The indoor PM2.5 sources were vegetative burning, secondary sulfate, ...
In the study by Zhao et al. (2006) on PM2.5 conducted in Raleigh and Chapel Hill, NC, involving 38 participants, four ambient/residential outdoor factors (secondary SO4, motor vehicle, soil, and secondary nitrate) were identified. For indoor/personal samples, four additional factors (cooking, personal care and activity, environmental tobacco smoke (ETS) and its mixture, and a Cu factor mixed with indoor soil) were identified.
In the present study and the studies mentioned above, there were several factors of similar type/origin in the different environments. Typical ambient/outdoor sources were regionally emitted sources, traffic-related sources, crustal material, and marine influence (where relevant). Several of these sources also contributed in various degrees to indoor levels and personal exposure. Common indoor and personal sources were indoor resuspension of soil, indoor activities such as cooking, and other personal activities. The absolute (or relative) contributions of these and other sources may differ between studies due to local and regional conditions, for example, city size, vehicle fleet composition, building types and ventilation, climate, season, and industries near the sampling stations. They also depend on which measured substances were used in the models. It is therefore hard to make quantitative comparisons between studies from different locations, but qualitative comparisons can be helpful, especially during factor identification.

Potentials and Limitations of the Factor Selection Model
An advantage of the FS script is that by combining several microenvironments, all observations are put into one large model.This can be important in studies involving time-consuming sampling strategies (e.g., parallel outdoor, indoor, and personal exposure sampling), where the number of sampling days is a limiting factor.A similar approach has recently been reported in combined two-site PMF analysis (Beuck et al., 2011;Molnar and Sallsten, 2013).
For many of the chemical species, strong correlations were found between UB and residential outdoor and between indoor and personal exposure (Molnár et al., 2006;Johannesson et al., 2011).However, the correlations between ambient (UB and residential outdoor) and indoor, and between ambient and personal exposure were much weaker.Nevertheless, the FS script managed to determine source factors that were realistic, and the correlations between the different factors were generally strong when comparing the different microenvironments.
However, there is a principal limitation to the application of PMF to indoor and personal data. Indoor activities vary between homes, so the actual composition profile of indoor-generated aerosol varies strongly from residence to residence, and even from day to day within one residence. This variation violates the basic assumption of PMF that source profiles are constant over time, and it cannot be correctly modeled by two or three factors. For this reason, the modeling results for indoor and personal data must be regarded with caution. It should be noted, however, that earlier PMF-based analyses of indoor and personal data have not been free of this problem either.
In an analysis of the personal exposure variability in this dataset, the within-person variance component dominated the total variability for nearly all investigated chemical species (Johannesson et al., 2011).

CONCLUSIONS
The new PMF with factor selection model was found to be a useful tool for the PMF analysis of multiple microenvironments, since ambient contributions to indoor levels and personal exposure are less likely to be distorted or misinterpreted. By combining the datasets for the different microenvironments and using the PMF with factor selection method, more accurate estimates of the source contributions to personal exposure can be obtained. In particular, resuspension of settled ambient particles due to indoor activities could only be identified with the factor selection model, not with the separate microenvironment models. For personal exposure to PM2.5, personal activities accounted for 21% (2.2 µg/m³) and constituted the difference in total mass concentration between personal exposure and the other microenvironments.
Supplementary material for the manuscript: Source apportionment of PM2.5 using Positive Matrix Factorization (PMF) and PMF with factor selection

Table A1. Contributions (µg/m³) for the four-factor model for ambient air (urban background and residential outdoor).

Fig. 2. The resulting factor contributions to PM2.5 for indoor (a) and personal exposure (b) sampling.

Fig. 3. The seven-factor model obtained using factor selection in ME-2 for the whole dataset, with four factors outdoors, two additional factors indoors, and one additional factor for personal exposure. Concentrations (µg/m³) are shown on the left y-axis and the percentage of each species apportioned to each factor (diamonds) on the right y-axis. Note that the left-axis scales differ for the LRT and sea salt + resuspension factors.

Fig. 4. The contribution from the different factors in the FS model to each microenvironment.

Fig. A1. Time series of the four-factor ambient models for urban background (top graph) and residential outdoor, the six-factor model for indoor, and the seven-factor model for personal exposure. Units are µg/m³.

Table 2. Correlations between the different microenvironments for corresponding factors in the factor selection model. Correlations marked in bold are statistically significant (p < 0.05).

Table A2. Contributions (µg/m³) for the six-factor model for the indoor samples.

Table A3. Contributions (µg/m³) for the seven-factor model for personal exposure.