Indoor Black Carbon Concentrations and Their Sources in Residential Environments: Validation of an Input-Adaptive Proxy Model

Exposure to black carbon (BC) in the residential environment has been found to be positively associated with elevated blood pressure and cardiovascular disease. However, BC has been measured and studied far less than other common indoor gaseous and particulate pollutants. In this work, we evaluated representative indoor mass concentrations of equivalent black carbon (eBC) and the contributions of indoor and outdoor sources in the real-life residential environments of 40 German households. Over the 500 measurement days, the mean indoor eBC mass concentration was 0.6 µg m⁻³, which is lower than the outdoor concentration in the urban background in Germany. However, common indoor sources contributed to indoor eBC to different extents, which also resulted in higher mass concentrations in the cold season than in the warm season. Indoor pollutant measurements are often performed with only a limited number of instruments and pollutant data. To fill this knowledge gap on indoor BC, a proxy model was developed. This proxy model can predict indoor eBC concentrations from existing indoor databases, or in cases where direct measurements of indoor eBC are not available. Because of the complex influence of climate and indoor activities, the model distinguishes six scenarios, combining two seasons (warm and cold) with three activity classes (burning, non-burning, and other activities), for typical urban residential environments in Germany. Indoor eBC mass concentrations were found to be best estimated by indoor and outdoor PM1. Across the scenarios, the model achieved a satisfactory to good coefficient of determination (0.49 < R² < 0.77). With the aid of this model, indoor eBC mass concentrations, and the resulting exposure and health risk, can be assessed more accurately for households with similar climatic conditions and occupant activity habits, e.g., in Central Europe.


INTRODUCTION
Aerosol particles, or particulate matter, have raised public health concerns because of their association with respiratory and cardiovascular diseases (Pope and Dockery, 2006; Brook et al., 2010). Exposure to black carbon (BC) has been found to trigger inflammatory reactions in the airways, which are a major cause of asthma and lung dysfunction (Jansen et al., 2005; Bell et al., 2009; WHO, 2012).
In urban areas, the typical combustion sources of ambient BC are traffic and domestic wood burning (Helin et al., 2018; Rönkkö and Timonen, 2019). During the combustion of carbon-based fuels, BC is formed in the flame and then released into the atmosphere as carbon agglomerates. BC from traffic has been found at particle sizes of ~100-150 nm, whereas BC from biomass combustion has been detected at ~300 nm (Saarikoski et al., 2021). In addition, BC can be transported over long distances, which can also contribute to urban BC (Järvi et al., 2008). Owing to its limited atmospheric lifetime and unevenly distributed sources, atmospheric BC shows large spatial and temporal variations (Bond et al., 2013). In the atmosphere, BC particles change during aging through particle growth and surface reactions (Timonen et al., 2019). Although BC has not yet been adopted as a regulated air quality parameter by WHO (2021), many national environmental agencies operate their own ambient BC monitoring networks (Kutzner et al., 2018; Guo et al., 2020; Ciupek et al., 2021; Luoma et al., 2021).
In recent years, more studies have begun to focus on indoor air exposure and have reported a positive association between higher residential exposure to BC and increases in systolic blood pressure and cardiovascular disease mortality (Hvidtfeldt et al., 2019; Rabito et al., 2020). In modern society, people spend about 90% of their time in various indoor microenvironments. Many studies have examined the indoor concentrations and sources of gaseous and particulate pollutants in residential environments, but data on indoor BC are surprisingly scarce. Outdoor pollutants can be transported indoors, and at the same time, indoor combustion processes can lead to high indoor BC levels. Smoking and traditional stoves that burn solid biomass for cooking or heating are well known to cause anomalously high indoor BC concentrations. Although these sources are generally absent from homes in developed countries, there are other indoor sources of BC, such as burning candles, cooking with electric stoves, baking, and using fireplaces. LaRosa et al. (2002) measured equivalent black carbon (eBC) in a house in the USA and reported a peak concentration of ~12.8 µg m⁻³ during one candle-burning event; the authors also concluded that the evening increases in BC mass concentration in the cold seasons were mainly due to wood burning. In the study by Isiugo et al. (2019), BC data for 23 homes were analysed, and the average indoor BC mass concentration was higher (by ~0.4 µg m⁻³) in homes where candles were used than in homes where they were not. However, the data were collected with filters over ~48 hours, so detailed source contributions could not be captured. The authors also pointed out that the sample size (45 observations) was too small to detect seasonal differences. Sankhyan et al. (2021) conducted measurements in a test house on four days following a "day in the life" protocol and reported mean concentrations during cooking of up to 0.8 µg m⁻³.
In the real world, residents' activity habits change over the year; e.g., people tend to perform more activities involving combustion, such as lighting candles, during the cold season (Bekö et al., 2013). In addition, indoor airborne pollutant concentrations are affected by processes such as ventilation, infiltration through the building envelope, and deposition (Nazaroff and Cass, 1986; Hussein and Kulmala, 2008). All these processes and parameters influence the temporal variation and spatial differences of BC in each indoor environment. Since measuring indoor BC in every home will remain far from feasible in the near future, it would be beneficial to be able to simulate indoor BC from available indoor air pollutant data.
According to Wei et al. (2019), the most popular statistical models for estimating indoor air quality were artificial neural networks, multiple linear regression, partial least squares, and decision trees. Indoor particulate matter concentrations (PM2.5 and PM10) were the most frequently studied parameters, followed by carbon dioxide (CO2) and radon (Rn). The infiltration factor has also been suggested as a good input parameter, which can be estimated using the indoor-to-outdoor sulfur ratio as a proxy (Tang et al., 2018). In particular, indoor BC has been shown to be best estimated by outdoor BC and home characteristics (WeMott et al., 2019). Elevated outdoor BC and indoor burning (in particular, burning candles) can lead to increased indoor BC levels (Isiugo et al., 2019).
However, research on statistical models for estimating indoor BC is scarce compared with that for outdoor BC. Various outdoor BC models have been developed and evaluated using white-box and black-box statistical approaches (Zaidan et al., 2019, 2020; Fung et al., 2021; Rovira et al., 2022).
White-box models offer high transparency and accountability in their model architectures, whereas black-box models tend to give more accurate results. Among the models evaluated, the white-box input-adaptive proxy (IAP) and least absolute shrinkage and selection operator (LASSO) models have been recommended for their flexibility and efficiency (Fung et al., 2021). These two models have also demonstrated high transferability and replicability when upscaling BC concentrations from one environment to another (Fung et al., 2024). Since indoor BC has been shown to correlate highly with outdoor BC concentrations (Isiugo et al., 2019), these outdoor BC models would be good candidates for estimating indoor BC.
The objectives of this work are to fill the knowledge gap on the temporal variation and source contributions of indoor BC mass concentrations in multiple real-use homes, and to develop a statistical model that predicts indoor BC mass concentrations from commonly measured indoor air pollutants. Given that large-scale residential exposure studies still rely on outdoor pollutant concentrations for health risk assessment owing to a lack of data (Hvidtfeldt et al., 2019), the results presented in this paper will enable a more accurate health risk assessment of indoor BC exposure in the residential environment.

Airborne Pollutant Measurement Data
A large-scale residential indoor and outdoor air pollution project was conducted in two German cities (Leipzig and Berlin). The measurement data were collected in 40 non-smoking households over more than 500 days (December 2016 to March 2019). Each household was measured twice, for about a week each time, at different times of the year. The households were located in regions covering both urban and rural areas. Detailed information on the house type, location, and measurement duration in the different seasons for each household can be found in our previous work (Zhao et al., 2019).
Within the project, the indoor eBC mass concentration was measured in each household by a micro-aethalometer (AE51) at 5-minute intervals. Particle mass and number concentrations were also collected indoors and outdoors in each home at the same temporal resolution. The particle mass concentrations (PM10, PM2.5, and PM1) were measured by optical particle size spectrometers (OPSS Grimm, Model 1.108). The particle number concentration and size distribution were measured by TROPOS-designed mobility particle size spectrometers (TROPOS-MPSS; particle size range 10-800 nm, 69 size bins). The measurements were performed under high-quality requirements for indoor-outdoor aerosol measurements (Zhao et al., 2018, 2019). Both TROPOS-MPSS units used identical hardware (see Wiedensohler et al., 2012) and were frequently calibrated against reference instruments under laboratory conditions at the World Calibration Center for Aerosol Physics (WCCAP) (Wiedensohler et al., 2018). The indoor CO2 concentration was measured with an infrared CO2 sensor (Vaisala GMP252). Residents' activities were also logged during the measurements. The indoor measurement system was located in the living/dining room, where people spent most of their time. Therefore, the recorded contributions from indoor sources such as cooking, cleaning, burning candles, and fireplaces should represent the levels to which people are exposed. Note that in homes where the kitchen and living room are separated by doors, exposure to cooking-generated particles may be underestimated for people working in the kitchen. The measured indoor and outdoor particle number and mass concentrations and CO2 concentrations, as well as meteorological data such as temperature and relative humidity, can be found in Zhao et al. (2019).
Based on ambient temperatures measured by weather stations, the seasons were defined as cold, warm, and transitional, with 187 days falling in the cold season and 138 days in the warm season. The 40 households were mainly naturally ventilated. In our previous work, the ventilation rate was calculated from measured CO2 data using the decay method (Mahyuddin and Awbi, 2012; Turanjanin et al., 2014; Persily, 2016). The mean ventilation rate over the entire measurement period was 0.6 ± 0.7 h⁻¹. Because windows were kept open for longer, the ventilation rate was higher in the warm season (mean 0.8 h⁻¹) than in the cold season (mean 0.5 h⁻¹). The definitions of seasons and the classification of ventilation rates are the same as in our previous works (Zhao et al., 2019, 2020), where they are described in detail.
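The CO2 decay method can be illustrated with a short sketch. Assuming a well-mixed room, no indoor CO2 source during the decay period, and a constant outdoor concentration, the excess CO2 decays exponentially, C(t) - C_out = (C0 - C_out)·exp(-λt), so the air change rate λ is the negative slope of ln(C - C_out) against time. The function name and the 420 ppm outdoor default are hypothetical placeholders, not values from the measurement campaign.

```python
import math


def ach_from_co2_decay(t_hours, co2_ppm, co2_out_ppm=420.0):
    """Air change rate (h^-1) from a CO2 decay period by log-linear regression.

    Assumes a well-mixed room, no indoor CO2 source during the decay, and a
    constant outdoor CO2 level (420 ppm is a placeholder, not a measured value).
    """
    xs, ys = [], []
    for t, c in zip(t_hours, co2_ppm):
        excess = c - co2_out_ppm
        if excess > 0:  # the logarithm is undefined once CO2 reaches background
            xs.append(t)
            ys.append(math.log(excess))
    if len(xs) < 2:
        raise ValueError("need at least two points above the outdoor level")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    # ln(C - C_out) falls linearly with slope -lambda.
    return -slope
```

For synthetic data decaying at 0.6 h⁻¹, the function recovers 0.6; with real data, the fit should be restricted to a decay period in which the room is unoccupied.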

Data Pre-processing
From the particle number size distribution data, we classified the particle number concentration into four groups based on their size ranges: particle number concentration of the entire size range (PNall), nucleation mode (10-25 nm, PNnuc), Aitken mode (25-90 nm, PNait) and accumulation mode (90-800 nm, PNacc).
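The size-mode grouping can be sketched as follows. The sketch assumes per-bin number concentrations (if the MPSS output is dN/dlogDp, each value is first multiplied by the bin width in log10 Dp) and assigns each bin to a mode by its geometric mid-point diameter; the function names and this bin-assignment rule are assumptions, not the project's processing code.

```python
import math

# Mode boundaries in nm, as defined in the text (lower bound inclusive).
MODES = {"PNnuc": (10.0, 25.0), "PNait": (25.0, 90.0), "PNacc": (90.0, 800.0)}


def dndlogdp_to_dn(dndlogdp, bin_edges_nm):
    """Convert dN/dlogDp values to per-bin number concentrations."""
    return [v * math.log10(hi / lo)
            for v, lo, hi in zip(dndlogdp, bin_edges_nm[:-1], bin_edges_nm[1:])]


def mode_concentrations(dn, bin_edges_nm):
    """Sum per-bin number concentrations into PNall and the three modes."""
    totals = {"PNall": 0.0, **{name: 0.0 for name in MODES}}
    for n, lo, hi in zip(dn, bin_edges_nm[:-1], bin_edges_nm[1:]):
        d_mid = math.sqrt(lo * hi)          # geometric mid-point diameter
        if not 10.0 <= d_mid <= 800.0:      # outside the 10-800 nm range
            continue
        totals["PNall"] += n
        for name, (d_lo, d_hi) in MODES.items():
            # Upper bounds are exclusive, except that 800 nm closes the last mode.
            if d_lo <= d_mid < d_hi or (name == "PNacc" and d_mid == d_hi):
                totals[name] += n
                break
    return totals
```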
We synchronised all indoor and outdoor data on an hourly basis and gap-filled missing values using simple linear interpolation. The distribution of each variable was plotted and tested for normality. The data were then log-transformed and normalised to a mean of 0 and a standard deviation of 1, because statistical models are sensitive to variable scales. Because the measurements are autocorrelated in time, we used the first 80% of the time series as the training set, with five-fold cross-validation, and the last 20% as the testing set. Similar procedures were carried out in Fung et al. (2021).
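A minimal sketch of these pre-processing steps with pandas and NumPy is given below. It assumes a DataFrame of strictly positive concentrations indexed by timestamps (e.g., the 5-minute records). The base-10 logarithm, the use of training-set statistics for normalisation, interpolating only interior gaps, and contiguous (unshuffled) folds are assumptions that the text does not specify; `preprocess` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def preprocess(raw: pd.DataFrame, train_frac: float = 0.8, n_folds: int = 5):
    """Hourly sync, gap-fill, log-transform, standardise, and split chronologically."""
    # 1. Synchronise all variables on a common hourly time base.
    hourly = raw.resample("60min").mean()
    # 2. Linearly interpolate gaps enclosed by valid data; drop unfillable edges.
    hourly = hourly.interpolate(method="linear", limit_area="inside").dropna()
    # 3. Log-transform to reduce the skewness typical of concentration data.
    logged = np.log10(hourly)
    # 4. Chronological split: first 80% for training, last 20% for testing.
    n_train = int(len(logged) * train_frac)
    train, test = logged.iloc[:n_train], logged.iloc[n_train:]
    # 5. Standardise to zero mean and unit standard deviation using training
    #    statistics only, so no information leaks from the test period.
    mu, sigma = train.mean(), train.std()
    train_z, test_z = (train - mu) / sigma, (test - mu) / sigma
    # 6. Contiguous blocks of the training set for five-fold cross-validation.
    folds = np.array_split(np.arange(n_train), n_folds)
    return train_z, test_z, folds
```

Keeping the test set strictly after the training period, rather than sampling it randomly, avoids scoring the model on hours that are nearly copies of neighbouring training hours.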
In addition, six scenarios were defined based on season (cold and warm) and type of residents' activity (burning, non-burning, and others). Based on the residents' activity diaries, about 900 activity events were divided into 10 categories (see Table 1). The most common indoor activities were opening windows, baking, frying, toasting, other cooking (e.g., boiling, heating food, stewing), burning candles, using fireplaces, and vacuum cleaning. Cases in which several activities were performed at the same time were categorised as "mixed" activities. Rarely recorded activities such as children playing, ironing, and washing clothes were grouped under "others". These activities were then assigned to three activity types: "burning" (combustion sources) comprises burning candles, frying, baking, toasting, other cooking, and using the fireplace, whereas "non-burning" (non-combustion sources) comprises ventilation, vacuum cleaning, mixed, unknown, and other activities. The third type, "others", denotes periods between 00:00 and 06:00 in which no activities were recorded. We chose this 6-hour window to ensure that neither "burning" nor "non-burning" activities influenced the concentrations.
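The scenario assignment can be expressed as a simple rule. This sketch classifies one hourly record from its season and logged activity category. The category strings are hypothetical labels, and returning `None` for transitional periods and for daytime hours without a logged activity is an assumption for illustration.

```python
from datetime import datetime
from typing import Optional

BURNING = {"burning candles", "frying", "baking", "toasting", "other cooking", "fireplace"}
# "others" here is the rare-activity diary category, not the "others" scenario.
NON_BURNING = {"ventilation", "vacuum cleaning", "mixed", "unknown", "others"}


def classify_scenario(timestamp: datetime, season: str,
                      activity: Optional[str]) -> Optional[str]:
    """Return a scenario label such as 'cold burning', or None if none applies."""
    if season not in ("cold", "warm"):      # transitional periods are not modelled
        return None
    if activity in BURNING:
        kind = "burning"
    elif activity in NON_BURNING:
        kind = "non-burning"
    elif activity is None and 0 <= timestamp.hour < 6:
        kind = "others"                     # no logged activity, 00:00-06:00
    else:
        return None                         # e.g., unlogged daytime hours
    return f"{season} {kind}"
```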

Model Description
To estimate BC, we used a statistical model that learns from the datasets described above. According to the evaluation of black carbon models by Fung et al. (2021), the input-adaptive proxy (IAP) and the least absolute shrinkage and selection operator (LASSO) performed best among the white-box and black-box statistical models evaluated in terms of input flexibility, output accuracy, and structural transparency. We initially estimated indoor BC using both IAP and LASSO. We report only the IAP results in this study because LASSO has been reported to have issues with reliably estimating regression coefficients and interpreting the contributions of individual variables (Ranstam and Cook, 2018).

Table 1, note a: The activity duration is calculated from the beginning of each source event until the time when the indoor particle number concentration reaches its maximum. Note that when fireplaces were used, the particle number concentration often increased only during the lighting phase, and no significant increase could be detected after the fireplace door was closed.
IAP first examined Pearson's correlation coefficient between the output variable and each input feature. It pre-selected the most highly correlated input features and built sub-models with at most three input features using ordinary least-squares (OLS) linear regression. The model applied additional regularisation through the bisquare weight function, which depends on the residuals, the leverages from the OLS fits, and an estimate of the standard deviation of the error term. This makes the model more robust when the data are contaminated with outliers, which often occurs in field campaigns. Regression was performed with five-fold cross-validation, which leaves enough training points to avoid overfitting in the "warm burning" and "warm non-burning" scenarios, each of which contained only a few hundred data points. Each sub-model was then evaluated and ranked by performance in descending order. Sub-models with high variance inflation factors or heteroskedastic residuals were discarded to avoid feature multicollinearity and model autoregression. Finally, IAP identified the best valid combination (sub-model) of up to three input features and, in turn, the most important features for estimating indoor BC. The model structure has been described in detail in previous studies (Fung et al., 2020, 2021).
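The selection logic described above can be illustrated with a simplified NumPy sketch; it is not the published IAP implementation. Features are pre-ranked by |Pearson r|, every combination of up to three is fitted with a bisquare-weighted robust regression (iteratively reweighted least squares), sub-models whose largest variance inflation factor exceeds a threshold are discarded, and the remainder are ranked by mean five-fold cross-validated R². The VIF threshold of 5, the number of pre-selected features, the omission of the leverage adjustment and of the heteroskedasticity check, and all function names are assumptions.

```python
import itertools
import numpy as np

TUNE = 4.685  # conventional tuning constant for Tukey's bisquare weights


def r2_score(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


def bisquare_fit(X, y, n_iter=50):
    """Robust linear fit by IRLS with bisquare weights (no leverage adjustment)."""
    A = np.column_stack([np.ones(len(y)), X])      # intercept + features
    w = np.ones(len(y))
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        resid = y - A @ beta
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745  # MAD sigma
        if scale == 0:
            break
        u = resid / (TUNE * scale)
        w = np.where(np.abs(u) < 1, (1 - u ** 2) ** 2, 0.0)  # outliers get 0
    return beta


def cv_r2(X, y, k=5):
    """Mean R² over k contiguous cross-validation folds."""
    scores = []
    for fold in np.array_split(np.arange(len(y)), k):
        train = np.setdiff1d(np.arange(len(y)), fold)
        beta = bisquare_fit(X[train], y[train])
        scores.append(r2_score(y[fold], beta[0] + X[fold] @ beta[1:]))
    return float(np.mean(scores))


def max_vif(X):
    """Largest variance inflation factor among the columns of X."""
    if X.shape[1] < 2:
        return 1.0
    vifs = []
    for j in range(X.shape[1]):
        A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        r2_j = r2_score(X[:, j], A @ coef)
        vifs.append(np.inf if r2_j >= 1 else 1.0 / (1.0 - r2_j))
    return max(vifs)


def iap_select(X, y, names, n_pre=6, max_inputs=3, vif_max=5.0):
    """Rank valid sub-models of up to `max_inputs` features by CV R²."""
    r = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    pre = np.argsort(r)[::-1][:n_pre]              # most correlated features first
    results = []
    for k in range(1, max_inputs + 1):
        for combo in itertools.combinations(pre, k):
            Xc = X[:, list(combo)]
            if max_vif(Xc) > vif_max:              # discard multicollinear sub-models
                continue
            results.append((cv_r2(Xc, y), [names[i] for i in combo]))
    results.sort(key=lambda t: t[0], reverse=True)
    return results
```

In the synthetic test below, a near-duplicate of one predictor is never combined with its twin, mirroring the PM1/PM2.5 situation discussed in the results.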
In the second stage, the coefficients obtained from this procedure were further optimised by manually rounding the orders (coefficients) of the variables to the nearest integer or half-integer, so that the models could be transferable and replicable in other indoor environments. The model was then rerun to obtain the adjusted coefficients, and the model performance before and after this treatment was compared. The indoor BC model in all six scenarios takes the form

$$\log \widehat{\mathrm{eBC}} = a + b \log x_1 + c \log x_2 + d \log x_3 + \varepsilon$$

where $\widehat{\mathrm{eBC}}$ is the estimated indoor black carbon concentration; $x_1$, $x_2$, and $x_3$ denote the three input variables; $a$, $b$, $c$, and $d$ are the model coefficients; and $\varepsilon$ is the model error.
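The rounding step can be sketched as follows, assuming (as the log-transformation in the pre-processing suggests) that each sub-model is linear in the log-transformed variables, and assuming that only the intercept is re-estimated once the slopes are rounded. The paper does not state which coefficients are refitted or which logarithm base is used; base 10 and all function names here are hypothetical.

```python
import numpy as np


def round_half(x):
    """Round to the nearest integer or half-integer (ties follow NumPy's
    round-half-to-even applied to 2x)."""
    return np.round(np.asarray(x, dtype=float) * 2.0) / 2.0


def refit_with_rounded_slopes(log_x, log_y, slopes):
    """Round the slopes, then re-estimate the intercept by least squares.

    log_x: (n, k) log10-transformed inputs; log_y: (n,) log10-transformed eBC.
    With the slopes held fixed, the least-squares intercept is the mean residual.
    """
    b = round_half(slopes)
    a = float(np.mean(log_y - log_x @ b))
    return a, b


def predict_ebc(x, a, b):
    """Back-transform to concentration units: eBC = 10**a * prod(x_i ** b_i)."""
    return 10.0 ** a * np.prod(np.asarray(x, dtype=float) ** b, axis=-1)
```

In the power-law form used by `predict_ebc`, a half-integer exponent such as 0.5 corresponds to a square-root dependence, which is easier to interpret than an exponent like 0.4873.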

eBC Concentration Levels in German Households
During the 500 measurement days, the mean indoor eBC mass concentration was 0.6 ± 0.9 µg m⁻³. eBC accounted for only a small part of the indoor particle mass concentration: the mean indoor PM1 level was 10.4 µg m⁻³, of which eBC made up about 6%. Indoor eBC mass concentrations in real European households have rarely been reported. Isaxon et al. (2015) reported one week of indoor eBC measurements in Sweden, with concentrations between 0.03 and 0.80 µg m⁻³. Aquilina and Camilleri (2022) carried out three months of measurements in a house in an urban area on the island of Malta and determined a mean indoor eBC concentration of 1.00 µg m⁻³. Baxter et al. (2007) and Isiugo et al. (2019) reported indoor levels comparable to ours in US homes, with mean black carbon concentrations of 0.5 and 0.3 µg m⁻³, respectively. Sun et al. (2019) reported long-term observations from the German Ultrafine Aerosol Network (GUAN), according to which the mean eBC concentration in the urban background in Germany is about 0.9 µg m⁻³. This outdoor eBC concentration is considerably higher than the mean indoor concentration in the homes we studied. However, indoor eBC mass concentrations varied widely, with the 1st and 99th percentiles at 0.05 and 2.93 µg m⁻³, respectively. Furthermore, the measured indoor eBC mass concentration varied considerably between individual households and at different times of the year. Fig. 1 shows boxplots of the eBC mass concentration in individual households during the cold and warm seasons. The eBC mass concentration was significantly higher in the cold season (mean ± sd = 0.8 ± 1.2 µg m⁻³) than in the warm season (mean ± sd = 0.4 ± 0.3 µg m⁻³). Given the lower ventilation rates in the cold season, the contribution of outdoor air is likely to be less significant then, indicating strong contributions of indoor combustion sources to black carbon. Of all households, 16 have data for both the cold and warm seasons. Although the difference between the median eBC concentrations in the cold and warm seasons is not significant for some households, most households show a higher 75th percentile in the cold season. One exception is house B7, where the eBC concentration is significantly higher in the warm season than in the cold season. This could be due to occasional contamination from outdoor sources, as the outdoor particle concentration also increased significantly at the same time.

Indoor Sources Contribution to BC Concentration
As shown in Table 1, the eBC mass concentration at the beginning of most activities was about 0.5 µg m⁻³, which can be considered the background concentration. Fifteen minutes after the start of the activities, the concentration had roughly doubled. On average over all activities, the eBC mass concentration increased by 2.0 µg m⁻³ at peak time. Of all activities, burning candles and using fireplaces showed the strongest increases, of 0.9 and 2.0 µg m⁻³, respectively, with peak concentrations reaching up to 2 and 11 µg m⁻³, respectively. Baking and frying also produced strong increases in concentration at peak times. The strength of cooking emissions depends strongly on factors such as the food ingredients, the type of oil, and even the cooking temperature (Abdullahi et al., 2013). With longer cooking times, food can be expected to "burn" more easily, releasing more soot and other harmful compounds such as heterocyclic aromatic amines (Gibis et al., 2015; Peng et al., 2017).
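Event statistics of the kind summarised in Table 1 (the concentration at the start, 15 minutes after the start, and at the peak) could be extracted from the 5-minute eBC series with a helper like the one below. The 2-hour search window for the peak and the function name are assumptions; note that Table 1 defines event duration via the particle number maximum rather than the eBC maximum.

```python
def event_metrics(ebc, start_idx, step_min=5, window_min=120):
    """Summarise one activity event from an eBC time series (µg m-3).

    ebc: list of concentrations sampled every `step_min` minutes.
    start_idx: index of the logged start of the activity.
    """
    end = min(len(ebc), start_idx + window_min // step_min + 1)
    window = ebc[start_idx:end]
    c_start = window[0]                     # treated as the background level
    i15 = 15 // step_min                    # number of samples in 15 minutes
    c_15 = window[i15] if i15 < len(window) else None
    peak = max(window)
    return {
        "start": c_start,
        "after_15_min": c_15,
        "peak": peak,
        "peak_increase": peak - c_start,
        "minutes_to_peak": window.index(peak) * step_min,
    }
```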

Correlation of explanatory features with indoor eBC
Using the measured indoor eBC data and the pre-processed particle number and mass concentrations, we developed the indoor BC proxy following the procedure described in Section 2.3. The first step was to examine the Pearson correlation coefficients between the output variable and the input features. Fig. 2 illustrates the Pearson correlation (r) of indoor black carbon concentration with the different variables during the measurement period. The linear correlations of indoor black carbon with room volume (Vol), indoor temperature (T), indoor relative humidity (RH), and indoor CO2 were relatively low (|r| < 0.3) in all six scenarios. Because the ventilation rate was estimated from the measured indoor CO2, it was not included as a variable in the proxy model.
The correlation of eBC with the total indoor particle number concentration was moderate (0.32 < r < 0.54), stronger with the accumulation-mode PN (0.66 < r < 0.83), and weaker with smaller particles (-0.03 < r < 0.44). For mass concentrations, eBC correlated most strongly with indoor PM1 (0.69 < r < 0.87), followed by PM2.5 (0.66 < r < 0.86) and PM10 (0.47 < r < 0.67), in all six scenarios. In general, eBC correlated more strongly with indoor PM1 than with outdoor PM1 (0.28 < r < 0.73). In the two burning scenarios, eBC correlated better with CO2 and PNait but worse with outdoor PM1 than in the non-burning scenarios. This is plausible because, during burning, indoor primary emissions outweighed the contribution of particles transported from outdoors.

Evaluation of the indoor BC proxy
The optimised indoor BC proxy equations for all six scenarios are listed below (coefficients rounded to two decimal places), where $\widehat{\mathrm{eBC}}$ (µg m⁻³) is the estimated indoor equivalent black carbon mass concentration. The input variables PM1_out, PM1_in, PM2.5, and PM10 (µg m⁻³) are the mass concentrations of particulate matter smaller than 1 µm (outdoors), 1 µm (indoors), 2.5 µm, and 10 µm, respectively. PNait and PNacc (m⁻³) are the particle number concentrations of the Aitken and accumulation modes, respectively. RV (m³) and RH (%) are the room volume (representing room characteristics) and relative humidity (representing meteorological conditions), respectively.
Table 2 lists the coefficients of the models developed for the six scenarios and the three best input parameters for estimating indoor eBC in each. In all scenarios, IAP selected either indoor PM1 or PM2.5 as one of the three input parameters, because both correlate equally highly with indoor eBC. Although indoor PM1 had the highest correlation with indoor eBC in all scenarios, it may still not be among the most important variables when the interactions between variables are considered (Table 3), as in the "warm non-burning" scenario. This is because indoor PM1 can correlate strongly with another input parameter, in this case PM2.5, which would cause strong multicollinearity if both were included. In this example, PM2.5 was selected over PM1 in a non-burning scenario, which is plausible because emissions in burning scenarios (dominated by indoor cooking) contribute much more to PM2.5 than to eBC, as demonstrated by Aquilina and Camilleri (2022). In addition, outdoor PM1 is among the three best input variables in most scenarios, except for "cold burning". In burning scenarios, indoor eBC comes mostly from indoor combustion sources rather than from outdoor transport.
Despite the generally low correlation of indoor eBC with home characteristics and meteorological conditions, these variables occasionally ranked among the three most important inputs. For example, in the "cold others" and "warm burning" scenarios, room volume was identified as an important input variable because it is entirely independent of the air pollutant variables, which tend to be moderately correlated with each other. As the selected input parameters and the model accuracy differed between scenarios (seasons and activity classes), separate models are needed for different scenarios.
Fig. 3 shows scatter plots of the eBC estimated with the optimised coefficients versus the measured indoor eBC in all six scenarios. The results before and after rounding the coefficients are similar (Table 2). In the scenarios classified as "others", the models with optimised coefficients explained a similar proportion of the variability in indoor eBC (R² from 0.62-0.78 to 0.63-0.77). The rounding procedure had a larger influence on the burning and non-burning scenarios, where R² dropped from 0.61-0.77 to 0.49-0.76. Because the results before and after optimisation are similar, for simplicity we discuss only the optimised values in the rest of the paper. Regardless of the number of data points, all scenarios except "warm burning" achieved a satisfactory to good coefficient of determination (0.63 < R² < 0.77) with high Pearson correlation coefficients (0.80 < r < 0.88). This accuracy for estimating indoor eBC is comparable to that reported for ambient eBC concentrations (Fung et al., 2024). The exception, "warm burning", had a relatively low accuracy (R² = 0.49), presumably because the particles emitted by burning processes during the warm season were not well captured by the measured aerosol variables. The correlations between indoor eBC and all the other aerosol metrics in the "warm burning" scenario are slightly lower than in the other scenarios, suggesting that other physical and chemical processes may govern indoor BC concentrations. Another reason might be that the burning activities were quite diverse, ranging from domestic cooking to the use of fireplaces. The size fractions of eBC from food "burning" and from wood combustion would differ; therefore, the other burning scenario, "cold burning", also had a slightly lower R² than the non-burning scenarios. Nevertheless, most of the estimated data fall along the 1:1 line. The data points are more scattered in the scenarios classified as burning or non-burning than in those classified as others, owing to the different numbers of data points.
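Since both R² and Pearson's r are reported, it may help to note that they measure different things when applied to estimates rather than to an in-sample OLS fit: r is insensitive to a constant bias, whereas R² = 1 - SS_res/SS_tot penalises it. The sketch below computes both on made-up numbers; it is not the paper's evaluation code, and whether the reported R² was computed exactly this way is an assumption.

```python
import math


def r2_and_r(measured, estimated):
    """Return (R², Pearson r) for estimated versus measured values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    r = cov / math.sqrt(ss_tot * var_e)
    return 1.0 - ss_res / ss_tot, r
```

For example, estimates that are uniformly 1 µg m⁻³ too high ([2, 3, 4, 5] against [1, 2, 3, 4]) give r = 1 but R² = 0.2, which is why points falling along the 1:1 line matter in addition to a high r.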

Model limitation
IAP limits the number of input parameters to a maximum of three. This is a trade-off between model accuracy and practicality. While increasing the number of input parameters could improve the model's accuracy, it also increases the likelihood of missing data. By default, a regression model excludes any case with missing data on any of the variables, which can substantially reduce the sample size and the model's statistical power. In other words, a traditional regression model produces no estimate when one or more input variables are unavailable, whereas IAP still selects the best available input variables to complete the estimation. Likewise, the fine-tuning of the model coefficients is also a trade-off between accuracy and practicality. The unrounded model could achieve a higher R², but the coefficients of the input parameters would be numbers with arbitrarily many decimals, which cannot be explained by physical properties.
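The input-adaptive behaviour described above can be sketched as a fallback over ranked sub-models: the best-ranked sub-model whose inputs are all available is used. The log10-linear sub-model structure, the variable names, and the coefficients below are hypothetical and are not the fitted values in Table 2.

```python
import math


def predict_adaptive(submodels, inputs):
    """Estimate eBC with the best-ranked sub-model whose inputs are all available.

    submodels: list of (score, intercept, {variable: exponent}), best first.
    inputs: dict mapping variable names to positive values, or None if missing.
    Returns (estimate, variables used), or (None, []) if no sub-model applies.
    """
    for _score, intercept, coefs in submodels:
        if all(inputs.get(v) is not None for v in coefs):
            log_est = intercept + sum(b * math.log10(inputs[v]) for v, b in coefs.items())
            return 10.0 ** log_est, list(coefs)
    return None, []
```

A conventional regression with a fixed set of predictors would return nothing whenever PM1_out is missing; here the sketch falls back to the next-best sub-model instead.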
Furthermore, as this study focuses on the contributions to indoor eBC, outdoor eBC would be one of the most relevant factors and should ideally have been included as an input parameter. However, eBC was measured only indoors in the 40 households. Ambient eBC data are available only from government air quality monitoring stations, which are not located close to each household; for some households, the reference station could be several kilometres or more away. As ambient BC concentrations can be strongly influenced by local sources, the regional background may not represent real-time BC concentrations in the vicinity of the households. To avoid this uncertainty, we did not include outdoor eBC in the model.

CONCLUSION
Through the large-scale indoor and outdoor air pollution project (Zhao et al., 2019), we were able to capture and evaluate representative indoor BC concentrations in real-life residential environments in Germany. The results help fill the knowledge gap on the temporal variation and source contributions of indoor BC in multiple real-use homes. Although the mean outdoor concentration in the urban background was higher than the indoor concentrations in the homes studied, the large variability within each home indicates a strong influence of indoor sources under different living habits. This also explains the higher indoor BC concentrations in the cold season than in the warm season: in the cold season, the contribution of outdoor air is lower because of lower ventilation rates, while indoor combustion sources such as burning candles and using fireplaces are more frequent. Apart from these two activities, other common indoor activities can also increase indoor BC concentrations, including opening windows, baking, frying, toasting, other cooking activities, and vacuum cleaning. Nevertheless, combustion sources contribute significantly more.
Using the particle mass concentrations (PM10, PM2.5, and PM1) and number size distributions measured in the "indoor and outdoor air pollution project", we developed a proxy model in this work to predict indoor BC concentrations from commonly measured indoor air pollutants. To cover typical indoor scenarios more accurately, we developed separate IAP models for the warm and cold seasons, each subdivided into burning, non-burning, and other activities. The resulting models were further optimised so that they would be transferable to, and replicable in, other indoor environments. In this study, we found that indoor eBC was best estimated by indoor and outdoor PM1. The selected input parameters and the model accuracy differed slightly between seasons and activity classes (R² = 0.49–0.78), which demonstrates the need to develop separate models for different scenarios. One limitation of this work is the lack of outdoor eBC measurements near the households. Most of the model variables were chosen from indoor settings, as this work focused mainly on commonly measured indoor air pollutants and parameters. Nonetheless, the study provides a reliable tool for predicting indoor eBC concentrations in different indoor scenarios where only particle number and mass data are available.
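Fitting one linear model per scenario and then tuning its coefficients by rounding can be sketched as follows. The sketch assumes ordinary least squares with an intercept on indoor and outdoor PM1, and it interprets "rounding off" as rounding each coefficient to one significant figure; both are assumptions for illustration, not the authors' documented procedure. The helper names (`fit_ols`, `round_sig`, and so on) are hypothetical, and the data are synthetic random numbers, so the printed R² values say nothing about the measured households.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares with intercept; returns [intercept, b1, b2, ...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    return coef[0] + X @ coef[1:]


def r_squared(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def round_sig(x: np.ndarray, sig: int = 1) -> np.ndarray:
    """Round each value to `sig` significant figures (zeros stay zero)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nz = x != 0
    digits = sig - 1 - np.floor(np.log10(np.abs(x[nz]))).astype(int)
    out[nz] = [round(v, int(d)) for v, d in zip(x[nz], digits)]
    return out


rng = np.random.default_rng(0)
scenarios = ["cold burning", "cold non-burning", "cold others",
             "warm burning", "warm non-burning", "warm others"]
for name in scenarios:
    # Synthetic indoor/outdoor PM1 and eBC for illustration; NOT measurement data.
    pm1_in = rng.gamma(2.0, 4.0, size=200)
    pm1_out = rng.gamma(2.0, 3.0, size=200)
    ebc = 0.05 * pm1_in + 0.03 * pm1_out + rng.normal(0.0, 0.1, size=200)
    X = np.column_stack([pm1_in, pm1_out])
    coef = fit_ols(X, ebc)
    tuned = round_sig(coef)  # hypothetical "rounding off" step
    print(f"{name:17s} R2 raw={r_squared(ebc, predict(coef, X)):.2f} "
          f"R2 rounded={r_squared(ebc, predict(tuned, X)):.2f} coef={tuned}")
```

Comparing the raw and rounded R² per scenario makes the accuracy cost of physically interpretable coefficients explicit.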
The results of this work provide further insight into the contributions of indoor and outdoor sources to indoor BC in residential environments. With the aid of the indoor BC proxy model developed in this work, indoor BC concentrations, and consequently exposure and health risk, can be assessed more accurately for households with similar climatic conditions and residents' activity habits, e.g., in Central Europe.

Fig. 1. eBC mass concentration in the cold and warm seasons in each household. The boxplots show the median and the 25th and 75th percentiles; the diamonds mark the mean concentration for each season. Households are ranked by their median concentration. The household index refers to the households located in Leipzig (L1–L20) and Berlin (B1–B20).

Fig. 2. Heatmap of the Pearson correlation coefficient (r) for single variables: room volume, indoor temperature (T), indoor relative humidity (RH), indoor CO2, indoor particle number (PN) concentration for all size ranges and for the nucleation, Aitken, and accumulation modes, indoor mass concentrations of PM10, PM2.5, and PM1, and outdoor mass concentration of PM1.

Fig. 3. Scatter plots of indoor eBC versus multiple variables (models tuned by rounding off the parameter coefficients).

Table 1. eBC mass concentration of each source group at different activity times.

Table 2. Model coefficients before and after tuning by rounding off (upper and lower rows, respectively). Tuned variables are marked with an asterisk (*).

Table 3. Ranks of relative importance (Rel. imp.) in the different scenarios. Relative importance was normalised by dividing by the highest usage number.
Columns: Rank | Cold burning | Rel. imp. | Cold non-burning | Rel. imp. | Cold others | Rel. imp. | Warm burning | Rel. imp. | Warm non-burning | Rel. imp. | Warm others | Rel. imp.
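The normalisation behind Table 3 can be sketched as follows, assuming that the "usage number" of a variable is how often it was selected as a model input within a scenario. The function `relative_importance` and the example input sets are hypothetical and do not reproduce the values in Table 3.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def relative_importance(selected_inputs: Iterable[Sequence[str]]) -> List[Tuple[str, float]]:
    """Rank variables by usage count divided by the highest usage count."""
    counts = Counter(var for inputs in selected_inputs for var in inputs)
    if not counts:
        return []
    top = max(counts.values())
    # Stable sort: ties keep the order in which variables were first seen.
    return sorted(((var, n / top) for var, n in counts.items()), key=lambda item: -item[1])


# Hypothetical input sets selected for four predictions in one scenario.
chosen = [("PM1_in", "PM1_out"), ("PM1_in",), ("PM1_in", "PN_acc"), ("PM2.5_in", "PM1_out")]
for rank, (var, rel_imp) in enumerate(relative_importance(chosen), start=1):
    print(rank, var, round(rel_imp, 2))
```

The most frequently used variable always receives a relative importance of 1, so ranks are comparable across scenarios even when the number of predictions differs.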