Long-term air pollution exposure impact on COVID-19 morbidity in China

Where, Y represents the event count at geographic unit i during a certain period of time, y denotes a realization of Y. λ is an event occurrence intensity parameter calculated using the total number of events divided by the total number of geographic units. In a Poisson process, it is assumed that the expected value and the variance of the response variable equal its mean, thus Var(Y) = E(Y) = λ (2) Nonetheless, since events are usually unevenly distributed across a region, some studies suggest the intensity parameter λ should be adjusted for this spatial heterogeneity, thereby allowing each geographical unit to have its own intensity parameter λi (Cameron and Trivedi, 1986; Beck and Tolnay, 1995; ). Under the conditional Poisson assumption, the probability that the ith area will experience y number of events is


INTRODUCTION
In December 2019, the severe acute respiratory syndrome coronavirus (SARS-CoV-2), a novel virus identified as the pathogen of 2019 novel coronavirus (COVID-19) pneumonia, was detected . As COVID-19 continued to take a heavy toll on countries around the world, the World Health Organization (WHO) declared it a Public Health Emergency of International Concern (WHO, 2020).
Recent virological assessments suggest that the pathophysiological mechanisms of SARS-CoV-2 are similar to that of SARS-CoV, where there is evidence to prove that both viruses predominantly infect airway and alveolar epithelial cells, vascular endothelial cells, and macrophages of humans Zhou et al., 2020). Some of the most common symptoms of COVID-19 are fever, cough, fatigue, and myalgia Goyal et al., 2020;Young et al., 2020).
According to several reports on the clinical characteristics of the patients affected by COVID-19 in China and the United States, patients receiving invasive mechanical ventilation or presenting more severe symptoms are more likely to be male, to have obesity or to have some coexisting diseases such as hypertension, diabetes, chronic obstructive pulmonary disease, coronary heart disease, and cerebrovascular disease. Goyal et al., 2020;Young et al., 2020).
Besides, a few retrospective cohort studies point out that other risk factors, including but not limited to older age (≥ 65 years old), smoking history, education level, and household income level, are likely to be associated with the clinical outcomes of COVID-19 (Engin et al., 2020;. China has suffered from extensive air pollution and subsequent severe outcomes in recent years. According to Song et al. (2017), in the year 2014 (2015, 2016), only 7% (14%, 19%), 17% (27%, 34%), 51% (67%, 70%), and 88% (97%, 98%) of the population lived in areas where the annual PM2.5, PM10, NO2, and SO2 levels were below the guideline values specified by the Chinese Ambient Air Quality Standard (CAAQS) Grade II standards. Another research conducted by Zheng et al. (2018) concludes that SO2, NOx, PM10, and PM2.5 emissions are predominantly generated from industrial activities (i.e., 60%, 38%, 57%, 50%, and 53% of total emissions), whereas the residential activities account for almost 40% of CO emissions. In addition to the primary pollutants, the intensive chemical reactions between oxides of NOx and volatile organic compounds (VOCs) contribute to the mass amounts of secondary PM and O3 products. Fortunately, since the year 2013, with the implementation of China's Clean Air Action as well as a series of other effective strategic plans (e.g., desulfurization of coal-burning power plant, restrictions on private vehicle ownership, development of electric vehicles) (Jin et al., 2016), the year 2017 had seen 62%, 17%, 27%, 38%, and 35% emission reductions of SO2, NOx, CO, PM10, and PM2.5, respectively, compared to the annual average emissions in 2010 (Zheng et al., 2018).
There are considerable epidemiological and pathophysiological evidence that link long-term air pollution exposure to many premature deaths and increased morbidity due to respiratory and cardiovascular diseases. A review by Hoek et al. (2013) summarizes the impacts of long-term exposure to PM2.5, PM10, nitrogen dioxide (NO2), and elemental carbon on all-cause mortality and mortality induced by cardiovascular disease and respiratory disease. Their pool effect estimation based on meta-analysis exhibits that a 10 µg m -3 increase of PM2.5 may lead to 6% (95% CI: 4-8%), 11% (95% CI: 5-16%) and 3% (95% CI: -6-13%) rise of the percent excess risk for all-cause, cardiovascular and non-malignant respiratory mortality, respectively. Further, Pope et al. (2004)'s research based on the data of Cancer Prevention Study II collected by American Cancer Society confirms that a 10 µg m -3 growth of PM2.5 is associated with 8% to 18% increases in death risk of numerous cardiopulmonary diseases, e.g., ischemic heart disease, dysrhythmias, heart failure, and cardiac arrest. Lipfert (2018) assessed 417 long-term studies on the association of morbidity with air pollution (exposure duration varies between 60 days and 35 years) and found that 224 of them report significant exposure-outcome relationships and 220 reveal adverse effects. Besides, Lipfert's study suggests that PM2.5, O3, NO2, and PM10 are more likely to cause cardiovascular, respiratory, neurological, and birth/pregnancy outcomes, respectively.
While substantial heterogeneity of the impacts that air pollution exerts on human health is found across various studies, there is a consensus that suspended fine particles, nitrogen dioxide, ozone, carbon monoxide and sulfur dioxide may incase the risk of acute respiratory disease, or expedite the development of atherosclerosis and cardiovascular disease through multiple pathophysiological pathways (Brook et al., 2004;Pope et al., 2004). Also, we found that the effect estimation of pollutants can be modified by a broad spectrum of factors, such as population characteristics, human behavioral patterns, methodology applied for exposure-outcome assessment, regional-specific variations, and confounding effect, etc. (Hoek et al., 2013;Lipfert, 2018;. Moreover, since the outbreak of COVID-19, several epidemiological studies have been initiated globally, utilizing various investigation methods such as cross-sectional analysis and time-series analysis. A research team found that a 1 µg m -3 increase of PM2.5 is associated with a 15% increase of COVID-19 mortality rate, as revealed by a cross-sectional analysis using county-level data in the United States . Also, Ogen (2020)'s spatial analysis shows that the hot spots of high-level NO2 concentrations correspond to areas with high fatality rates in Europe. Furthermore, Zhu et al. (2020) investigated the impacts of short-term air pollution exposure on COVID-19 infection, under the assumption that SARS-CoV-2 can be transmitted by aerosols and may pose an inhalation threat. Their results suggest a 10 µg m -3 increase of PM2.5, PM10, NO2 and O3 is associated with a 2.24% (95% CI: 1.02-3.46), 1.76% (95% CI: 0.89-2.63), 6.94% (95% CI: 2.38-11.51), 4.76% (95% CI: 1.99-7.52) increase of COVID-19 morbidity rate, respectively.
In a nutshell, the above observations support the hypothetical correlation between COVID-19 infection and prolonged air pollution exposure. However, to what extent does the long-term air pollution exposure affect COVID-19 morbidity is unclear. Since persisted exposure to various air pollutants can induce airway inflammation, worsen pulmonary function and gas exchange and cause cardiovascular dysfunctions (Grunig et al., 2014;Jiang et al., 2016;Contini and Costabile, 2020), in this study, we hypothesize that these mechanisms will also have detrimental impacts on the vulnerability of the population to COVID-19. Hence, the overarching goal of this study is to understand the potential relationships between COVID-19 morbidity count and long-term air pollution exposure in the prefecture level. Accordingly, we employed several statistical methods to examine the systemic effects of air pollution exposure on COVID-19 morbidity, and applied sensitivity analyses to verify our conclusions' robustness.

STUDY AREA AND DATA
Our study area includes 326 prefectures in mainland China (Fig. 1). A few prefectures in Tibet and Xinjiang were omitted from our investigation because some confounder variables are not available in these regions (e.g., census data, air quality data). As of April 21, 2020, 82,788 confirmed cases were reported by the Chinese Center for Disease Control and Prevention (China CDC), including the number of COVID-19 infections in Hong Kong, Macao, and Taiwan.
The COVID-19 morbidity data was obtained from the public reports published by China CDC. While the air quality data, consisting of PM2.5, PM10, NO2, SO2, CO and O3, was derived from national air quality monitoring site network of the Ministry of Environmental Protection (MEP) (MEP, 2020). Although MEP began to grant access to air quality data since 2013, the data from 1494 sites in 367 prefectures became publicly available until 2015. Moreover, since these monitoring stations are mandated not to be in the direct vicinity of noticeable emission sources, such as major thoroughfares, the data is believed to be a good indication of the general air quality level in each prefecture. The air pollution exposure level was therefore estimated by averaging the daily concentrations of PM2.5, PM10, NO2, SO2, CO, and O3 based on the monitored observations from 2015 to 2019.  To account for the effects of confounding factors, we collected the demographic information from 2010 census results, such as population size, percentage of male/female population, percentage of people over 65 years old, and percentage of residents with a college degree (NBS, 2012). Also, a few socio-economic variables that represent the income level (i.e., per capita GDP, average wage) of the whole population in different cities in year 2017, were obtained from China City Statistical Yearbook 2018 (NBS, 2018). Since medical condition and health risk factors can determine whether people are susceptible to cardiopulmonary diseases, we retrieved the prefecture-level counts of doctors from the Statistical Yearbook and derived the smoking population data from a health survey conducted by Wang et al. (2019) in 31 provinces in 2010. Moreover, we averaged the daily air temperature, daily dew point temperature, and daily wind speed for summer (June to August) and winter (December to February) during the period 2015-2019 using record temperature and wind speed data provided by National Centers for Environmental Information of the United States (NCEI, 2020), and weather variables from 397 ground stations were then aggregated into prefecture-level values for the analysis.
Furthermore, to elaborate on the influences of mobility on the spread of the disease, we collected the population outflow data of Wuhan city from January 1 to January 22 before the city's lockdown. The data was obtained from the Baidu migration server, an online platform that traces the daily travelers in and out of different cities using their digital footprints (Baidu Inc, 2020). Additionally, the migration server offers the intra-city mobility index that measures the intensity of traveling activities within different cities' administrative boundaries. We calculated two cumulative mobility indicators from the migration data: one is the total population migrated from Wuhan before January 23, and the other represents the cumulated intra-city movement in each prefecture before the lockdown.

METHODS
In this analysis, we fitted the negative binomial regression using prefecture-level morbidity count data as the dependent variable and prefecture-level long term air quality indicators as the exposure.
The negative binomial regression model (NB model) is one of the most popular regression methods applied for over-dispersed count data. The NB model can be considered as a generalized form of the Poisson regression model, despite that it supposes the conditional variance of the dependent variable is larger than its mean (Hilbe, 2011). The NB model allows one to describe a dependent variable using any number of independent variables and can be written as where µ(x) is the expected value of the response, βi is the estimated coefficient for the explanatory variable xi and can be derived through a maximum likelihood approach. β0 denotes the intercept, and ε is the residual term. More details about the NB model can be found in supplementary materials.
We excluded the Wuhan city from our samples since Wuhan was the epicenter of the outbreak and had witnessed overwhelmed COVID-19 cases. That said, the inclusion of the confirmed cases in Wuhan might over-represent the real risks of chronic exposure to ambient air pollution. Moreover, a few covariates were introduced into the model to adjust for the confounding effects of our model estimation. The selection of confounders was based on the existing literatures that report causal influences of extraneous variables on both exposure and response (Hoek et al., 2013;Lipfert, 2018;.
Our NB regression model is therefore formulated as The definition of independent and dependent variables can be found in nomenclature (Table 1).
Denote Log E(Ci) as the log transform of confirmed cases in prefecture i, hence, for each unit increase in a predictor, the expected log count of the COVID-19 infection cases is changed by the respective regression coefficient, holding other predictors as constants. The COVID-19 incidence rate ratio (IRR) is a natural exponential function of the regression coefficient βi (expβi), which can be interpreted as the relative increase of COVID-19 morbidity rate associated with one unit increase of the pollutant (see supplementary materials). The impacts of five air pollutants on COVID-19 morbidity were examined separately using 5 single-pollutant models, because previous studies indicate air pollutants can react with each other and their concentrations are often highly correlated Chen et al., 2020), thereby inducing serious multicollinearity problems that may inflate the estimation of regression coefficients.
We carried out several sensitivity analyses to verify the robustness of our regression outcomes. First, we excluded all prefectures in Hubei province since their daily counts of COVID-19 were remarkably higher than others. Second, we computed the biasing effect of confounding on point and confidence interval estimates by individually omitting the following variables from our regression model: weather factors, income level, percent of population over 65 years old, number of doctors in each prefecture, and mobility indices. Third, we modelled the exposureoutcome relationship using the Poisson regression model as an alternative to evaluate the impact of modelling choice. Finally, instead of single-pollutant models, we employed the two-pollutant models to examine the significance of pollutants, in which two air pollutants were incorporated into the regression model to account for their covariance. All analyses were performed with R software, version 3.6.2.

Characteristics of the COVID-19 Morbidity Distribution and Mean Air Pollution Levels in China
A total of 29,936 confirmed cases were reported in 326 prefectures (excluding Wuhan) up to April 21, 2020. In Fig. 1, we plotted the geographic distributions of all cases registered in our study area and found that most COVID-19 patients were either located in the east coast or in areas that shared geographical proximities with Wuhan (e.g., Chongqing, Zhengzhou, and other cities in Hubei province). Similarly, people experienced the burden of ambient air pollutions Intra-city movement before January 23 disproportionately (Fig. S1), with most of them living in cities that encompassed a large portion of manufacturing industries (e.g., Beijing-Tianjin-Hebei region). The mean and standard error for the five-year air pollution concentration levels are reported in Table 2, of which the average concentrations of PM2.5, PM10, O3, SO2, NO2, and CO were 43.53 µg m -3 , 77.68 µg m -3 , 18.56 µg m -3 , 29.39 µg m -3 , 60.04 µg m -3 , and 0.95 mg m -3 , respectively. The thresholds for annual mean concentrations of PM2.5, PM10, and NO2 were specified as 10 µg m -3 , 20 µg m -3 , and 40 µg m -3 , respectively by WHO in its air pollution risk assessment report (only short-term concentration thresholds are available for O3, SO2, and CO, so we did not include them here), to mitigate detrimental health impacts on humans (WHO, 2006). Based on the fiveyear air quality observations, almost all air quality indicators had exceeded WHO's guideline values for individual substances. Therefore, we believed the deteriorated air quality had already posed significant threats to the well-being of the population in China.

Examination of Confounding Effects
We omitted several confounders individually from the predictor set to elucidate their impacts on coefficient estimation. As demonstrated in Fig. 2, the effect estimation results remained consistent with our main analysis after separately controlling for various socio-economic, meteorological and demographic factors. However, interestingly, we observed a 1 µg m -3 increase of PM2.5 and NO2 would lead to much lower COVID-19 morbidity (IRR for PM2.5 was 1.01, 95% CI: 1.00-1.02, and IRR for NO2 was 1.03, 95% CI: 1.01-1.05) when historical weather data was omitted from the analysis, which suggested weather variables were strong confounders that could markedly alter our statistical inferences. Another disparity was found when prefectures in Hubei were removed from the sampling dataset. The IRR changes associated with PM2.5, PM10, and NO2 moderately reduced from 1.95%, 0.55% and 4.63% to 1.43%,0.47% and 4.09% when high leverage daily counts were excluded, implying the impacts of influential outliers. Further, our results indicated that the magnitude of regression coefficients was somewhat confounded by the presence of mobility variables, specifically the outflow population from Wuhan before the city's lockdown.

Regression Analysis Results for Two-pollutant Models
The two-pollutant models illustrated the exposure-outcome relationship when two pollutants were synthetically considered in our regression analysis ( Fig. 3 and Table S2). Specifically, the association between PM2.5 and COVID-19 morbidity count was positive and significant, except when adjusting for NO2. For PM10, a 1 µg m -3 increase of the particulate matter was often associated with minor increase of COVID-19 morbidity, however, this relation was reversed when NO2 and PM2.5 were added to the model. The effect of NO2 was amplified and its correlation with COVID-19 morbidity remained significant in the presence of all the other pollutants. Comparing to the single-pollutant model result, the negative correlation between O3 and COVID-19 morbidity in the two-pollutant model retained its statistical significance, with little fluctuation of regression coefficients. Not surprisingly, when adjusting for other pollutants, we found generally positive but insignificant correlations between SO2, CO and COVID-19 morbidity.

Examination of Modeling Choice
Our statistical analysis showed that the prefecture-level morbidity count data had a variance of 99332.13 and a mean value of 91.83 (Table 2), and the estimated overdispersion parameter for the quasi-Poisson regression was 108.46 in the single-pollutant model for PM2.5, all of which suggested a compelling sign of overdispersion.
Additionally, we regressed the COVID-19 morbidity counts on air pollution exposure using the Poisson model, and the results indicated that the Poisson model tended to inflate the regression  (Table S1 and Fig. 2). The regression outcomes also showed the Akaike information criterion of the Poisson model (AIC = 22294) was considerably larger than the AIC of the NB model (AIC = 2906), which proved that the NB model was a better fit to our data (Table S3).

DISCUSSIONS
Although the existing research has revealed that the long-time exposure to ambient air pollution increases the death risk of COVID-19, our analysis offers scientific understanding to demonstrate that people are more susceptible to COVID-19 if they are chronically exposed to excessive air pollution.
Among all air pollutants that are commonly monitored in Chinese cities, PM2.5 and NO2 are found to be strongly positively correlated with the incidence of COVID-19. This inferential conclusion agrees with many previous studies that seek to attest the statistical relationship between exposure and disease morbidity. Nevertheless, our analysis results show that human O3 exposure is negatively correlated to COVID-19 morbidity. There have been many conflicting conclusions on how O3 exposure affects the occurrence of cardiac and respiratory disease. For instance, a parallel study performed by Wong et al. (2002) shows that while short-term human O3 exposure level is positively associated with daily hospital admissions for cardiovascular disease in London, a negative association is reported in Hong Kong. Consequently, they postulate that Hong Kong residents are more adapted to ozone exposure because their diet contains a high level of antioxidants and air conditioners are often used in compact, poor-ventilated spaces. On the other hand, seasonal variations of the ozone level may have distinct impacts on human health. A cohort study conducted in the Midwest and the Eastern United States demonstrates a negative but nonsignificant association of O3 concentration with mortality (Dockery et al.,1993), while a re-analysis of the data reveals a positive and significant exposure-mortality correlation during warm seasons (Krewski et al., 2003).
Another interesting fact that SO2 and CO do not share significant associations with COVID-19 morbidity may be attributed to the declining SO2 and CO emissions in China. This speculation is well supported by empirical observations that find weak correlations between SO2, CO exposure and cause-specific mortality when people are exposed to moderate SO2 and CO concentrations (Townsend and Maynard, 2002;Filleul et al., 2005;Beelen et al., 2008;Dong et al., 2012).
There are a few limitations worth noting in this study: First, the use of observational data from weather stations and pollution monitoring stations may introduce biases since it only demonstrates the small-scale meteorological and air pollution dynamics. Instead, satellite data may provide credible estimates concerning the area-wide air quality levels and weather variables. Second, we recognize that our scientific findings may only accumulate to the epidemiological knowledge of COVID-19 within a restricted geographic context, that said, the statistical inferences may vary with widely varying environments in different countries, being altered by their economic status, long-term air pollution exposure levels, etc. Third, giving that the annual mean concentrations of some pollutants are negatively or insignificantly correlated with COVID-19 morbidity, we believe that more importance should be attached to the specification of long-term air pollution exposure. Presumably, for the long-term effects of air pollution, the cumulative dose, seasonal-specific indicators, or composite air pollution indicators may be more decisive in influencing the outcomes of COVID-19. Fourth, our effect estimation could potentially suffer from unmeasured confounding errors. For example, many lifestyle and health risk factors (e.g., body mass index) are often not measured in China at the prefecture level. This means additional analyses might be needed to verify whether unmeasured confounding creates bias to the analysis. Last but not least, the most important caveat of this study is the ecological nature of the analysis. For this reason, our interpretation of the findings should not be extrapolated to individual levels. Besides, since correlational studies often bear endogeneity problems, it is not legitimate to deduce a causal relationship between long-term air pollution exposure and COVID-19 morbidity unless many more credible laboratory evidences become available.

CONCLUSIONS
Overall, our study indicates that NO2, PM2.5, PM10, and O3 have the strongest correlation with the infection risk of COVID-19. We estimate that a 1 µg m -3 decrease of the prolonged annual average concentrations of NO2, PM2.5, PM10 and O3 are associated with 4.63%,1.95%, and 0.55% decrease and 2.05% increase of infection cases. Moreover, our results remain insensitive to adjustment of confounding variables and to different types of regression models applied. Though we have not found any evidence that supports a significant association between SO2, CO and the morbidity of COVID-19, future investigations may be needed to lay insights on the underlying mechanisms of their interactions. Moreover, we believe it is necessary to incorporate laboratory investigations to reveal further how long-term air pollution exposure affects the severity of COVID-19. And finally, we argue that although mandatory temporary control of air pollution emissions can be an effective measure to contain COVID-19 infections, persistent pollution reduction strategies that are executed regularly are much more needed to abate severe consequences of a pandemic of such scale in the future.