Yelim Choi, Bogyeong Kang, Daekeun Kim

Department of Environmental Engineering, Seoul National University of Science and Technology, Seoul, Korea

Received: September 22, 2023
Revised: January 26, 2024
Accepted: May 13, 2024

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


Cite this article:

Choi, Y., Kang, B., Kim, D. (2024). Utilizing Machine Learning-based Classification Models for Tracking Air Pollution Sources: A Case Study in Korea. Aerosol Air Qual. Res. 24, 230222.


  • A case study in Korea to classify air pollution sources using machine learning.
  • 97% accuracy achieved by random forest model.
  • Hydrogen chloride and acetaldehyde found as critical variables.
  • Effective simplified random forest model enabled by nine variables.


Urbanization and industrialization make it difficult to identify and manage air pollution sources promptly. Machine learning offers a promising way to address this problem. By analyzing multidimensional datasets covering a wide range of air pollutants, a machine learning approach has the potential to significantly improve air pollution management and facilitate source tracking. This study evaluates machine learning-based emission source classification models to provide insights into air pollution source tracking and management. Using 972 datasets covering five emission sources and 27 air pollutants, five classification models were implemented and compared: Random Forest (RF), Naïve Bayes Classifier (NBC), Support Vector Machine (SVM), Artificial Neural Network (ANN), and K-Nearest Neighbors (K-NN). The RF model showed better predictive performance than the other four models, achieving an accuracy of 0.9691 and a kappa value of 0.9537. Hydrogen chloride and acetaldehyde were the most important variables for classifying emission sources. The findings suggest that machine learning techniques can help address air pollution challenges, and the classifier implemented in this study shows promise for effective emission source identification.
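
As a rough illustration of the comparison described in the abstract, the following sketch trains the five classifier families with scikit-learn and reports accuracy and Cohen's kappa for each, then ranks variables by random forest importance and refits on the top nine. It is not the authors' pipeline: the data are synthetic (generated with `make_classification` to mimic 972 samples, 27 variables, and five classes), and the 70/30 split, the default hyperparameters, and all variable names are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the study's data (assumption): 972 samples,
# 27 "pollutant" variables, 5 "emission source" classes.
X, y = make_classification(
    n_samples=972, n_features=27, n_informative=10, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Assumed 70/30 stratified split; the abstract does not state the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One model per classifier family named in the abstract, library defaults.
# Distance- and gradient-based models get standardized inputs.
models = {
    "RF": RandomForestClassifier(random_state=0),
    "NBC": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "ANN": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
    "K-NN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# Fit each model and record (accuracy, kappa) on the held-out set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = (
        accuracy_score(y_test, pred),
        cohen_kappa_score(y_test, pred),
    )

for name, (acc, kappa) in results.items():
    print(f"{name:5s} accuracy={acc:.4f} kappa={kappa:.4f}")

# Rank variables by impurity-based importance from the fitted RF and
# keep the nine highest-ranked column indices.
rf = models["RF"]
top9 = np.argsort(rf.feature_importances_)[::-1][:9]

# Refit a simplified RF on those nine variables only.
rf_simple = RandomForestClassifier(random_state=0)
rf_simple.fit(X_train[:, top9], y_train)
simple_acc = accuracy_score(y_test, rf_simple.predict(X_test[:, top9]))
print(f"Simplified RF (9 variables) accuracy={simple_acc:.4f}")
```

The scores printed by this sketch come from synthetic data and say nothing about the values reported in the study. Kappa is reported alongside accuracy because it corrects for agreement expected by chance, which matters when the classes are not equally represented.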

Keywords: Machine learning, Emission sources, Air pollutants, Classification
