Machine Learning Applications to Dust Storms: A Meta-Analysis

Dust storms are natural hazards that affect both people and properties. Therefore, it is important to mitigate their risks by implementing an early notification system. Different methods are used to predict dust storms, such as observing satellite images, analyzing meteorological data, and using numerical weather prediction model forecasts. However, recent studies have shown that machine learning algorithms have higher capacities to predict dust storms in less time and with fewer processing operations compared to numerical weather models. This paper conducted a meta-analysis review to examine studies that addressed the areas associated with the application of machine learning to dust storm prediction. It aims to compare the applied models and the types of data used in the literature under study. Given that the location of a dust storm event is essential, the properties of dust storms are discussed in relation to the region. The output classes and the various performance metrics observed in each reviewed paper are also summarized. Subsequently, the present paper offers a detailed analysis highlighting the capabilities of machine learning models in predicting dust storms. The analysis shows two main categories: early detection and dust storm prediction. Most models used for dust storm early detection from satellite images are support vector machines (SVM). In contrast, the most used models for dust storm prediction are SVM and random forests that predict the occurrence of dust storms from meteorological data. Finally, the paper highlights the challenges and future trends in the field, illustrating the potential directions for applying deep learning algorithms and providing long-range predictions with assessments of dust storm duration and intensity.


INTRODUCTION
Dust storms are natural hazards that have harmful impacts on human health. Additionally, they reduce atmospheric visibility, leading to frequent traffic accidents. They are also damaging to agriculture and can negatively impact economic activities (Notaro et al., 2013). Generally, dust storms result from strong winds blowing soil particles up from the earth's surface kilometers high into the atmosphere (Squires, 2001). Most occur in arid and semi-arid areas where pressure gradients and dry, loosely packed soil particles cause instability in the lower atmosphere (Zhang et al., 2017). The terms "dust storm" and "sandstorm" are frequently used interchangeably, as the difference between them is almost negligible (Shepherd et al., 2016). Some researchers distinguish between a sandstorm and a dust storm based on the difference in the particle size of the soil. The phenomenon is called a sandstorm if the soil particle size is in the range of 0.6-1 mm, while it is said to be a dust storm if the particles are smaller than 0.6 mm. The most common type of storm that occurs in deserts is the dust storm, in which clay and silt particles that are up to 0.5 mm in size are transported by the wind (Warner, 2009).
Since dust storm prediction is an urgent challenge in today's world, machine learning has opened various opportunities for studies in this field. A paper by Li et al. (2021) reviewed a wide range of studies on dust storm detection using multispectral sensors. The paper covers many dust storm detection algorithms, such as those based on machine learning and empirical physical methods. However, the paper focused on data collected only from multispectral sensors, while our current study provides a detailed review specifically of applying machine learning algorithms to various sources, such as meteorological data from weather stations and satellite data. The two main approaches to collecting data about dust storms are ground observations, which are the direct way of monitoring dust storms in a small area, and data acquired from satellite remote sensors, which are used to observe broad areas (Akhlaq et al., 2012).

A C C E P T E D M A N U S C R I P T
solutions. The main idea of using machine learning is to identify a common pattern in the problem space to enhance the ability of the machine to detect, classify, and predict future changes (Lecun et al., 2015).
Machine learning is used to predict different weather phenomena, such as hurricane trajectory and intensity (Boussioux et al., 2020), using a novel multimodal approach that combines gradient-boosted trees, encoders, convolutional neural networks (CNN), and transformer components. In addition, machine learning is used in air-quality forecasts (Lin et al., 2018), short-term precipitation prediction (Chen and Wang, 2021), and for improving precipitation prediction in numerical weather prediction (NWP) systems (Singh et al., 2021).
Therefore, machine learning provides the potential to understand past patterns of dust storms to predict future events. Many studies have applied machine learning algorithms to dust storm detection and prediction, including artificial neural network (ANN), support vector machine (SVM), random forests, CNN, logistic regression, and naïve Bayes (Kh Zamim et al., 2019; Lee et al., 2021; Nabavi et al., 2018).
The present paper aims to conduct a comprehensive review of all sub-areas of machine learning in the study of dust storm challenges. The challenges are categorized into dust storm prediction and dust storm early detection methods. Through a meta-analysis, we explore and review the extent to which the application of machine learning methods in previous studies has improved dust storm prediction. The paper is structured as follows. Section 2 outlines the methodology. Section 3 provides the results for evaluating machine learning applications to dust storms. Section 4 offers an elaborate discussion of the results. Section 5 concludes the paper with future directions.
The selection of studies was based on alignment with the review goal, namely to select papers that apply machine learning in dust storm prediction. This effort resulted in 31 papers. We reviewed and conducted a meta-analysis of these papers. The following data were obtained from each article: the challenges addressed, the machine learning model applied, the event location, the sources and types of data used, the classes used to classify the outputs, and the performance metrics selected to validate the method.

RESULTS
A simple statistical analysis was conducted to explore the current status and trends using the above data. Table S1 presents the details of each paper reviewed, including the research citation, study target, data source, machine learning methods, and prediction type. With the availability of satellite data and global coverage, most of the research was conducted on dust storm detection to detect dust aerosol in real time or near-real time from remote sensing images with different spatial and temporal resolutions. The main challenge was to provide the detection with reasonable processing times and running speeds. Some papers provide algorithms to detect dust aerosol by classifying dust storm features based on a threshold of the pixel's brightness temperature (Shahrisvand and Akhoondzadeh, 2013; Xiao et al., 2015). The threshold-based method is related to the physical properties of dust aerosols, which vary between different regions. Machine learning shows better performance than the Thermal Infrared Integrated Dust Index, especially over bright surfaces, as it can distinguish between a dust storm and sandy land (El-ossta et al., 2013).

A study by Shi et al. (2019) explored new combinations of bands to enhance the dust-detection method compared with traditional threshold-based methods, such as the normalized difference dust index (NDDI) and brightness temperature difference (BTD). The enhanced dust storm detection method can distinguish dust features from bright surfaces, such as deserts. Differentiating between dust and cloud is also a challenge addressed by different studies (Hou, Guo, et al., 2020; Jiang et al., 2022; Ma et al., 2015; Qing-dao-er-ji et al., 2020; Wang et al., 2022). Ma et al. (2015) used transfer learning, where the model is trained first on dust sources and then applied to samples from the place where the dust crossed over. Using the geometric information of satellite remote sensing data also enhances the ability to detect dust, as proposed by Hou et al. (2020), because dust particles are non-spherical. Their reflections differ based on the image acquisition angle, such as the solar zenith, view zenith, and solar azimuth.
Improving the detection method at night was addressed by Berndt et al. (2021), who used GOES-16 data to train a random forest model to detect dust and non-dust pixels.

Methods used for dust storm predictions
Dust storm prediction techniques are commonly used for early warning applications (Akhlaq et al., 2012). Early work on automatic dust storm prediction was conducted by Lu et al. (2006), who used the SVM method to present a dust storm forecasting model. When categorizing dust storm cases, the majority are classified as no-occurrence cases. Studies were conducted to consider a larger number of rare cases of dust storms by applying algorithms based on the synthetic minority over-sampling technique (SMOTE) (Xie et al., 2015; Zhang et al., 2015). The majority of the models built were based on a historical training dataset, which comprised meteorological data collected over the previous years, to generate forecasts of up to 24 hours to provide a daily prediction (Ali et al., 2019; Murayziq et al., 2017; Shaiba et al., 2018), while Iranmanesh et al. (2017) provided a daily prediction and then calculated the frequency of dust occurrence in the next 15 days. Tiancheng et al. (2019) employed the more enhanced approach of using satellite cloud images of dust to analyze the intensity of the dust storm and combined that with meteorological data to predict dust storm occurrence and intensity. They used two machine learning techniques to predict dust storm occurrence: CNN to study the influence of atmospheric motion from satellite cloud images and a naïve Bayesian classification algorithm to predict dust storms from meteorological data. The improved naïve Bayesian-CNN classification algorithm introduced in the study shows higher accuracy than when each algorithm is used alone. Two studies (Ebrahimi-khusfi et al., 2021a; Ebrahimi-khusfi et al., 2021b) used an enhanced vegetation index, such as that derived from the moderate-resolution imaging spectroradiometer (MODIS) satellite sensor, together with meteorological data to assess their impact on predicting the temporal variations of dust storms for the warm and cold months using various machine learning methods. The two studies were conducted in the semi-arid region of Iran, and the results show that an enhanced vegetation index impacts dust storm prediction during the warm months.
Another study predicted seasonal and annual dust storm frequency and its association with air temperature and precipitation at 25 stations over Pakistan (Dar et al., 2022). Recent studies by Aryal (2022a, 2022b) also developed a machine learning model based on air temperature and precipitation; however, the prediction was for monthly and seasonal fine and coarse dust (PM2.5, PM10). The findings of these two studies demonstrate that dust storms can be predicted based on limited climatic data with high accuracy.
Satellite images come in many types, depending on the remote sensor technology used within each satellite. Some types are visible images, thermal infrared images, microwave images, and radar images (Li et al., 2021). Satellite images have three basic resolutions: spatial, spectral, and temporal. Spatial resolution is the pixel size measured on the ground. Spectral resolution is the number of bands that the remote sensor can capture. Temporal resolution is the time the satellite takes to revisit the same area.
Following the above, the approach of predicting dust storms is based on detecting dust particles from the image, and the researchers investigated the suitable bands that capture the dust, where each spectral band captures a different wavelength range (Li et al., 2021). However, this threshold-based method is related to the physical properties of dust and how it reflects light. Therefore, the thresholds are computed using different bands to find a more suitable approach for the study area. Furthermore, the thresholds could vary between different locations because other factors, such as clouds, water bodies, or whether the dust needs to be detected over desert or ice, affect these bands.
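To make the threshold-based idea concrete, the following is a minimal sketch, not taken from any of the reviewed papers. It assumes the commonly cited formulation of the NDDI as a normalized difference of reflectances near 2.13 µm and 0.469 µm, and a BTD defined as the brightness temperature near 11 µm minus that near 12 µm. The function names and the threshold values (`nddi_min`, `btd_max`) are illustrative placeholders; as noted above, suitable thresholds depend on the region and surface type.

```python
def nddi(rho_2130, rho_469):
    """Normalized difference dust index from two reflectances.

    Assumed formulation: (rho_2.13um - rho_0.469um) / (rho_2.13um + rho_0.469um).
    """
    denom = rho_2130 + rho_469
    if denom == 0:
        return 0.0  # avoid division by zero for empty pixels
    return (rho_2130 - rho_469) / denom


def btd(bt_11, bt_12):
    """Brightness temperature difference in kelvin (assumed: ~11 um minus ~12 um)."""
    return bt_11 - bt_12


def is_dust_pixel(rho_2130, rho_469, bt_11, bt_12, nddi_min=0.28, btd_max=0.0):
    """Flag a pixel as dust when both indices pass their thresholds.

    nddi_min and btd_max are placeholder values, not recommendations;
    they would need tuning for each study area.
    """
    return nddi(rho_2130, rho_469) > nddi_min and btd(bt_11, bt_12) < btd_max


# Hypothetical pixel values, for illustration only.
print(is_dust_pixel(0.4, 0.2, 285.0, 286.5))
```

A fixed rule of this kind is what the machine learning classifiers discussed in this review aim to improve on, since the thresholds do not transfer well between surfaces such as desert, water, and ice.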
There are different processing levels of these satellite images, which are usually provided as separate products. Level 0 is raw telemetry data. Level 1 presents the data after radiometric and geometric corrections have been applied. Level 1 is most used for the threshold-based method, while at the more advanced processing Level 2 the data receive further correction and different products are produced. An example of Level 3 data is the MODIS Aerosol Optical Depth (AOD) product. Processing the data to advanced levels requires more time and computational capabilities, and it may take days to get a final product such as AOD, which could delay dust storm detection.
Satellite images are used mainly for short-term forecasts; another way is to use NWP model forecasts and reanalysis products, which can be used for long-range periods. The NWP model, the third approach to monitoring and predicting dust storms, involves simulation and numerical model studies (Aryal, 2022a, 2022b; Nabavi et al., 2018; Xiao et al., 2015). The NWP model uses computer models to process the current weather state and to forecast the future state of the weather. The implementation of such a model depends on studying the physical parameters associated with the dust cycle (Cuevas agulló, 2013). In general, the model must include all the information needed to simulate the dust storm process, starting from dust emission to dust transport and, finally, the dust's dry or wet deposition, such as the NOAA HYSPLIT model and ERA-5 (Cuevas agulló, 2013).


Event Location
Event locations reported in the literature include China (including Ningxia and the northwestern provinces), the southwestern United States, Texas, Mexico, the Middle East, East Asia (including the Gobi and the Taklimakan deserts), Central Asia (including the Aralkum desert), West Asia, Saudi Arabia (including Riyadh, Jeddah, and Dammam), the Arabian Desert, Iraq, Cairo, the coast off North Africa, Pakistan, and Iran; some studies used data collected globally.
In some of these works, the data collected for dust storm detection covered between 3 and 41 events, with a spatial resolution of around 1 x 1 km. For dust storm prediction, the studies used meteorological data spanning 5 to 30 years. The spatial coverage of the 31 papers includes 9 applied to a global domain, where the data were collected from different countries, and 22 applied to a single country as a local domain.
( in addition to land cover classification (Wang et al., 2022).

Performance Metrics
The performance metrics used varied among the studies, based on the model used (Fig. 3). Therefore, Table S2 summarizes the most used metrics among the papers, with the size of the training and testing sets used in each paper. Accuracy, which was used in 15 papers, was the most popular metric. Four papers included a precision metric, and three papers included a recall metric. Two papers included the model's running time to measure the performance, especially for real-time detection. Five of the 15 papers that included accuracy as a metric demonstrated accuracy above 90%. The highest level of performance was achieved using SVM in the study of Wang et al. (2022), with values of 98%. Of the three papers that used recall, one study by Zhang et al. (2015) achieved a recall higher than 99%. The lowest precision was observed in Rivas-perea et al. (2010) with maximum likelihood, resulting in a value of 52.55%.
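For readers less familiar with these metrics, the sketch below shows how accuracy, precision, and recall follow from the counts of a binary confusion matrix. The labels are invented for illustration (1 for a dust storm, 0 for no dust storm) and do not come from any reviewed paper; the function names are likewise hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of predicted dust storms that were real."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of real dust storms that were predicted."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Invented example: 2 dust storm days among 10 observations.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(accuracy(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))
```

In this invented case the accuracy is 0.8 while precision and recall are both 0.5, which illustrates why accuracy alone can look favorable when no-occurrence cases dominate the data.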

Fig. 3. Overview of performance metrics used to evaluate the results of machine learning models
As shown in Table S2, among the various methods for predicting the dust storm category, the highest levels of performance (above 95%) were achieved using a hybrid machine learning technique that combines the SMOTE algorithm with the AdaBoost and random forest algorithms (Zhang et al., 2015), as well as by using decision tree or naïve Bayes techniques (Ali et al., 2019).
For the dust storm detection category, the highest level of performance (above 95%) was observed by Wang et al. (2022), who used SVM, and Xiao et al. (2015), who used the multi-layer perceptron (MLP) neural network. Table S3 shows all the performance metrics used in each paper.
Regarding the application of machine learning to predict fine dust (PM2.5) and coarse dust (PM10), two studies compared different models for each dust type (Aryal, 2022a, 2022b). These two studies show that non-linear models perform better than linear regression in predicting both types of dust. However, the highest prediction performance was obtained by applying the random forest model to fine dust, and the studies show that machine learning models predict fine dust better than coarse dust.

Machine learning techniques
From the analysis of the literature, it was found that 28 machine learning techniques have been implemented in total. More specifically, 19 different techniques were used for the dust storm prediction category, where the most used methods were based on SVM and random forests (RF).
In the dust storm detection category, 16 techniques were used, and the most popular method is SVM. Fig. 4 presents all the machine learning methods applied to each category.
In examining the machine learning techniques that have been applied, it was observed that most studies show superior performance. However, when comparing the performance between different techniques, it is essential to adhere to the same experiment, including datasets and performance metrics. Thus, our comparison has been limited to the main characteristics of the machine learning models shown in each paper.
Another advantage of the SVM model is its effectiveness in classifying dust storm pixels over land and oceans and even in identifying dust density from the mass of the storm (Lu et al., 2006; Shahrisvand and Akhoondzadeh, 2013; Wang et al., 2022).

Distinguishing clouds, smoke, and land from dust aerosol pixels in satellite images can be effectively addressed by applying maximum likelihood and probabilistic neural network (PNN) models, where the PNN model shows higher performance than maximum likelihood. In this regard, a PNN approach is also considered more suitable for cases where near-real-time processing is required, such as in an emergency alarm system (Rivas-perea et al., 2010).
Although the decision tree method has been used to predict dust storms effectively by training a machine learning model on meteorological data (Ali et al., 2019), it can also detect the main areas of dust storms from satellite images. However, it cannot distinguish lower-density dust aerosol pixels from the central mass as effectively as SVM and MLP (Shahrisvand and Akhoondzadeh, 2013). In other studies, the random forest classifier was applied to extract cloudy pixels to detect even thin dust aerosols (Berndt et al., 2021; Souri and Vajedian, 2015). The random forest also shows high performance in predicting fine and coarse dust from meteorological data compared to MLR, SVM, BRNN, and cubist (Aryal, 2022a).
Predicting dust storms using historical cases with the CBR model is effective when considering future similar events; however, to address the uncertainty of ecological systems, this model was integrated into Bayesian networks in Murayziq et al. (2017). The naive Bayes approach provides high accuracy and speed when applied to large meteorological databases (Shaiba et al., 2018; Ali et al., 2019). Also, the support vector regression (SVR) approach yields good results in large-scale satellite image classification problems (Rivas-perea et al., 2015). An adaptive neuro-fuzzy inference system (ANFIS) integrates information from several sources, handles large amounts of noisy data, and deals with non-linear relationships between the model inputs and outputs. However, it could generalize better if it were applied with large datasets to enhance the validation (Aryal, 2022b; Ebrahimi-khusfi et al., 2021b). ANN performs better at classifying dust pixels than the threshold-based method over bright surfaces, as it can distinguish between a dust storm and sandy land (El-ossta et al., 2013). Also, applying the ANN technique is advantageous when handling internal relations to identify the effects of different meteorological variables without requiring feature selection (Kh Zamim et al., 2019; Lee et al., 2021). However, it leads to uncertainty and ambiguity due to differences in the hyperparameters, such as learning rate, the number of hidden layers, and the number of neurons (Chacon-murguía et al., 2011; Lu et al., 2006).
The convolutional neural network (CNN) is used to detect and extract dust storm intensity from satellite images, improving the detection over various surfaces (Jiang et al., 2022; Tiancheng et al., 2019). Improving CNN performance requires increasing the window size of pixels (Lee et al., 2021). Hybrid techniques based on the SMOTE algorithm are used to predict dust storms from actual weather data, in which dust storm events are infrequent and therefore form a minority class (Xie et al., 2015; Zhang et al., 2015).
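The core step of SMOTE, generating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors, can be sketched as follows. This is a simplified illustration under stated assumptions, not the hybrid pipelines of the cited studies: the function name `smote`, its parameters, and the two-feature example data are hypothetical, and the subsequent classifier (for example AdaBoost or random forest) is left out.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority samples with two made-up features each.
dust_days = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0)]
new_points = smote(dust_days, n_new=5)
print(len(new_points))
```

Because every synthetic point is an interpolation between two real minority samples, the oversampled class stays within the region already occupied by observed dust storm cases.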
Finally, transfer learning utilizes the previous domain knowledge to solve a related problem, such as using samples of dust storm emissions from dust sources as a source domain. The target domain is the samples of places dust storms crossed after a few days to detect dust particles based on their physical properties. Transfer learning enhances training speed and generalization ability (Qing-dao-er-ji et al., 2020; Ma et al., 2015).

We also observed that all papers trained and tested their models on actual dust storm events data, not simulated data. Finally, as discussed in Section 3.2, studies used different datasets for training their models, which reduces the confidence in the overall findings, although there have been indications that the machine learning models seem to generalize well, with only a slight reduction in performance.

CONCLUSION AND FUTURE DIRECTIONS
In this paper, we have summarized through a meta-analysis the research efforts on the application of machine learning to dust storms. We have identified 31 relevant studies, examining the area they focus on, machine learning techniques, sources of data used for the model input, the output classes, and overall performance metrics used by each paper. We then compared the most used machine learning models in terms of their performance. Our findings indicate that machine learning offers accurate results, whether in classifying dust aerosols from satellite images or in making predictions from meteorological data.
This review hopes to motivate researchers to experiment with machine learning to solve related dust storm challenges on classification and predictions, related to satellite images or meteorological data analysis.The overall advantages of machine learning are encouraging for its further use towards monitoring natural hazards, early warning, and sustainable land management.
From the reviewed studies, which list various existing applications of machine learning in the prediction of dust storms, we can see that only daily prediction of dust storm occurrence and real-time detection have been conducted using machine learning. We expect further research to provide predictions of dust storms for more than three days. Another opportunity for future directions is to apply machine learning to measure dust storm characteristics, for example, forecasting the duration and intensity, which are essential information for warning systems.

The data's spatial or temporal coverage limitations could be solved in future work by integrating various data resources. Meteorological data are available on the regional scale and provide measurements of the atmosphere near the surface, while remote sensing images cover a wide area. Integrating these data sources can also enhance distinguishing dust, clouds, and ice by combining object detection with observed meteorological data, such as wind speed, direction, and air quality measurements.
More approaches to adapting deep learning models are expected in the future because deep learning is very good at detecting complex patterns of phenomena. Also, exploiting recurrent neural network (RNN) models in processing the time series would be expected to perform well in prediction. Future work requires exploring dust storm emission, transport, and deposition and researching the potential advantages of leveraging novel deep learning models.

This research followed a meta-analysis study design. Meta-analysis was used to address dust storm challenges with machine learning algorithms. The research samples are journal and conference papers, book chapters, and review articles published between January 2000 and August 2022. The search covered the online scientific databases SpringerLink, ScienceDirect, IEEE Xplore, and ACM digital library, as well as the web-based scientific index Google Scholar, which contains articles from other online sources, such as academic publishers and online repositories. As a search query, we used ("dust storm" OR "sand storm") AND "machine learning" within the full text. The initial search returned 349 papers, which were screened first by title, then by abstract, and finally by full text. Studies were excluded if (a) the subject was irrelevant, (b) they did not use a machine learning technique, (c) the information in the results was insufficient for analysis, or (d) they were duplicates.
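The query and part of the exclusion step can be expressed as a short filter. The sketch below is only an illustration of the logic, assuming each record is a dictionary with hypothetical `title` and `text` fields; it covers the query match and duplicate removal, while relevance and sufficiency of results (criteria a and c) remain manual judgments.

```python
def matches_query(full_text):
    """Apply ("dust storm" OR "sand storm") AND "machine learning" to a text."""
    text = full_text.lower()
    has_storm_term = "dust storm" in text or "sand storm" in text
    return has_storm_term and "machine learning" in text


def screen(records):
    """Keep records that match the query, dropping duplicate titles."""
    seen_titles = set()
    kept = []
    for record in records:
        # Normalise case and whitespace so trivially different titles collide.
        key = " ".join(record["title"].lower().split())
        if key in seen_titles:
            continue  # duplicate study
        seen_titles.add(key)
        if matches_query(record["text"]):
            kept.append(record)
    return kept


# Invented records, for illustration only.
records = [
    {"title": "Paper A", "text": "Machine learning for dust storm detection."},
    {"title": "paper  a", "text": "Machine learning for dust storm detection."},
    {"title": "Paper B", "text": "A numerical model of dust storm transport."},
    {"title": "Paper C", "text": "Sand storm forecasting with machine learning."},
]
kept = screen(records)
print([record["title"] for record in kept])
```

Note that a literal phrase match of this kind does not catch the one-word spelling "sandstorm", so results depend on how each database tokenizes the query.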
learning methods applied to dust storms have been published since 2006, and two principal categories have been identified. The first area of the study concerns dust storm detection using satellite data. The second area relates to dust storm forecasting and prediction using meteorological data. The distribution by year shown in Fig. 1 and Fig. 2 captures a high number of dust storm detection studies.

Fig. 1. The number of publications by year
Fig. 2. The distribution of publications into two study areas
) added cloud information to the classifier and the output classes are {dust with no cloud, dust, dust with cloud, others}. Another study added cloud classifications {thick cloud, cloud-cirrostratus, thin cloud} and dust classifications {water dust, thick dust, thin dust},

Fig. 4. The total number of machine learning methods by sub-categories