Benchmarking microbarom radiation and propagation model against infrasound recordings: a vespagram-based approach

This study investigates the use of a vespagram-based approach as a tool for multi-directional comparison between simulated microbarom soundscapes and infrasound data recorded at ground-based array stations. Data recorded at the IS37 station in northern Norway during 2014− 2019 have been processed to generate vespagrams (velocity spectral analysis) for five frequency bands between 0.1 and 0.6 Hz. The back-azimuth resolution between vespagrams and the microbarom model is harmonized by smoothing the modelled soundscapes along the back-azimuth axis with a kernel corresponding to the frequency5 dependent array resolution. An estimate of similarity between the output of the microbarom radiation and propagation model and infrasound observations is then generated based on the image processing approach of the mean-square difference. The analysis reveals that vespagrams can monitor seasonal variations in the microbarom azimuthal distribution, amplitude, and frequency, as well as changes during sudden stratospheric warming events. The vespagram-based approach is computationally inexpensive, can uncover microbarom source variability, and has potential for near-real-time stratospheric diagnostics and atmospheric model 10 assessment.


Introduction
Microbaroms are infrasound waves with frequencies typically between 0.1 and 0.6 Hz generated by nonlinear interaction between counter-propagating ocean waves. Once gen-erated, microbaroms penetrate the atmosphere where vertical wind and temperature gradients are responsible for the presence of waveguides or sound channels (Brekhovskikh, 1960;Diamond, 1963). Waveguides duct the infrasound between the ground and different atmospheric layers and are usually classified into tropospheric, stratospheric, and thermospheric (Hedlin et al., 2012;de Groot-Hedlin et al., 2010). Seasonal variations in the zonal stratospheric wind (eastward -westward) are crucial for the stratospheric waveguide, which is of particular interest in this study. These variations, combined with an increase in temperature in the stratosphere, make the effective sound speed higher than at the surface. This causes the refraction of the infrasound waves back to the ground. The low-frequency microbaroms can be ducted over long distances due to the weak attenuation, which is proportional to the frequency squared. Hence, there is a potential to exploit microbaroms to probe the dynamics of the stratosphere, where the representation of the atmospheric dynamics in model products is often poorly constrained (Polavarapu et al., 2005;Rienecker et al., 2011;Smith, 2012;Amezcua et al., 2020).
The term "microbarom" was established by Benioff and Gutenberg (1939), who described quasi-continuous pressure fluctuations with periods of 0.5-5 s recorded by two electromagnetic barographs installed by the Seismological Laboratory, California Institute of Technology, Pasadena, USA. Following Benioff and Gutenberg (1939), several microbarom studies were performed by scientists around the globe. Joint observation of microbaroms and microseisms (quasi-continuous fluctuations of the ground displacement generated by ocean waves) in California, USA (Gutenberg and Benioff, 1941), Christchurch, New Zealand (Baird and Banwell, 1940), Fribourg, Switzerland (Saxer, 1945(Saxer, , 1954Dessauer et al., 1951), and New York, USA (Donn and Posmentier, 1967), demonstrated that the microbarom signals originate from the ocean.
Thereafter, efforts were made to develop theories to explain the physical mechanisms of microbarom generation (Brekhovskikh et al., 1973;Waxler et al., 2007). A recent model proposed by De Carlo et al. (2020) unifies aforementioned theories of microbarom generation, taking into consideration both the finite ocean depth and the source radiation dependence on elevation and azimuth angles. This model can predict the location and intensity of the source when coupled with an ocean wave spectrum model. However, for comparison with infrasonic observations at distant groundbased stations, it is necessary to consider the influence of the atmospheric structure on the microbarom propagation and ducting. This can, for example, be estimated using a semiempirical, range-dependent attenuation model in a horizontally homogeneous atmosphere (Le Pichon et al., 2012) or a wave propagation simulation using 3-D ray tracing (Smets and Evers, 2014). Details on our suggested vespagram-based comparison approach to microbaroms modeled by a state-ofthe-art microbarom radiation theory (De Carlo et al., 2020) are presented in Sect. 2.2.
In array signal processing, velocity spectral analysis (vespa) is an approach which analyzes recorded signals in terms of signal power as a function of time (Davies et al., 1971;Rost and Thomas, 2002;Schweitzer et al., 2012). The power is evaluated either at a fixed slowness, i.e., a constant apparent velocity with varying back azimuth -corresponding to a circle in the slowness space -or at a fixed back azimuth with varying apparent velocity -corresponding to a line in slowness space. The vespa power estimate can, therefore, be visualized as an image, called a vespagram, with time on one axis and either back azimuth (for a fixed apparent velocity) or apparent velocity (for a fixed back azimuth) as the other axis. Lonzaga (2015) used a phase diagram approach to demonstrate that infrasound arrivals from stratospheric ducts typically have apparent velocities between 340 and 380 m s −1 . In the current work, the main focus is on microbaroms. These low-frequency waves have an apparent velocity of around 350 m s −1 , as found by Rind et al. (1973). As such, timeback azimuth vespagrams estimated from infrasound array data at an apparent velocity of 350 m s −1 are used in the current study. Histograms of the apparent velocity statistics for the current data set are provided in Appendix A, which also support the use of this apparent velocity when generating the vespagrams. For a given frequency band, such vespagrams can be compared in a straightforward manner to microbarom soundscapes modeled for a station location, after applying a smoothing kernel which harmonizes the resolu-tion given by the array response function main lobe and the microbarom model output. Both the vespagram and the microbarom model provide power estimates as a function of time and back azimuth. These can be displayed as an image, and we utilize an image comparison approach, based on the mean square error, to benchmark the model against vespagrams. In this study, 6 consecutive years of infrasound observations between 2014 and 2019 at a ground-based infrasound array located in Bardufoss,Norway (69.07 • N,18.61 • E), denoted as IS37 or I37NO (Fyen et al., 2014), are considered. An overview of the station configuration, data, and analysis methods is provided in Sect. 2.1.
The proposed vespagram-based approach is computationally low cost and can monitor microbarom source variability over a year (Sect. 3.1), as well as detect changes during extreme atmospheric events such as sudden stratospheric warmings (Sect. 3.2). It might be further refined for applications such as near-real-time diagnostics of ocean wave and atmospheric models and for a long-term assessment of model product uncertainties, particularly when applied to data from a global network of infrasound stations. A key aspect of this approach is that benchmarking between model and infrasound vespagrams considers all back azimuth directions rather than just the direction of the dominant microbarom source, as done in several previous studies (Garcés et al., 2004;Hupe et al., 2019;De Carlo et al., 2019, 2021Smirnov et al., 2021). The microbarom soundscape at a station is typically a sum of components stemming from a wide spatial distribution of ocean regions, and recently, den Ouden et al. (2020) demonstrated that an iterative decomposition of the array spatial covariance matrix, using the CLEAN algorithm (Högbom, 1974), can be exploited to resolve the back azimuth and trace velocity of the most coherent wavefront arrivals.
A long-term ambition is to exploit microbarom infrasound data sets to enhance the representation of stratospheric dynamics in atmospheric model products and hence increase the accuracy of both medium-range weather forecasting and sub-seasonal climate modeling (Büeler et al., 2020;Dorrington et al., 2020;Domeisen et al., 2020a, b). In addition to prospective numerical weather prediction improvements, the suggested vespagram-based approach may be applied in multi-technology studies of atmospheric dynamics, for example initiatives building on the Atmospheric dynamics Research InfraStructure in Europe (ARISE) projects . These aim at harvesting from synergies between ground-based infrasound observations, radar, and lidar systems, as well as airglow and satellite observations to monitor the middle atmosphere (Chunchuzov et al., 2015;Le Pichon et al., 2015;Hupe et al., 2019;Smets et al., 2019;Hibbins et al., 2019;Assink et al., 2019;Le Pichon et al., 2019).
The study is organized as follows. The data and methods are described in Sect. 2. The main results are presented in Sect. 3, followed by the discussion in Sect. 4. The infrasound array, denoted as IS37 or I37NO, is located in Bardufoss, Norway (69.07 • N, 18.61 • E), and is equipped with 10 MB3-type (MB2005 prior to 2016) microbarometers over an aperture of 2 km ( Fig. 1a; Fyen et al., 2014). This station is part of the International Monitoring System (IMS) which verifies compliance with the Comprehensive Nuclear-Test-Ban Treaty (CTBT) (Dahlman et al., 2009;Marty, 2019). The station was certified by the CTBT Or-ganization on 19 December 2013 and is operated by NOR-SAR, Kjeller, Norway (Schweitzer et al., 2021). Besides being included in the IMS, IS37 is also part of a regional network of European infrasound stations (Gibbons et al., 2007(Gibbons et al., , 2015(Gibbons et al., , 2019) that resolves significantly smaller events than the global IMS network (Le Pichon et al., 2008). In the framework of the regional network, data from IS37 have been used for multi-station studies characterizing European infrasound sources (e.g., . The IS37 station routinely detects microbaroms within 0.1-0.6 Hz originating from the North Atlantic, the Barents 518 E. Vorobeva et al.: Benchmarking microbarom radiation and propagation model Sea, and beyond. An analytical expression for a plane wavefront incident on the IS37 array was used to characterize the array's integrated, frequency-dependent response in 0.1 Hz wide frequency bands from 0.1 to 0.6 Hz. The wavefront was representative of a microbarom signal from the Atlantic Ocean, with a back azimuth of 225 • and an apparent velocity of 350 m s −1 , typical of the stratospheric regime (Garcés et al., 1998;Whitaker and Mutschlecner, 2008;Nippress et al., 2014;Lonzaga, 2015). The base resolution of the array was taken to be the 1σ beamwidth of the Gaussian fitted to the array response at a constant velocity of 350 m s −1 (dashed line in Fig. 1b) for each frequency band. The resulting resolution was found to be 35, 23, 16, 13, and 10 • for 0.1-0.2, 0.2-0.3, 0.3-0.4, 0.4-0.5, and 0.5-0.6 Hz band, respectively. It should be noted that this estimate is based on the homogeneous medium plane wave time delays between the array elements only and does not take into account meteorological conditions at the station, noise, or other coherence loss mechanisms that may result in a wider beamwidth.
In array signal processing, the separation of coherent from incoherent parts of the recorded signal and the separation between different simultaneous arrivals are important concepts. When analyzing the wave field in terms of a given horizontal slowness vector (e.g., described in terms of apparent velocity and back azimuth), delay-and-sum beamforming (Ingate et al., 1985) is usually applied in combination with the underlying plane wave model assumption. This method applies time delays to the array sensor traces to focus on wavefronts arriving with a specific horizontal apparent velocity and a specific back azimuth direction, hence amplifying wave field components with the horizontal slowness of interest, while suppressing other components. However, the slowness vector models are not always accurate (Gibbons et al., 2020). In particular, the actual shape of the wavefront arriving at infrasound arrays may differ from a theoretical plane wave due to meteorological conditions and turbulence at the station, which make the underlying assumption of a locally homogeneous effective sound speed invalid. In this case, the beamforming is less efficient and the reduced array gain results in lower stack amplitude and signal distortion (Rost and Thomas, 2002).
To determine an unknown slowness vector component and to study the spatial structure of the wave field over time, one can use vespa (velocity spectral analysis) processing. This not only enhances the signal as the beamforming does but also allows one to determine either the direction or apparent velocity of the incoming signal. The vespa method estimates the power of the signal either for a fixed apparent velocity with varying back azimuth or for a fixed back azimuth with varying apparent velocity. The result of the vespa processing is usually presented as an image displaying the power of incoming signal as a function of time and back azimuth (or apparent velocity) called a vespagram. Although the vespa is widely applied in seismological array data studies (e.g., Davies et al., 1971;Kanasewich et al., 1973;Muir-head and Datt, 1976;McFadden et al., 1986), it has not previously been exploited in peer-reviewed microbarom infrasound studies.
The vespa-processing procedure described below is applied to each analyzed time window and frequency band as follows: 1. We extract, for each sensor n of an array, the signal trace x n (t) for the time window of interest. The analysis is done for a 1 h moving time window evaluated every 30 min. In general, the time series recorded at sensor n at the location r n can be written as follows: where y(t) represents a plane wavefront, and s hor is the horizontal component of the slowness vector 2. We remove the mean.
3. We apply a Butterworth bandpass filter to recordings. Calculations are performed for five equally spaced frequency bands that are within the microbarom frequency range (see Fig. 1b).
4. We compute beam traces or delay-and-sum traces of an array with N sensors as follows: In this study, classical linear vespa processing (Davies et al., 1971) is applied, where the noise suppression is proportional to the square root of N (Rost and Thomas, 2002). A beam is generated at each 1 • in back azimuth for the fixed apparent velocity of 350 m s −1 . That allows us to estimate signals coming from all directions and from approximately the same height corresponding to stratospheric altitudes.
5. We calculate the mean squared pressure (power) of each beam to obtain an estimate of the incoming signal strength as a function of back azimuth and time.
Steps (1)-(5) are applied to all years of data analyzed. After the vespa processing, we apply a quality check based on the vespagram spectrum properties to exclude noisy data. At time windows when the vespa processing yields a directional spectrum with the power almost equal in all directions (the minimum exceeds 70 % of the maximum), data are ignored in our further analysis.

Microbarom source and propagation modeling
In this section, we summarize the approach applied to obtain the directional spectra of microbarom soundscapes as a function of time.
Ocean wave model. Hasselmann (1963) demonstrated that the nonlinear interactions between counter-propagating ocean waves can be presented as follows: where H (f ) is the interaction term spectrum known as the Hasselmann integral, E(f w , θ ) is the power spectral density of the surface elevation, θ is the direction of the ocean wave propagation, f w is the wave frequency, and f is the acoustic frequency (f w = f/2). Ocean wave models estimate E(f w , θ ) and its variations in space and time based on the surface wind fields. In this study, the WAVEWATCH III ® (The WAVEWATCH III ® Development Group, 2016) ocean wave model is considered (hereafter WWIII). The model grid resolution is 0.5 • in latitude and longitude and 3 h in time.
Among the WWIII output parameters is the p2l(f w ) that represents the spectral density of the equivalent surface pressure that forces microbaroms as follows: where ρ is the water density, and g is the gravity acceleration. Hence, the Hasselmann integral needed for further calculations can be obtained from the WWIII model using Eq. (4). Studies on microseisms (e.g., Hillers et al., 2012) have demonstrated the limitations of a model that does not account for coastal reflection. These limitations have been accordingly raised in the context of microbaroms (Landès et al., 2014). Therefore, in this study, the parametrization used to run the WWIII model accounts for fixed reflection coefficients of 10 % for the continents, 20 % for the islands, and 40 % for ice sheets (Ardhuin et al., 2011). Microbarom source model. A microbarom source model is basically a model transforming an ocean wave model output into acoustic radiation spectrum in the atmosphere. Here, calculations are based on the model by De Carlo et al. (2020), taking into consideration both the finite ocean depth and the source radiation, depending on elevation and azimuth angles. This microbarom model allows the prediction of the location and intensity of the microbarom sources when applied to the Hasselmann integral. The output of this step is an acoustic spectrum for each cell of the wave model.
Microbarom propagation in the atmosphere. The next step is to account for the atmospheric influence on the microbarom propagation and ducting. For example, 3-D ray tracing or full-waveform approaches would provide an accurate simulation of the infrasound propagation; however, these methods involve a large computational burden (De Carlo et al., 2021). Instead, the semi-empirical attenuation law by Le Pichon et al. (2012) is used in the current study. The law is applied to the microbarom source spectra obtained in the previous step. This law accounts for the distance between the source and the station, as well as for the frequency, but assumes a horizontally homogeneous atmosphere. The atmospheric conditions at the station location are taken into account via V eff-ratio , which is the ratio of the effective sound speed in the propagation direction at 50 km altitude and the effective sound speed in the same direction on the ground. The atmospheric wind and temperature needed to determine the V eff-ratio are derived from the European Center for Medium-range Weather Forecasts (ECMWF) High-Resolution (HRES) forecast model (http://www.ecmwf.int, last access: 7 June 2021, Integrated Forecasting System -IFS; cycle 38r2). The atmospheric wind and temperature at the station location are extracted from a 0.5 • × 0.5 • model grid using bilinear interpolation. The temporal resolution of the ECMWF HRES forecast model is 6 h, and the assumption of the constant atmospheric wind and temperature over this time period is made to avoid a possible discrepancy caused by interpolation in time. The output of this step is an acoustic spectra attenuated to reflect what would be seen by the station.
Summation of sources. To obtain the directional spectrum at the station, all attenuated spectra from model cells within 1 • azimuth band and less than 5000 km away from the station are summed. The distance limitation comes from the attenuation law definition. Although this attenuation law is widely used for propagation over very long distances (Smirnov et al., 2021;Pilger et al., 2019;Hupe et al., 2019;De Carlo et al., 2019), it was designed for distances up to 3000 km only. The choice of the maximum distance depends on the location of the station and the main sources, as well as on how realistic spectrum is needed for a specific task. Recently, De Carlo et al. (2021) provided a multi-station comparison between progressive multi-channel correlation (PMCC)processed microbarom data and the microbarom model by De Carlo et al. (2020), which is also used in the current study. Results for 45 IMS stations in (De Carlo et al., 2021) demonstrate that integrating microbarom sources at distances up to 5000 km provides realistic spectra. Thus, all sources that are more than 5000 km away from the IS37 station are excluded from this study.
After applying these steps, and integrating over the frequency bands, we obtain an estimate of the microbarom power spectral density as a function of time and back azimuth, just as vespagrams. However, vespagrams cannot be directly compared to the modeled microbarom soundscapes since the latter do not take into account the frequencydependent array resolution. Therefore, we smooth the modeled microbarom soundscapes by convolving with a Gaussian kernel at each time step, taking into account cyclical nature of back azimuth when smoothing near 360 • /0 • . Kernels are normalized to the unit area, and their standard deviations (width) decrease with frequency (see Sect. 2.1). 520 E. Vorobeva et al.: Benchmarking microbarom radiation and propagation model

Similarity index
This section introduces an approach to benchmark the microbarom model against the infrasound data. Both data sets need to be of the same temporal resolution to assess a similarity at each time step. In the current study, in order to avoid interpolating the model output in time, the vespa processing output is sub-sampled to match the 3 h microbarom model grid. Hereinafter, all results are presented with a temporal resolution of 3 h. Figure 2 presents differences in the direction of the maximum power between the vespagram and either the model or the smoothed model. Both medians and uncertainty ranges are estimated based on the back azimuth difference at the maximum power only. Uncertainty values falling into the 25 to 75 percentile range are an objective assessment of the discrepancy between the model and vespagrams. These values originate from the wintertime when atmospheric conditions are favorable for the eastward ducting (Sect. 3.1). In summer, atmospheric conditions are not so stable (due to, e.g., increased cyclonic activity), and there are several factors that can cause model-vespagram discrepancies (Sect. 3.1) accompanied by an increase in the uncertainty range. Note that, after the smoothing (Sect. 2.2), there is a better agreement between the model and the vespagram, leading to a decrease in the median and the uncertainty ranges (Fig. 2).
A similarity index (SI), inspired by comparison approaches in image processing, is introduced as follows: where MSE is the mean squared error (or difference) between the normalized smoothed model output, P model (t, θ ), and the normalized vespagram, P vespa (t, θ ), calculated at each time step, where θ is back azimuth, and t is time. This SI metric is, hence, insensitive to the total microbarom power but instead provides information on how accurate the model reproduces the directional pressure spectrum in the recorded data.
A SI equal to one indicates a full match between model and infrasound vespagram in terms of the back azimuthal power distribution.

Comparison for full seasons
This section presents a multi-year model-vespagram comparison, focusing on microbarom characteristics over different seasons. We start with a detailed look at 2016 results, followed by an analysis of all 6 years. Figures 3 and 4 present benchmarking microbarom model and vespa-processing images (vespagrams) for two frequency bands, namely 0.1-0.2 and 0.5-0.6 Hz, for 2016. These figures contain eight panels each, which are discussed in more detail below. Figures 3a and 4a show the maximum amplitude per time step over 1 year, i.e., the dominant signals in the azimuthal spectra. Enhanced ocean source activity during winter is accompanied by the eastward stratospheric wind favorable for ducting infrasound over long distances (Le Pichon et al., 2006). This results in a maximum of microbarom pressure amplitude both in model and vespagrams, regardless of the frequency band. The microbarom radiation model by De Carlo et al. (2020), combined with the semi-empirical wave attenuation law (Le Pichon et al., 2012), generally reproduces infrasound amplitudes accurately. An exception is between days 200 and 210 in 2016 (Fig. 3a), when the modeled amplitude is much lower than the amplitude obtained via the vespa processing. This discrepancy may indicate an overestimation of the attenuation caused by errors in the atmospheric model wind or due to the range-independent simplification.
Figures 3b-d and 4b-d display microbarom soundscapes as predicted by the model, smoothed model and vespa processed recordings, respectively. Here the soundscapes are plotted in terms of the base 10 logarithm of the amplitude, in order to allow the showing of the weaker summertime microbarom amplitudes in the same display as the stronger wintertime signals. Note that the vespagrams are noisier during summer (gray fields in Figs. 3g and 4g), especially for the 0.1-0.2 Hz band.
Due to the strong seasonal variability in the microbarom amplitude, it is difficult to compare the direction of winter to summer detections on an absolute amplitude scale. Thus, we normalize Figs. 3b-d and 4b-d by the maximum amplitude at each time step (see Figs. 3e-g and 4e-g; right) and estimate the directional distribution of the dominant signal in 10 • bins (see Figs. 3e-g and 4e-g; left). For the frequency band of 0.1-0.2 Hz, the North Atlantic is the dominant source direction throughout the year (the main peak in Fig. 3e-g; left). However, the maximum power is sometimes also observed from northeasterly and southeasterly directions in summer. To generate infrasound at such low frequencies, the source needs to be of a substantial spatial extent, and therefore, there is a limited number of possible oceanic sources. After comparison with the model maps, we interpret these arrivals as microbaroms generated in the Pacific and Indian oceans, respectively. The stratospheric summertime westward wind could guide the infrasound waves towards the IS37 station. Unlike what is seen in the vespa-processed recordings, the model does not predict microbaroms originating from the Indian Ocean direction. A plausible reason for this is that the distance between the station and the Indian Ocean source region is ∼ 7000-8000 km, which is much greater than the maximum distance of 5000 km included in the modeling (see Sect. 2.2). Looking at higher frequencies, there is a pronounced change in the dominant direction of the source from the Atlantic in winter to the Barents Sea and the Greenland Sea in summer (peaks in Fig. 4e-g; left). This is associated with the change in wind direction in the stratosphere from eastward to westward. An analysis of a 6-year data set in terms of the dominant source direction indicates three prevailing microbarom source regions associated with the North Atlantic, the Greenland Sea, and the Barents Sea. These appear at the vespagram (model) back azimuths of 266 ± 14 • (265 ± 16 • ), 339 ± 14 • (345 ± 8 • ) and 26 ± 6 • (34 ± 7 • ).
Figures 3h and 4h present values of SI obtained over a year. In winter, SI for lower frequencies is stable and has values close to one, with exceptions corresponding to increased noise level in vespagrams or to sudden stratospheric warming (SSW) events that are discussed in Sect. 3.2. Relatively low SI for higher frequencies can be explained either by spurious apparent sources corresponding to array response function side lobes (Fig. 1b) or by the presence of local sources in the vespagram that are missed or not well reproduced in the model. In summer, SI values are quite variable and unstable but never fall below 0.5. Such behavior is typical regardless of the year or frequency band (see Fig. 5 for a multiyear comparison). One possible explanation is the changing weather conditions present at the station throughout the year. For example, Orsolini and Sorteberg (2009) have shown an enhancement in the number and intensity of summer cyclones in the Arctic and northern Eurasia. This would result in additional wind and rain noise in the infrasound recordings that would especially be enhanced at the lower frequencies.
Another possible contribution would be the poor resolution of the array at low frequencies that can mix stratospheric sig-nals with those from higher altitudes. These sometimes dominate at IS37 in summer  but are not included in the model. The relative stability of the model's results in Fig. 4e and f relative to the vespagram would indicate that there are additional sources of variability, either atmospheric, source region, or propagation path, that are not well characterized in the model.
As indicated by high SI values, especially in winter, the infrasound data processed in the framework of the vespa approach are in a good agreement with modeled microbarom soundscapes in both time (seasonal variations) and space (directional distribution). The similarity estimation proposed allows the detection of inconsistencies between the microbarom model and the vespa processing, which might be used for identifying biases in atmospheric models. This is especially promising for low frequencies where side lobes of the array response do not appreciably affect the analysis.

Examination of major sudden stratospheric warmings
Although this is not the main objective of the current study, in this section we examine the ability of the vespagrams to detect extreme atmospheric events, such as SSWs, and compare the model and the vespa processing for six selected events. SSWs usually occur in wintertime and are, in general, associated with a sudden and short increase in stratospheric temperature and mesospheric cooling at high and middle latitudes (Shepherd et al., 2014;Butler et al., 2015;Limpasuvan  The right-hand side panels in (e-g) are the same as panels (b-d) but after normalization by the maximum amplitude at each time step. The left-hand side panels in (e-g) show the normalized directional distribution of the dominant signal (10 • bins). Panel (h) shows the similarity index between panels (f) and (g) (right) and the normalized distribution (left). Panels (b-d) have a common color bar, as do panels (e-g). Gray fields indicate periods where infrasound data are disregarded due to noise and an indistinct directional spectrum. Panels (b-g) are visualized using the Turbo color map (Mikhailov, 2019(Mikhailov, ). et al., 2016Zülicke et al., 2018). SSWs are often classified into minor and major warmings, depending on whether there was a weakening or reversal of the zonal wind (Butler et al., 2015). During the period of our consideration, three major and three minor SSWs occurred. Major SSWs took place with onsets on 5-6 March 2016 (Manney and Lawrence, 2016), 11 February 2018 Lü et al., 2020), and 1 January 2019 (Rao et al., , 2020, while the minor events occurred with onsets on 4 January 2015 (Manney et al., 2015;Mitnik et al., 2018) and 1 and 26 February 2017 (Eswaraiah et al., 2020). Note that there can be an error of up to several days in determining the SSW onset day, since there is no single way to define the onset, and different authors use different definitions. A prime example is the first SSW in 2017. According to the definition of the World Meteorological Organization, this event is classified as minor, but in a number of studies, it is referred to as major (Xiong et al., 2018;Conte et al., 2019). Vertical dashed lines in Figs. 5-6 correspond to the onset days listed above when SSW criteria were met.
The infrasound signature reported by Donn and Rind (1971) and Evers and Siegmund (2009), which showed a sig- nificant change in direction of the infrasound arrival due to a change in favorable stratospheric waveguide, can be seen in Fig. 6 for all SSWs under consideration and in Figs. 3e-g and 4e-g for the 2016 SSW. The change in direction from the North Atlantic to the Barents Sea is clearly pronounced in both model and vespagrams around SSWs onset days. Figure 3f and g demonstrate that the signature appears late in the model data, and its duration is much shorter than in the vespagram, analogous to the study by Smets et al. (2016). For higher frequencies ( Fig. 4f and g), the duration of a change from an eastward to a westward pattern is longer and continues until late March or early April, which corresponds to reanalysis data (Manney and Lawrence, 2016).
Another feature revealed is a significant decrease in similarity index between the model and vespagrams during SSWs (Fig. 5), which is characteristic for all events under consideration. The smallest discrepancies in the direction of the dominant wavefront between the model and infrasound data dur-ing SSWs reach about 5-7 • , but the largest reach as much as 90-100 • (Fig. 6). This may be caused by the following factors. In most cases, the back azimuth change around SSWs onsets appears earlier in the vespagrams than in the model, with the difference of 3 to 24 h. Note that this also depends on the frequency band. Similar results were previously obtained by Smets and Evers (2014) and can be explained by the presence of an error in determining a SSW onset day from (re-)analysis data because of a scarcity of observations at stratospheric altitudes (Charlton-Perez et al., 2013) or by inadequate stratospheric analysis and forecast during SSWs, as addressed by Diamantakis (2014) and Smets et al. (2016). Sometimes the SSW signature does not appear in the vespagram while appearing in the model (see Fig. 6 around SSW 2018 onset day, for example). This can arise when employing a horizontally homogeneous atmosphere and overly constraining the model with the ECMWF wind and temperature at 50 km altitude. Such an approach does not allow a  red; 2015 -orange; 2016 -black; 2017 -green; 2018 -blue; 2019 -magenta. full, altitude-dependent description of infrasonic waves in the atmosphere and causes discrepancies between model and vespagrams. Considering long propagation path for microbaroms, net wind effect along the propagation path can be equal to zero in the vespagram, in contrast to the model, which estimates the probability of the signal arrival at the final point of the path. It has been demonstrated by Evers and Siegmund (2009) and Smets and Evers (2014) that the ECMWF wind direction does not always characterize the actual infrasound path, resulting in the abovementioned modelvespagram discrepancies.
Despite slight differences in the dominant direction of the wavefront arrival during SSW events, both model and vespagrams reproduce changes in the infrasound pattern correctly in time. Moreover, since vespagrams can detect changes in the stratospheric dynamics during extreme events, there is a potential for using it for near-real-time stratospheric diagnostics.

Discussion and conclusions
In this study, we compare observed and predicted microbaroms soundscapes using a vespagram-based approach. Analysis is performed based on the calculation of microbaroms power as a function of time and back azimuth at a constant apparent velocity of 350 m s −1 . Note, however, that the vespagram family of time-dependent microbarom data visualizations can also be constructed using other arrayprocessing techniques that estimate power as a function of the slowness of the wavefront, e.g., using robust estimators as explored by Bishop et al. (2020) or adaptive high-resolution approaches like Capon's method (Capon, 1969). An advantage of the vespagram approach is that microbarom radiation and propagation models can be benchmarked against recorded infrasound data for all directions simultaneously, as opposed to methods where only the back azimuth direction of maximum power is considered (e.g., Hupe et al., 2019;Figure 6. Comparison between microbarom azimuthal distributions at IS37 (vespagrams) normalized per time step and the back azimuths of the dominant signals as predicted by the model (red dots). The results are shown around SSWs 2015-2019 for the 0.3-0.4 Hz band. White vertical dashed lines indicate onset days when SSWs (minor or major) criteria were met. Gray fields indicate periods where infrasound data are disregarded due to noise and an indistinct directional spectrum. Smirnov et al., 2021). Since the vespa processing is computationally low cost and able to track variations in microbarom parameters over extended periods spanning 1 year or several years, it can be utilized for a near-real-time assessment of atmospheric model products and for developing infrasoundbased stratospheric diagnostics. It can also be used when assessing changes in infrasound signatures over shorter time windows, e.g., during extreme atmospheric events.
Limitations in this study are predominantly related to microbarom propagation modeling. In addition to the scarcity of observations at the stratospheric altitudes (Charlton-Perez et al., 2013), which affect the accuracy of the directional distribution of predicted microbarom soundscapes, the horizontally homogeneous atmospheric approximation used in the study creates substantial limitations. In fact, the atmosphere is not homogeneous. It has horizontal and vertical inhomo-geneities in the atmospheric wind and temperature fields caused, for example, by gravity waves, tides, and SSWs. These atmospheric disturbances significantly affect the longdistance infrasound propagation and, as a result, the directional spectra detected at the reception point. Moreover, the modeling would benefit from applying a full-waveform simulation code for the propagation of the radiated microbaroms to the station (e.g., Assink et al., 2014;Kim and Rodgers, 2017;Brissaud et al., 2017;Petersson and Sjögreen, 2018;Sabatini et al., 2019). This would provide a more refined modeling of the atmospheric ducting compared to the semiempirical approach (Le Pichon et al., 2012) applied in the current study. An alternative, which is less computationally expensive, is (3-D) ray tracing, which can account for both range-dependent atmospheric models and cross-wind effects (e.g., Smets and Evers, 2014;Smets et al., 2016). However, 526 E. Vorobeva et al.: Benchmarking microbarom radiation and propagation model the inherent high-frequency approximation of the ray theory can limit the modeling of diffraction and scattering effects (Chunchuzov et al., 2015) that can be important for the lowfrequency microbaroms. Also note that more advanced simulations of infrasound propagation would require wind and temperature profiles with a high vertical resolution (or an appropriate stochastic parametrization) to account for the effect of small-scale atmospheric disturbances on microbarom scattering. Resolving small-scale structures in atmospheric models, reanalysis, and forecasting systems remains a topic of active research. Several research efforts were made to develop methods exploiting infrasound observations to improve the representation of wind and temperature in atmospheric model products (e.g., Chunchuzov et al., 2015;Assink et al., 2019;Amezcua et al., 2020;Vera Rodriguez et al., 2020).
A more elaborate microbarom propagation model could also allow for an estimate of the full microbarom wave field impinging an infrasound station and, hence, provide an estimate of its power within the full horizontal slowness space of plane wavefront directions (or a selected relevant region). In this way, we could benchmark modeled and recorded microbarom fields at an infrasound array for each sliding time window in the full horizontal slowness domain, without restricting the analysis to the region around a fixed apparent velocity as carried out in the current study. Notably, such "slowness plots" of modeled and recorded microbarom power are also (time-varying) images, which can be assessed and compared using the versatile ecosystem of image-processing and image comparison algorithms.
Future developments can include a compilation of longterm, time-dependent statistics of the similarities between model and infrasound recordings for multiple stations on global and regional scales. This would allow the definition of anomaly flag criteria, which would indicate unexpected inconsistencies between model and observations due to, for example, biases in atmospheric model products. Moreover, we suggest applying the approach presented here to global assessment and comparisons of ocean wave action model products, as well as to validate and further refine microbarom radiation estimation algorithms.
E. Vorobeva et al.: Benchmarking microbarom radiation and propagation model 527 Appendix A: Apparent velocity statistics Figure A1 displays histograms of the apparent velocity detection statistics calculated using progressive multi-channel correlation (PMCC) processing (Cansi, 1995) for all frequency bands and years. These support the choice of 350 m s −1 as apparent velocity in the vespagram calculations for the analysis of microbaroms ducted through the stratosphere. Data availability. The IMS network's data can be accessed upon request through the virtual Data Exploitation Center (vDEC) via https://www.ctbto.org/specials/vdec (last access: 7 June 2021).
The high-resolution forecast model data, produced by the Integrated Forecast System of the ECMWF, can be accessed via https: //www.ecmwf.int/en/forecasts/datasets (last access: 7 June 2021).
Author contributions. EV and SPN developed the vespagram calculation code, performed the infrasound data processing, and made the model and data analysis. MDC and ALP developed the microbarom model code and performed the simulations. EV prepared the paper, with contributions from all coauthors. PJE supervised the work and reviewed the results. SPN and ALP initiated the study.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. This study was facilitated by previous research performed within the framework of the ARISE and ARISE2 projects , funded by the European Commission FP7 and Horizon 2020 programmes (grant nos. 284387 and 653980). The authors are thankful to Igor Chunchuzov and one anonymous reviewer, for valuable suggestions and constructive review of the paper. Ekaterina Vorobeva and Sven Peter Näsholm are also grateful to Yvan Joseph Orsolini, for the discussion and valuable comments.
Financial support. This research has been supported by the Research Council of Norway (Norges Forskningsråd) programme (grant no. 274377) under the project of Middle Atmosphere Dynamics: Exploiting Infrasound Using a Multidisciplinary Approach at High Latitudes (MADEIRA).
Review statement. This paper was edited by Gunter Stober and reviewed by Igor Chunchuzov and one anonymous referee.