Influence of turbidity and clouds on satellite total ozone data over Madrid (Spain)

This article compares total ozone column data from three satellite instruments, the Total Ozone Mapping Spectrometer (TOMS) on board Earth Probe (EP), the Ozone Monitoring Instrument (OMI) on board Aura and the Global Ozone Monitoring Experiment (GOME) on board ERS-2, with ground-based measurements recorded by a well-calibrated Brewer spectrophotometer located in Madrid during the period 1996–2008. A cluster classification based on solar radiation (global, direct and diffuse), cloudiness and aerosol index allows the selection of hazy, cloudy, very cloudy and clear days. The differences between Brewer and satellite total ozone data are then analyzed for each cluster. The accuracy of EP-TOMS total ozone data is affected by moderate cloudiness, showing a mean absolute bias error (MABE) of 2.0%. Turbidity also has a significant influence on EP-TOMS total ozone data, with a MABE of ∼1.6%. These values contrast with a MABE of ∼1.2% on clear days. The total ozone data derived from the OMI instrument show a clear bias on clear and hazy days, with small uncertainties (∼0.8%). Finally, the total ozone observations obtained with the GOME instrument show a very smooth dependence on clouds and turbidity, indicating a retrieval algorithm that is robust under these conditions.


Introduction
Satellites provide a global view of the Earth's atmospheric system over extended periods of time, with an appreciable spatial resolution that allows the systematic monitoring of the ozone layer. The instruments on board satellites need continuous validation against well-calibrated and well-maintained ground-based instruments in order to assess the quality and accuracy of satellite data and to clarify local to regional sources of uncertainty. In this sense, the Brewer and Dobson spectrophotometers are generally considered the standard reference for remote sensing of the vertically integrated ozone amount (the total ozone column, hereafter denoted as TOC) from the Earth's surface (WMO, 1996). In a similar way, the performance of the ground-based instruments is being assessed using satellite data (Fioletov et al., 2008).
The main objective of this paper is to compare the TOC data provided by three satellite instruments, the Total Ozone Mapping Spectrometer (TOMS), the Ozone Monitoring Instrument (OMI) and the Global Ozone Monitoring Experiment (GOME), with temporally collocated ground-based measurements from the Brewer spectrophotometer 070 located in Madrid. Thanks to its history of excellent maintenance, this Brewer instrument has excellent accuracy (Redondas et al., 2002, 2008). Although several validation exercises have been carried out in the Iberian Peninsula using Brewer instruments (e.g., Antón et al., 2008, 2009a, b), the present work adds a novel method for analyzing how the differences between the satellite and the ground-based data depend on turbidity and cloudiness. Clouds are known to be a major obstacle to the precise determination of ozone concentrations from satellite instruments (Koelemeijer and Stammes, 1999; Liu et al., 2004). In addition, it has been reported that tropospheric and stratospheric aerosols might degrade the accuracy of the ozone column derived from satellite instruments (Dave, 1978; Torres and Bhartia, 1999). Thus, this paper is expected to improve the knowledge of satellite TOC observations under these atmospheric conditions.
J. L. Camacho et al.: Influence of turbidity and clouds on satellite total ozone data
The instrumentation and the data used in this paper are described in Sect. 2. Section 3 describes the methodology followed in the analysis. Section 4 presents and discusses the results obtained in this work and, finally, Sect. 5 summarizes the main conclusions.

Ground-based measurements
The National Radiometric Centre of the Spanish Meteorological Agency (AEMET) is located in the northwest of the city of Madrid (Spain), at 40°27′ N, 3°43′ W. Global solar irradiance is recorded by a Kipp & Zonen CM21 pyranometer, and a solar tracker is used to provide diffuse solar radiation. In addition, direct solar irradiance is measured by a Kipp & Zonen CH-1 pyrheliometer. The daily integrated radiation data (global, direct and diffuse) form a continuous, high-quality record and are expressed in units of 10 kJ m⁻². Cloudiness data come from visual observations by human observers at the neighbouring Retiro Observatory, also in Madrid.
The TOC data have been measured at the National Radiometric Centre with the MKIV Brewer spectrophotometer no. 70 since 1993. This instrument is biannually calibrated at El Arenosillo station (Huelva, Spain) by comparison with the travelling reference instruments Brewer 017 from the International Ozone Services (IOS) and Brewer 185 from the Regional Brewer Calibration Centre Europe (RBCCE) (Redondas et al., 2002, 2008). Thus, the ozone calibration of Brewer instrument no. 70 is traceable to the triad of international reference Brewers maintained by the Meteorological Service of Canada (MSC) in Toronto. In this work, we use only the TOC records obtained from Direct Sunlight (DS) measurements in order to ensure high quality. Brewer DS measurements are known to have a relative accuracy of ±1% over extensive time series when the instruments are properly calibrated and regularly maintained (WMO, 1996). Brewer operation and maintenance procedures are completely standard, and further details can be found in Kerr et al. (1984). TOC data are regularly sent to the World Ozone and Ultraviolet Data Centre (WOUDC).

Satellite observations
The last NASA TOMS instrument was launched in July 1996 aboard the Earth Probe (EP) satellite. The EP-TOMS instrument measures solar irradiance and the radiance backscattered by the Earth's atmosphere in six selected wavelength bands in the ultraviolet spectral region (between 308 nm and 360 nm) (McPeters et al., 1998). The retrieval algorithm applied to EP-TOMS observations is the long-standing NASA TOMS Version 8 (V8) algorithm (Bhartia and Wellemeyer, 2002). The EP-TOMS instrument began to display significant errors in the TOC data after the year 2000. This problem is believed to be a complex issue involving inhomogeneous degradation of the scanner mirror on the EP-TOMS instrument, which causes a calibration error even after on-board correction methods are applied (Haffner et al., 2004). We use a new corrected version of the EP-TOMS TOC data set (EP-TOMS V8-corrected), to which an empirical calibration technique was applied to remove errors for the period from July 1996 to December 2005 (McPeters et al., 2007). Antón et al. (2010) showed that the empirical correction of the EP-TOMS data record provides a reprocessed data set of high quality.
The Ozone Monitoring Instrument (OMI) (Levelt, 2002), launched in 2004 on the NASA EOS-Aura satellite, continues the global monitoring of ozone performed by the NASA TOMS series of instruments that have flown since 1978. The OMI instrument is a nadir-viewing, wide-swath UV/VIS hyperspectral spectrometer that measures solar light reflected and backscattered from the Earth's atmosphere and surface in the wavelength range from 270 nm to 500 nm, with a spectral resolution of 0.45 nm in the ultraviolet and 0.63 nm in the visible. In this work, we use the OMI TOMS TOC data set, named OMTO3 collection 3 (hereafter denoted as OMI TOC data), which was released in 2008 using the updated TOMS Version 8.5 algorithm.
Aerosol Index values from EP-TOMS and OMI provide useful information about the aerosol load (Hsu et al., 1999). These values are taken from overpass files downloaded from the NASA website: http://jwocky.gsfc.nasa.gov/ozone/ozoneother.html.
The ESA Global Ozone Monitoring Experiment (GOME) on board the Second European Remote Sensing Satellite (ERS-2) is the first European space-borne UV-visible-near-infrared spectrometer and has been recording global TOC measurements since July 1995 (Burrows et al., 1999). A detailed instrument description can be found in the GOME User's Manual (ESA, 1995). GOME records 3584 spectral channels in the range 240 to 793 nm with a spectral resolution of 0.2 to 0.4 nm. In this paper, the GOME Data Processor (GDP) version 4.4 (Loyola et al., 2010) has been applied to derive the TOC data. This technique is based on the standard Differential Optical Absorption Spectroscopy (DOAS) retrieval method.

Methodology
First, we characterize the effects of clouds and aerosols using a multivariate, automatic, unsupervised method (cluster analysis) that classifies each day according to its radiation behaviour, cloudiness and aerosol content. A description of cluster analysis techniques can be found in Wilks (2006), and an example of its application to separating hazy days from clear days can be found in Gutiérrez et al. (2007). The data sets used in this work for Madrid were daily global, direct and diffuse solar irradiance, cloudiness, and the Aerosol Index (AI) taken from the satellite overpass of Madrid. Because cloudiness and AI range over only a few units, we reduce the range of the radiation data by using relative deviations from normal values (in the summer months) instead of absolute values. This gives a better balance between variables and a more efficient cluster classification.
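The rescaling of the radiation variables can be sketched as follows. This is a minimal illustration that assumes the "normal value" is the plain average over the summer days supplied; the function name and the sample irradiance values are hypothetical and not taken from the Madrid record.

```python
from statistics import mean


def relative_deviation_percent(values):
    """Deviation of each value from the sample mean, in percent of that mean."""
    normal = mean(values)  # assumed definition of the "normal" summer value
    return [100.0 * (v - normal) / normal for v in values]


# Hypothetical daily direct irradiance totals (units of 10 kJ m^-2).
direct = [2800.0, 3000.0, 3200.0, 1000.0]
direct_rel = relative_deviation_percent(direct)
print([round(d, 1) for d in direct_rel])
```

After this step the radiation variables are expressed as percentage departures from the summer mean rather than in absolute units, which is the form the clustering input takes in the text.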
After several trials, always using the squared Euclidean distance, three groups of four clusters each were created. Every group has the same input variables except for AI. The first group, using EP-TOMS AI, characterizes days from 1996 to 2001 and is applied to the EP-TOMS comparison. The second group, using OMI AI, characterizes days from 2006 to 2008 and is applied to the OMI comparison. Finally, the third group uses both AI data sets (which never overlap in time) and allows us to characterize days with GOME measurements from 1996 to 2008. The agglomerative hierarchical clustering methods selected were "farthest neighbour" for EP-TOMS and "Ward's method" for OMI. The GOME categories are a blend of both, depending on whether EP-TOMS or OMI AI observations are available on each day. The selected methods spread the data adequately across the clusters, and their outputs are meaningful.
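The agglomerative procedure can be illustrated with a small pure-Python sketch of farthest-neighbour (complete-linkage) clustering on the squared Euclidean distance, stopped at four clusters. It is not the software used in the study, it does not implement Ward's method, and the eight sample days are invented for illustration; the function names are likewise hypothetical.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_neighbour_clusters(points, n_clusters):
    """Agglomerative clustering with complete ("farthest neighbour") linkage.

    Starts with one cluster per point and repeatedly merges the two clusters
    whose most distant members are closest, until n_clusters remain.
    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (linkage distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                linkage = max(
                    squared_euclidean(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or linkage < best[0]:
                    best = (linkage, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def centroid(points, members):
    """Average of each variable over the days in one cluster."""
    return [sum(points[i][k] for i in members) / len(members)
            for k in range(len(points[0]))]


# Hypothetical days: (global %, direct %, diffuse %, cloudiness in oktas, AI).
days = [
    (10.0, 15.0, -30.0, 0.0, 0.3),    # clear-like
    (-10.0, -20.0, 30.0, 4.0, 0.4),   # cloudy-like
    (-50.0, -80.0, 60.0, 8.0, 0.6),   # very-cloudy-like
    (-5.0, -10.0, 15.0, 1.0, 2.0),    # aerosol-like
    (12.0, 18.0, -35.0, 1.0, 0.2),
    (-12.0, -25.0, 35.0, 5.0, 0.3),
    (-55.0, -85.0, 65.0, 7.0, 0.7),
    (-6.0, -12.0, 18.0, 1.0, 2.1),
]

clusters = farthest_neighbour_clusters(days, 4)
centroids = [centroid(days, members) for members in clusters]
for members, c in zip(clusters, centroids):
    print(sorted(members), [round(v, 2) for v in c])
```

With these well-separated sample days, each pair of similar days ends up in its own cluster, and the centroid of each cluster is the five-variable average that is then interpreted physically. For real data sets, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` (which offers both `complete` and `ward` methods) would normally be preferred over this quadratic-cost loop.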
Every group has four different clusters with characteristic centroids. Each centroid consists of the average values of the five chosen variables. A visual inspection of the scattergrams and the centroid values showed that every cluster can be interpreted in terms of specific physical properties and labelled accordingly as cloudy days, very cloudy days, (almost) clear days and (high) aerosol (load) days. Direct versus diffuse irradiance is shown in Fig. 1 (to help convey the physical significance, we plot the absolute radiation values for every classified day instead of the relative values used to build the clusters). AI versus cloudiness is shown in Fig. 2, and the centroid values for every cluster in every group are provided in Table 1. Four categories emerged. The first cluster (1) is characterized by lower-than-normal global and direct daily irradiance, higher-than-normal diffuse daily irradiance, 4 to 5 oktas of cloudiness and no remarkable aerosol load; it can be described as "cloudy" days. The second cluster shows large decreases in global and direct radiation below normal values and a large increase in diffuse radiation, cloudiness around 7-8 oktas and slightly higher AI values than the preceding cluster. Given the high cloudiness and the behaviour of the radiation variables, it is labelled "very cloudy" days. The third cluster has higher-than-normal direct radiation and lower-than-normal diffuse radiation compared with the other clusters, cloudiness around 0 to 1 oktas and no significant average AI. Given the cloudiness values and the behaviour of the radiation variables, we assign it the label "clear" days. Finally, the fourth cluster shows a significant reduction in both global and direct radiation and an above-average increase in diffuse radiation (especially in the EP-TOMS data sets). In addition, its average AI values are very high (around 2.0). We therefore assign this cluster the label (high) "aerosol" (load) days. Figures 1 and 2 show that this cluster contains few cases compared with the total number of data pairs.
www.ann-geophys.net/28/1441/2010/ Ann. Geophys., 28, 1441-1448, 2010
This classification allows a separate intercomparison between Brewer and satellite TOC data for each cluster, quantifying the effect of turbidity and clouds on the satellite TOC measurements. It is well known that satellite TOC data show a notable dependence on the solar zenith angle (SZA) (Balis et al., 2007; McPeters, 2008; Antón et al., 2008, 2009b). Thus, to apply our technique and obtain a realistic quantification of the effect of turbidity and clouds, the influence of SZA variability on the satellite TOC data must be removed. For this reason, we use only summer days (June, July and August), which limits the range of SZA values. We choose these months because they are the period with the highest number of days with DS Brewer measurements.
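The restriction to summer days with both a Brewer and a satellite value can be sketched as a simple date-matching step. The dictionaries, function name and TOC values below are hypothetical, and the sketch assumes one daily value per instrument keyed by calendar date.

```python
from datetime import date

SUMMER_MONTHS = {6, 7, 8}  # June, July and August


def summer_pairs(brewer, satellite):
    """Pair daily values present in both records, keeping summer days only."""
    common = sorted(
        d for d in brewer if d in satellite and d.month in SUMMER_MONTHS
    )
    return [(d, brewer[d], satellite[d]) for d in common]


# Hypothetical daily TOC values in Dobson Units, keyed by date.
brewer_toc = {
    date(1997, 5, 30): 340.0,
    date(1997, 6, 15): 320.0,
    date(1997, 7, 2): 310.0,
}
satellite_toc = {
    date(1997, 6, 15): 317.0,
    date(1997, 7, 2): 309.0,
    date(1997, 9, 1): 295.0,
}

pairs = summer_pairs(brewer_toc, satellite_toc)
print(len(pairs))
```

Only the two days that fall in June to August and appear in both records survive, which mirrors how the number of data pairs N is determined for each comparison.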
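A linear regression analysis was performed for the Madrid data sets. The fit parameters (offset and slope, with their respective errors) and the correlation coefficients (R²) were obtained. In addition, the mean bias error (MBE) and the mean absolute bias error (MABE) between satellite retrievals and Brewer measurements were calculated for each data set. These last two parameters are obtained from the following expressions:

\[ \mathrm{MBE} = \frac{100}{N} \sum_{i=1}^{N} \frac{\mathrm{Sat}_i - \mathrm{Brewer}_i}{\mathrm{Brewer}_i} \]

\[ \mathrm{MABE} = \frac{100}{N} \sum_{i=1}^{N} \frac{\left| \mathrm{Sat}_i - \mathrm{Brewer}_i \right|}{\mathrm{Brewer}_i} \]

where Sat_i and Brewer_i are the satellite and Brewer TOC values of the i-th data pair and N is the number of data pairs. The uncertainty of MBE and MABE is characterized by the standard deviation.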
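The statistics described above can be computed with a few lines of standard-library Python. This is a sketch under stated assumptions: the relative differences are taken as satellite minus Brewer in percent of the Brewer value (consistent with a negative MBE meaning satellite underestimation), Brewer is used as the independent variable of the fit, and the sample standard deviation is used for the uncertainty. The function names and TOC pairs are hypothetical.

```python
from statistics import mean, stdev


def linear_fit(x, y):
    """Least-squares line y = offset + slope * x; returns (offset, slope, R^2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    offset = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return offset, slope, r_squared


def relative_differences(satellite, brewer):
    """Satellite minus Brewer, as a percentage of the Brewer value."""
    return [100.0 * (s - b) / b for s, b in zip(satellite, brewer)]


def bias_statistics(satellite, brewer):
    """Return (MBE, MABE, std of differences, std of absolute differences)."""
    rel = relative_differences(satellite, brewer)
    abs_rel = [abs(r) for r in rel]
    return mean(rel), mean(abs_rel), stdev(rel), stdev(abs_rel)


# Hypothetical TOC pairs in Dobson Units.
brewer = [300.0, 320.0, 310.0, 330.0]
satellite = [297.0, 323.2, 306.9, 330.0]

offset, slope, r_squared = linear_fit(brewer, satellite)
mbe, mabe, mbe_sd, mabe_sd = bias_statistics(satellite, brewer)
print(f"slope={slope:.3f} offset={offset:.1f} DU R2={r_squared:.3f}")
print(f"MBE={mbe:.2f}% MABE={mabe:.2f}%")
```

In this example the relative differences are about −1%, +1%, −1% and 0%, giving an MBE of −0.25% and a MABE of 0.75%: positive and negative differences partly cancel in the MBE but not in the MABE.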

EP-TOMS
Once the four clusters had been defined, a linear regression analysis was applied to 464 pairs of EP-TOMS and Brewer data, for the whole data set and for the three selected clusters. Figure 3 shows the four scatter plots corresponding to these data sets. The Brewer and EP-TOMS data behave differently under the various atmospheric conditions sampled. The statistical parameters obtained in the regression analysis are shown in the first third of Table 2. Cluster 3 (clear-sky conditions) presents the best agreement between the satellite and ground-based TOC data, with a slope of 0.95 and a correlation coefficient of 0.89.
Clusters 1 (cloudy) and 4 (aerosols) show smaller slopes of 0.91 and 0.75, respectively, and lower correlation coefficients of 0.80 and 0.85, respectively. The negative sign of the MBE indicates that the EP-TOMS instrument slightly underestimates, on average, the Brewer data in Madrid for the three clusters. This parameter ranges from −0.50% (cluster 1) to −0.65% (cluster 4), with an intermediate value of −0.61% for cluster 3. The MABE gives the mean absolute value of the relative differences between the satellite and the ground-based data. Cluster 2 (very cloudy) presents the highest MABE (∼4.6%) and also the worst results for every other parameter in any cluster, showing that clouds affect the accuracy of EP-TOMS TOC observations. Cluster 1, representing cloudy days, also has a MABE close to 2%, compared with 1.2% in cluster 3 ("clear") and 1.6% in cluster 4 ("aerosols"). This is because, under cloudy conditions, the EP-TOMS ozone retrieval must estimate the amount of ozone below the cloud top (McPeters et al., 1998), the so-called ghost column. The TOMS V8 algorithm obtains this ghost column amount from the TOMS V8 ozone profile climatology and a cloud pressure climatology based on satellite IR observations, which generates significant uncertainties in the retrieval of the total ozone column (Lamsal et al., 2007). On the other hand, it is notable that the results for cluster 4 are not far from those for cluster 1, and that calculations with the latest version of the EP-TOMS data, derived from the corrected version 8 algorithm, show a clear improvement over calculations using earlier versions (not shown) when the aerosol index is high, as in cluster 4.
This result suggests that turbidity might still degrade the accuracy of the TOC retrieval from the EP-TOMS instrument, although the latest retrieval algorithm has brought an improvement. It agrees with early studies showing that the scattering and absorption effects of tropospheric and stratospheric aerosols may modify the UV radiation field and affect the TOC values derived from satellite instruments (Dave, 1978; Torres and Bhartia, 1999).

OMI
As in the TOMS-Brewer study detailed in the previous subsection, we performed a cluster analysis using the global, direct and diffuse radiation in Madrid during June, July and August from 2006 to 2008 for the OMI-Brewer study. There are 210 pairs of TOC data available for the linear regression analysis. The middle third of Table 2 shows the statistical parameters obtained in this study. The offset, slope and R² coefficient derived from the linear fit for OMI are better than those obtained for EP-TOMS. The larger underestimation for cluster 3 reflects the real behaviour of the Brewer-OMI differences, since the possible influence of clouds and turbidity has been removed from this data set. This result agrees with the work of Antón et al. (2009), which showed that the OMI underestimation of Brewer data is usually higher for cloud-free conditions than for cloudy cases in the Iberian Peninsula. A MABE of 2.0% ± 0.8% could be a good estimate of this bias in summer conditions over the central Iberian Peninsula. It is notable that all clusters, including cluster 2 ("very cloudy" days), show high R² values, peaking at 0.96 for the clear cluster, with 0.94 as the lowest value, for the aerosol cluster. Cloudy days show MBE and MABE around −2.0% and 2.1%. There are only 4 aerosol days, but the linear regression was statistically significant at the 95% confidence level at least. For these cases, the OMI data underestimate the Brewer measurements with an MBE of −1.5% (MABE of 1.5%). The improvement of the OMI data over EP-TOMS is mainly due to the TOMS V8.5 algorithm used for OMI, which includes a cloud-top pressure derived from the OMI data with the Rotational Raman Scattering (RRS) algorithm. With this modification with respect to the V8 version used for TOMS, the TOC values obtained by the TOMS V8.5 algorithm are more accurate under cloudy conditions than those of the previous version (Yang et al., 2008). On the other hand, the difference between the results for clusters 3 and 4 indicates that turbidity also has a slight influence on the OMI TOC retrieval: the OMI underestimation of Brewer data is lower under turbid conditions than for the cloud-free cases. In addition, for each of clusters 3 and 4 the MBE and MABE have identical absolute values. This reveals the presence of a clear bias with a small statistical spread. The uncertainty of the MABE parameters is lower than 1.1%, which indicates the statistical significance of the reported values.
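Why equal absolute values of MBE and MABE point to a one-sided bias can be seen from the triangle inequality. The sketch below uses d_i as shorthand (a symbol not used in the original text) for the relative difference of the i-th satellite-Brewer pair, in percent, and assumes the usual definitions of MBE and MABE as the mean and the mean absolute value of these differences.

```latex
\[
|\mathrm{MBE}| = \left| \frac{1}{N} \sum_{i=1}^{N} d_i \right|
\le \frac{1}{N} \sum_{i=1}^{N} |d_i| = \mathrm{MABE},
\qquad
d_i = 100\,\frac{\mathrm{Sat}_i - \mathrm{Brewer}_i}{\mathrm{Brewer}_i}
\]
```

Equality holds only when all d_i share the same sign (or are zero). An MBE of −2.0% with a MABE of 2.0% therefore means that, within the rounding of the reported values, essentially every OMI value in that cluster lies below the corresponding Brewer value.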

GOME
Over the whole 1996-2008 period, a total of 315 valid pairs of GOME-Brewer data were identified and analyzed with the same cluster tools as in Sects. 4.1 and 4.2. Results for the whole summer period and for the four clusters are shown in the lower third of Table 2. The slope and correlation coefficient obtained with all available data are smaller than those obtained in other Brewer-GOME comparison exercises in the Iberian Peninsula (e.g., Antón et al., 2008). This may be a statistical effect, since the dynamic range of TOC sampled in summer is much smaller than when all months are analyzed together. Table 2 shows that the largest underestimation corresponds to cluster 2 (1.14%), against 0.77% and 0.63% for clusters 1 and 3, respectively. In addition, the MABE for cluster 2 has the highest value (2.51%). These results indicate that very cloudy conditions significantly affect the GOME retrieval.
In contrast, the results for the remaining cloudy conditions, represented by cluster 1, are very similar to those for clear-sky conditions. This result confirms the good cloud treatment performed by the GOME retrieval with GDP 4.4. The GDP 4.x retrieval includes two algorithms for determining cloud properties from GOME measurements (Loyola, 2007). The OCRA algorithm uses data fusion techniques to derive the cloud fraction from the sub-pixel PMD measurements, while the ROCINN algorithm derives the cloud-top height and cloud-top albedo from the reflectivity in and around the oxygen A band at 760 nm. The results for cluster 4 show that the GOME data overestimate, on average, the Brewer measurements (0.51%) under turbid conditions. Nevertheless, the MABE is the smallest of the four clusters (1.19%), and the correlation with the ground-based data is quite high (R² ∼0.95). These good results suggest that the GOME algorithm is not significantly affected by turbidity.

Conclusions
In this work, we have performed a cluster classification based on solar radiation (global, direct and diffuse), selecting hazy, cloudy and clear days in Madrid (Spain). The analysis of the differences between Brewer and satellite TOC data for each cluster leads to some important conclusions. The accuracy of EP-TOMS TOC observations is significantly affected by cloudiness, suggesting that the TOMS V8 retrieval method presents serious uncertainties under cloudy conditions. In addition, turbidity has a slight influence on EP-TOMS TOC data. The TOC data derived from the OMI instrument with the TOMS V8.5 algorithm are substantially improved under cloudy conditions, although slight differences remain with respect to clear-sky conditions. Finally, the GOME instrument also shows a very smooth effect of clouds and turbidity on the TOC observations obtained with GDP 4.4, indicating a retrieval algorithm that is robust under these conditions.
We would like to note that these results could be affected by the pixel size of the satellite observations. This is especially important for the GOME instrument, which has a very large ground pixel: 320 km (across orbit) × 40 km (along orbit). Such a large GOME pixel could mask the local aerosol and cloudiness conditions over Madrid.

Figure 1. Scattergram of direct versus diffuse solar radiation over Madrid in June-July-August for every day with satellite TOC and a selected AI value from the EP-TOMS and OMI instruments in the period 1996-2008. The four-cluster classification is based on global, direct and diffuse radiation values and cloudiness at 13:00 GMT over Madrid, plus the Aerosol Index from satellite overpasses of Madrid.

Figure 2. Scattergram of satellite aerosol index versus cloudiness at 13:00 GMT, measured in oktas, over Madrid in June-July-August for every day with satellite TOC and a selected AI value from the EP-TOMS and OMI instruments in the period 1996-2008. The four-cluster classification is based on global, direct and diffuse radiation values and cloudiness at 13:00 GMT over Madrid, plus the Aerosol Index from satellite overpasses of Madrid.

Table 1. Centroid values for the clusters, derived from the relative deviations from the average value in summer months (in percent) of the global, direct and diffuse solar irradiances, the cloudiness at 13:00 GMT (in oktas) and the Aerosol Index from TOMS and OMI.

Table 2. Parameters obtained in the correlation analysis between the different satellite-Brewer TOC data sets over Madrid in June, July and August. EP-TOMS comparisons cover the period 1996-2001, OMI comparisons the period 2006-2008 and GOME comparisons the period 1996-2008, whenever a value exists to be matched with Brewer data. N is the number of data pairs. Offset (in Dobson Units) and slope (dimensionless) are the parameters of the linear fit, R² is the squared correlation coefficient, MBE is the Mean Bias Error and MABE is the Mean Absolute Bias Error. C1 denotes "cloudy" days as selected by the cluster classification described in the text, C2 "very cloudy" days, C3 "clear" days and C4 (high) "aerosol" (load) days.