© European Geosciences Union 2004

Abstract. We describe a sequential assimilation approach useful for assimilating tracer measurements into a three-dimensional chemical transport model (CTM) of the stratosphere. The numerical code, developed largely according to Kha00, uses parameterizations and simplifications allowing assimilation of sparse observations and the simultaneous evaluation of analysis errors, with reasonable computational requirements. Assimilation parameters are set by using χ 2 and OmF (Observation minus Forecast) statistics. The CTM used here is a high resolution three-dimensional model. It includes a detailed chemical package and is driven by UKMO (United Kingdom Meteorological Office) analyses. We illustrate the method using assimilation of Upper Atmosphere Research Satellite/Microwave Limb Sounder (UARS/MLS) ozone observations for three weeks during the 1996 antarctic spring. The comparison of results from the simulations with TOMS (Total Ozone Mapping Spectrometer) measurements shows improved total ozone fields due to assimilation of MLS observations. Moreover, the assimilation gives indications on a possible model weakness in reproducing polar ozone values during springtime.


Introduction
The analysis of the change in the ozone distribution, and of atmospheric chemistry in general, is severely hampered by a lack of consistent data sets.This prevents one from gaining insights into the role of chemistry and dynamics in determining the ozone distribution.These issues are usually studied by comparing observations with results from numerical modelling.One of the most important sources of data Correspondence to: B. Grassi (barbara.grassi@aquila.infn.it) on the atmosphere are satellite measurements.Space-borne observations have the advantage of giving valuable data over much of the world; however, data resolution is usually unsatisfactory to resolve small-scale patterns and moreover, maps of data are asynoptic because instruments collect observations only along satellite tracks, and long periods of time are required to obtain a nearly complete global coverage of the Earth.On the other hand, models reproduce the chemical evolution of the atmosphere with an uncertainty due to the discretization of spatial and temporal scales, the parameterization of complex phenomena and the incomplete knowledge of some atmospheric processes.A data assimilation scheme allows us to simultaneously make use of available observations, theoretical understanding, and a priori information, within a mathematical framework.The technique seeks to produce an analysis which fits a set of observations taken over a "time window", subject to the constraint that the evolution of the analyzed quantities is governed by a deterministic model describing the given observations.This methodology represents a powerful tool for the understanding of chemical and dynamical atmospheric processes.
Data assimilation techniques are largely used for numerical weather prediction applications and recently have been applied to different fields of geophysical studies.Several research groups have demonstrated that these techniques can be very successfully applied for analysis of atmospheric chemical observations from a satellite (e.g.Fisher and Lary, 1995;Lyster et al., 1997;Khattatov et al., 2000;Stajner et al., 2001;Struthers et al., 2002) or in-situ ozone data from aircraft measurements (Cathala et al., 2003).Asynoptic satellite observations can be used to produce a set of self-consistent synoptic "chemical analyses'" of the observed species, and then, synoptic analyses can be inferred for species included within the model even though not actually observed; the methodology can also be useful for satellite instrument or model result validation.The present paper is organized as follows: the second section presents a description of the simulation performed; the third section describes the model and the formulation of the assimilation algorithm.The results are then presented in the following section while in the final section results are discussed and summarized.

The simulation
As noted in the Introduction, the aim of data assimilation is to ascertain the best possible estimate of the state of a system, by combining information from observations with an appropriate model of the system itself.A sequential assimilation algorithm, developed largely according to Khattatov et al. (2000), to assimilate measurements of long-lived species in the stratosphere, was coupled with a three-dimensional chemical transport code of the stratosphere, which simulates the evolution of atmospheric chemical species.The resulting system, described below, was used to assimilate UARS MLS ozone observations for a three-week period during the 1996 antarctic spring.Instruments on board UARS provided a rich set of observations of trace species concentrations in the middle atmosphere (Reber et al., 1993).We used UARS MLS level 3AT data and selected MLS ozone observations from 10 to 29 October 1996.Following Froidevaux et al. (1995), only data at UARS levels 46.4,21.5, 10.0, 4.6, 2.1, 1.0, and 0.464 hPa, considered to be reliable, were used for the assimilation.For those seven pressure levels the relative errors, calculated from estimated precision and accuracy, are 0.23, 0.06, 0.05, 0.05, 0.06, 0.14 and 0.25, respectively.The chosen period results are particularly interesting and favorable for two reasons: 1. Southern polar latitudes characterized by a peculiar chemical-dynamical situation (polar vortex particularly stable, leading to an "ozone hole" persisting until early December 1996); 2. Availability of 12 days of MLS data during the assimilation period from approximately 35 • N to about 80 • S. We also considered TOMS ozone column data for the Southern Hemisphere and HALOE (HALogen Occultation Experiment) ozone profiles in the middle latitudes, relative to the same period.Note that TOMS and HALOE data were not used in the assimilation, but considered as an independent data set to validate the analysis results.
3 The chemical model and the assimilation scheme

STRATAQ CTM
The STRATAQ model (Grassi et al., 2002) is a threedimensional chemistry transport model of the stratosphere that extends from 0.6 km to about 52 km in altitude with a vertical resolution varying from about 1 km below 20 km and up to 2.5 km in the higher stratosphere (see Table 1).The horizontal resolution is 5 • longitude by 2 • latitude.The chemical package includes 43 chemical species, 89 homogeneous gas-phase chemical reactions and 33 photolytic reactions.To evaluate photodissociation rates J we used a "lookup table" calculated by a radiative one-dimensional model based on the delta-Eddington approximation (Joseph et al., 1976).The rate constants, as well as solar flux values and absorption cross sections for chemical compounds, were taken from DeMore et al. (1997).A treatment of heterogeneous processes, occurring on the surface of stratospheric background aerosols, NAT (Nitric Acid Tetrahydrate) and ice particles, is included in the model.The advection, driven by the UKMO analyses, is calculated using a semi-lagrangian transport code (Lin and Rood, 1996).The semi-implicit symmetric (SIS) method (Ramaroson et al., 1992) is used to integrate the chemical continuity equation.The model chemical and dynamical time step is 12 min.For this simulation the model was initialized using output fields from long-term integration performed with the low resolution version of the 3-D SLIM-CAT CTM (Chipperfield, 1999) relative to 10 October 1996.
An initial background error of 10% was fixed for the ozone field (see, e.g.Levelt et al., 1998;Struthers et al., 2002).

Assimilation scheme
In this work a sequential assimilation technique, based on the suboptimal Kalman filter, (e.g.Ménard et al., 2000), was used to evaluate the "best" value of the state of the system, or analysis X a , which is prior information.Prior information is also given by observations Y and independent estimates of the system X b , (also called background or "forecast" when the background is a model prediction), and by error covariance matrices of the background and observations.A characteristic feature of the Kalman filter is the computation of the time evolution of the forecast error covariance.It can be shown (e.g.Lorenc, 1986), that the expression for the analysis X a is given by: where and H is the observation operator, which represents the transformation from the forecast space to the observation space and is the composition of two operators; one operator represents the interpolation from the geographical locations of the model grid points to the locations of the observations while the other one denotes the relationship between the observed and estimated quantities at the geographical locations of observations.K is called the "Kalman gain matrix".Here B t is the forecast error covariance matrix at time t, O is the error covariance matrix of the observations and R the error covariance matrix associated with errors of interpolation and discretization.Note that the weight given to the observations and the region influenced by each observation is properly related to the observation errors and to the background error covariance matrix.Following Lorenc (1986), the analysis error covariance matrix is expressed as: (3) In our case the best estimate of the state of the atmosphere, for instance, the atmospheric ozone concentration field, is derived from a linear weighted interpolation of the vector X b , representing the CTM model forecast of the ozone field on the three-dimensional grid points, and the vector of observations, Y , representing ozone MLS profiles along satellite tracks.Operatively, the Kalman filter analysis, once we have fixed a time interval called the "assimilation window" at which the analysis is performed, consists of the following steps: 1. Initialize the vector X and the error covariance matrix B; 2. Calculate the updated X and B at the beginning of the assimilation window through model integration; 3. Collect all observations Y along the assimilation window (with relative errors) and use them together with X b and B t as prior information to perform the analysis, following equations above in the text.Therefore, all data over the same window are assumed to be taken at the same time; 4. Then use the obtained concentrations X a (t) as a initial condition for the chemistry transport model M to predict constituent concentrations at the following time step (i.e.: the beginning of the next assimilation window): where t is the time step used for the integration.
Using the linearization L=∂X(t+ t)/∂t of the original model M, according to Lyster et al. (1997), in the extended Kalman filter the evolution of the error covariance matrix due to the model integration is given by: Unfortunately, such an approach is impracticable when working with an a high dimensional problem such as a threedimensional global atmospheric system, due to the considerable computational cost.Therefore, some parameterizations have to be introduced.Following the approach of Ménard and Chang (2000), the evolution of the diagonal element error covariance matrix, b ii , due to the model integration, is written as: where x i is the i-th element of X.Therefore, B(X) is supposed to evolve as X itself while the second term on the righthand side represents additional errors due to imperfections in the model.The off-diagonal elements are parameterized as follows: where D xy and D z represent horizontal and vertical distances between locations i and j .This same parameterization is used to calculate the B a off-diagonal elements, while only diagonal terms are updated using Eq. ( 3).Moreover, matrices O and R are assumed to be diagonal; diagonal elements of matrix O are set to observational error variance while diagonal elements of matrix R are parameterized as r ii =(ry i ) 2 , where r is the relative representativeness error, used as a multiplying factor for the observation y i .Note that in these equations L xy , L z , q and r are tunable parameters that can be calculated to achieve best agreement with observations using the χ 2 diagnostics and the OmF analysis (Ménard and Chang, 2000): Angular brackets denote arithmetic means calculated over different observation points in one assimilation window.
These quantities were evaluated during separate assimilation runs performed by setting different values of the tunable parameters.Results showed a primarily dependency of χ 2 on q and r parameters; on the contrary, χ 2 results were quite insensitive to the value of L xy and L z which were found to influence strongly the OmF values.Based on such results, the minimization of the OmF bias was used as the base criteria for choosing the best value of the correlation lengths (L xy =500.km, L z =0.3 of the standard atmospheric scale height).The tuning procedure was performed by varying L xy values from 300 to 3000 km and L z values from 0.2 to 1.0 standard atmospheric scale height.Resulting OmF residuals are presented and discussed in the next section.
Other parameters of the system, (q=0.015,r=0.13) were fixed to allow the value of <χ 2 >/N , where N is the number of observations used in the assimilation analysis, to tend to unity and not exhibit a temporal trend (Khattatov et al., 2000).The <χ 2 >/N time behavior over the assimilation period is shown in Fig. 1.
The growth rate of the forecast error q and the relative representativeness error r appear to be a little bigger than in other ozone assimilations into a CTM (e.g.Khattatov et al., 2000, where q=0.0135 per hour and r=0.1).These differences may be partly related to the additional interpolation of the MLS data onto STRATAQ vertical pressure levels (in Khattatov et al., 2000, vertical MLS and CTM levels coincided).We used an assimilation window of 1 h, corresponding to the time interval during which approximately 55 MLS ozone profiles were collected.

Results
Figure 2 shows analyzed ozone fields on 29 October 1996 at 6:00, 12:00 and 18:00 UTC and relative percent analysis error fields.Twelve hours of MLS data, from 6 h before to 6 h after the time of analyzed fields, are also shown for comparison.All maps are relative to the 10 hPa level which represents the central level of assimilated MLS data.As expected, the resulting analysis field works as an extension of MLS satellite data, originally sparse and scattered in time and space, to a regular grid for a selected time.Analyzed maps show a good comparison with MLS data.The ozone depletion inside the polar vortex recorded by the MLS instruments is well reproduced in the analyses with minimum ozone values of about 4 ppmv.The principal maxima in ozone values at middle latitudes are also well represented.Analyzed fields seem to match well the polar vortex tilting, showing also the rapid development of a filamentary structure near the antarctic peninsula.The analysis errors show values lower than 14% in the Southern Hemisphere, with maxima at low latitudes and minima in the polar region.In fact, the density of observations in this region is fairly high and this is reflected in a lower associated variance.In general, we found error values of the same order as those shown in literature (e.g.Khattatov et al., 2000).
The typical structures in the error maps show the reduction of the analysis errors along the satellite track and the increase due to the model error growth term.The tracks have a distorted shape reflecting the advection of the forecast standard deviation field.
Another view of the differences between MLS observations and assimilation results is shown in Fig. 3.
These plots present the monthly and zonally averaged difference (top) between observations and a first guess forecast (OmF) and its standard deviation (bottom).As explained when describing the assimilation code, OmF bias was minimized to set the correlation lengths of the forecast errors.The bias reflects data accuracy and precision and, at the same time, it is a measure of the forecast skill.The bias shows absolute values lower than 20%; it is usually below 5% between approximately 30-50 km, with maximum values up to 10%, and, as expected, it grows above and below this region in correspondence to a decrease of accuracy and precision of MLS data.For the same reason, the standard deviation, whose values are lower than about 40%, shows the overall lowest values, up to 15%, between 30 to 50 km, increasing above and belove.
As seen from the type of information obtained from Fig. 3, OmF analysis provides an important quality control mechanism and highlights systematic differences between model results and data.Figure 4 shows another view of the same quantity.In the plot, globally averaged OmF differences (i.e.MLS minus forecast) are plotted versus time over the different pressure levels where assimilation is performed.
Within the pressure range 10.0 hPa-1.0hPa the mean difference (bias) between MLS observations and the first guess forecast is usually positive, with mean values of a few Fig. 2. Results of the assimilation for October 29, 1996, at 10 hPa.Top panels: Analyzed ozone, ppmv; center panels: analysis error in percent; bottom panels: MLS measurements, ppmv.From left to right results are at 6.00, 12.00 and 18.00 UTC. 15 Fig. 2. Results of the assimilation for 29 October 1996, at 10 hPa.Top panels: Analyzed ozone, ppmv; center panels: analysis error in percent; bottom panels: MLS measurements, ppmv.From left to right results are at 6:00, 12:00 and 18:00 UTC.
percents and maximum absolute values of less than 10%.For these levels the behavior of OmF shows a spin-up time of about 40 analysis cycles.This spin-up time, due to a bias in the initial ozone field, increases to about 60 analysis cycles at 21.6 hPa and to more than 100 analysis cycles at 0.46 hPa and 46.4 hPa.At the same time, over these three pressure levels, the graphics show higher OmF variability.The increasing spin-up time and OmF temporal variability are reasonably related to the larger MLS observation errors at 0.46 hPa, 21.6 hPa and 46.4 hPa, which are reflected in the lower information content in the MLS observations and in model results less constrained by observations.Moreover, the spin-up time is also influenced by the ozone chemical time scale which decreases with altitude.
Analysis of Fig. 4 may give further indications about the possible origin of the higher OmF values at top and bottom assimilation levels, shown in Fig. 3, relating them to a bias in the initial (unrealistic) ozone field at the bottom level and to a standard bias in the forecast at the top level.The standard bias in the forecast of about 25% at 0.46 hPa is a sign of a systematic model underestimation of ozone values on this level.Absolute values of ozone are determined mostly by vertical transport and photochemical processes.The ozone chemical lifetime drops rapidly with altitude and, at high altitudes, ozone levels are totally determined by the fast photochemistry.Also, only a process operating on short time scales (diurnal or less) compared to the frequency of assimilation (about 1000 profiles a day) can maintain a systematic model bias of 25%.The diabatic descent affects ozone on very long time scales compared to the frequency of assimilation.On the basis of these considerations, only an incorrect estimation of the model's fast photochemistry can explain a bias which increases with altitude.
Our assimilated field was also used to calculate the ozone total column for 29 October 1996, to be compared with observations of the TOMS instrument on the NASDA/ADEOS spacecraft for the same day (Fig. 5).
To improve the comparison, the "assimilated column" is calculated at each point in the local solar time of the satellite measurement.TOMS data show values between 100 and 500 DU, in reasonable agreement with assimilation results.The assimilated field fairly reproduces morphology and intensity of ozone depletion shown by TOMS.A quantitative comparison, performed on selected model grid points, between co-located TOMS, model analysis and model only control field data is given in Fig. 6, in order to try and obtain a clearer evaluation of the performance of the assimilation.
Figure 6 shows column ozone values at latitudes below −62 • , chosen over grid points along four different directions crossing the South Pole.The comparison shows that the model field tends to overestimate column ozone values with respect to TOMS data, in the inner vortex, up to 30%.The assimilation procedure corrects the model forecast, in the right direction, toward TOMS independent data, leading to a final absolute difference between TOMS and the analyzed field of less than 10%, on average.Such results seem to confirm the hypothesis suggested in a previous work (Grassi et al., 2002) of an incorrect model estimation of the vertical velocity in the polar regions that causes an excess of ozone subsidence from higher model levels, resulting in an overestimation of ozone column values.
A slightly different behavior can be seen on the points located on the outer part of the vortex where little differences can be found between model and analyzed fields, while both showing values less than TOMS data.Those differences, of about 10%, on average, seem to be partially induced by the difficulty of correctly resolving strong meridional dynamical gradients.Also, the tropospheric contribution to the ozone column, not considered in our calculation, may be relevant at middle latitudes.An additional comparison is shown between profiles from the UARS/HALOE (Halogen Occultation Experiment) instrument and analysis profiles for the assimilation run (Fig. 7).The graphics refer to four different latitudes in the latitude band spanned by the instrument (about 8 • -35 • N) during the last three days of the assimilation run.
The agreement between analysis results and independent HALOE data is fairly good if we neglect the points located above 2 mb, where the comparison seems again to highlight the existence of a model bias not removed by the assimilation.Different values of the error bars near the ozone maximum reflect different vertical distance between the CTM levels and the MLS levels where the assimilation is performed.

Conclusions
Assimilation of atmospheric chemical observations has been recently developed and used in a number of different CTMs (e.g.Errera and Fonteyn, 2001;Khattatov et al., 2000;Lamarque et al., 1999;Levelt et al., 1998;Struthers et al., 2002).In our work, a data assimilation system of chemical species coupled with STRATAQ CTM is described and preliminary results discussed.In this context, the computation of time evolution of the error covariance matrix is a crucial point.In our scheme this quantity was evaluated using a certain number of simplifications and parameterizations.The model error growth rate was assumed to be linear.This produced high values of analysis errors in the regions where a small number of assimilated observations are present, as shown in the results.The adjustable parameters, calculated to evaluate the error covariance matrix by using some minimization criteria, showed reliable values if compared to previous similar studies.
Validation through independent observations has been performed on the total ozone column against TOMS data.Also, sparse ozone profiles are shown and compared with HALOE data.TOMS ozone column data give a general indication about the quality of the results; TOMS to CTM assimilated data comparison shows by itself how the assimilation technique improved the CTM capability in reproducing a real atmospheric situation.The analyses of ozone column values show an improvement of the results inside the vortex due to the assimilation leading to a reduction in the differences with TOMS data from about 25-30%, in the case of model only control values, to less than 10% in the case of assimilated values.The study of the temporal behavior of the OmF differences shows a model bias on the upper levels, suggesting an incorrect estimation of the model fast photochemistry at high altitudes.

Fig. 1 .
Fig. 1.Plot of <χ 2 >/N as a function of time for the whole assimilation period.Each value is computed for one assimilation analysis.

Fig. 3 .
Fig. 3. Time and zonally averaged: OmF (MLS minus forecast) mean difference (top), standard deviation from the mean (bottom).The temporal mean is calculated over all the assimilation period.

Fig. 4 .
Fig. 4. Globally averaged mean difference of OmF (MLS minus forecast), in percent between MLS ozone observations and co-located analysis ozone values as a function of time for different pressure levels.Horizontal scale units are the number of assimilation cycles performed from the beginning of the run.Note the different vertical scale in the plot relative to 0.46 hPa.

Fig. 6 .
Fig. 6.Ozone column values at model grid points along directions A, B, C and D, respectively (see map; for each direction 29 model grid points from −62 • lat to −62 • lat through the South Pole are taken into account).Red and blue columns are relative to model and analysis results, respectively, while green columns are co-located TOMS values (unrealistic TOMS data on the South Pole was omitted).