Estimation of the oceanic pCO2 in the North Atlantic from VOS lines in-situ measurements: parameters needed to generate seasonally mean maps

Automated instruments on board Volunteer Observing Ships (VOS) have provided high-frequency pCO2 measurements over basin-wide regions for a decade or so. In order to estimate regional air-sea CO2 fluxes, it is necessary to interpolate between in-situ measurements to obtain maps of the marine pCO2. Such an interpolation remains, however, a difficult task because VOS lines are too distant from each other to capture the high pCO2 variability. Relevant physical parameters available at large scale are thus necessary to serve as a guide to estimate the pCO2 values between the VOS lines. Satellites do not measure pCO2 but they give access to parameters related to the processes that control its variability, such as sea surface temperature (SST). In this paper we developed a method to compute pCO2 maps using satellite data (SST and CHL, the chlorophyll concentration), combined with a climatology of the mixed-layer depth (MLD). Using 15 401 measurements of surface pCO2 acquired in the North Atlantic between UK and Jamaica, between June 1994 and August 1995, we show that the parameterization of pCO2 as a function of SST, CHL and MLD yields more realistic pCO2 values than parameterizations that have been widely used in the past, based on SST, latitude, longitude or SST only. This parameterization was then used to generate seasonal maps of pCO2 over the North Atlantic. Results show that our approach yields the best marine pCO2 estimates, both in terms of absolute accuracy, when compared with an independent data set, and of geographical patterns, when compared to the climatology of Takahashi et al. (2002). This suggests that monitoring the seasonal variability of pCO2 over basin-wide regions is possible, provided that sufficient VOS lines are available.

Correspondence to: C. Jamet (cedric.jamet@univ-littoral.fr)


Introduction
The burning of fossil fuels, the intensification of land-use, and the production of cement over the last 250 years have increased the atmospheric carbon dioxide (CO2) concentration from 280 parts per million (ppm) prior to the industrial revolution (Neftel et al., 1985) to the current level of more than 380 ppm (Keeling and Whorf, 2000; Houghton et al., 2001). During that period, only about half of the CO2 released by these anthropogenic processes has remained in the atmosphere. Both ocean and land biospheres are believed to act as sinks that have taken up the remainder (Houghton et al., 2001; Orr et al., 2001; Sarmiento and Gruber, 2002). Unlike the atmospheric CO2 concentration, which can be accurately monitored using a global sampling network, the survey of oceanic pCO2, the quantity that largely controls the variability of the air-sea CO2 flux, is a more complex task.
The seasonal and geographical variability of surface water pCO2 is indeed much greater than that of atmospheric pCO2, and hence the direction of the sea-air CO2 transfer is mainly regulated by the oceanic pCO2. The pCO2 in the ocean mixed-layer, which exchanges CO2 directly with the atmosphere, is controlled by seasonal changes in temperature, total CO2 concentration and alkalinity. While the water temperature is primarily regulated by physical processes (i.e. solar energy input, sea-air heat exchanges, physical mixing and mixed-layer depth), the total CO2 concentration and the alkalinity are primarily controlled by biological processes (i.e. photosynthesis, respiration and calcification) and by upwelling of subsurface waters enriched in respired CO2 and nutrients.
Monitoring the oceanic CO2 partial pressure (pCO2) at a monthly or seasonal time-scale is particularly important to estimate regional air-sea fluxes of CO2 and thus to quantify the role of the ocean in the context of climate change. The major problem for the quantification of the oceanic sink is thus the spatial and temporal distribution of available in-situ pCO2 data. Because pCO2 in surface waters varies rapidly in space and time, the paucity of in-situ measurements has strongly limited the accuracy of such estimates. Worldwide networks of measurements of surface water pCO2 were initiated in the 1990s (Poisson et al., 1993; Takahashi et al., 1993, 1997, 2002, among others). The North Atlantic, in particular, is the subject of intensive pCO2 measurements, using Volunteer Observing Ships (VOS) in the framework of the former CAVASSOO (2001-2003) and of the ongoing CARBOOCEAN (2005-2009) European projects. These data sets can either be used to consolidate numerical models, which can be used, in turn, to compute the carbon budget, or, as in this study, to develop relationships between oceanic pCO2 and other parameters available globally, such as the satellite sea surface temperature (SST), to interpolate maps of pCO2 for the North Atlantic.
The marine pCO2 is primarily related to the SST through the thermodynamic effect. Remotely sensed SST is thus often used as the only physical parameter to generate pCO2 maps (e.g. Boutin et al., 1999; Lee et al., 1998; Nelson et al., 2001; Olsen et al., 2003, 2004; Stephens et al., 1995). However, other parameters, such as the chlorophyll-a concentration (CHL), are likely to provide relevant additional information to improve the interpolation of pCO2. Recent attempts have been made to include CHL from ocean color satellites, to account for the important role of the biological pump on surface pCO2 (e.g. Louanchi, 1995; Ono et al., 2004; Rangama et al., 2005). In contrast, very few studies have emphasized the importance of accounting for the vertical mixing through the mixed layer depth (MLD) (Dandonneau, 1995; Lüger et al., 2004). Moreover, all these studies were confined to relatively small regions, usually defined from the biogeochemical provinces of Longhurst (1995). Here we aim at generalizing this multi-parameter approach to basin-wide regions, such as the North Atlantic, using in situ measurements from VOS lines.

In situ pCO2 database
Available pCO2 measurements are unevenly distributed in both space and time, even for the relatively well-sampled North Atlantic. The seawater pCO2 measurements performed regularly on board the ship-of-opportunity "Prince of Seas" (UK to Jamaica; Cooper et al., 1998) in 1994-1995 constitute an interesting and consistent data set to study the seasonal cycle of pCO2 in North Atlantic waters. The "Prince of Seas" operated from Newport (UK) to Costa Rica via the Netherlands and Jamaica, on a five-week roundtrip, normally traveling different, but repeated routes on the westbound and return legs, as shown in Fig. 1. A total of 15 401 in-situ pCO2 measurements are thus available between June 1994 and August 1995, with the exception of December 1994 and January 1995. Measurements performed within coastal waters, i.e. east of 6°W, were removed from the database. The number of remaining data for each season is given in Table 1. They are not equally distributed, with more measurements available in spring and summer than in fall.

Global data sets
For the SST, we used the monthly AVHRR oceans Pathfinder global equal-angle best SST 9 km version 4.1 (Vasquez et al., 1998) for 1994 and 1995 (see Fig. 1). Casey and Cornillon (1999) evaluated the satellite-derived sea surface temperature climatology based on the AVHRR sensor against several other climatologies, for their usefulness in the determination of SST trends. They showed that the AVHRR SST climatology is more representative of spatial and seasonal SST variability than the traditional in-situ and blended SST climatologies (Bottomley et al., 1990; Levitus and Boyer, 1994; Parker et al., 1995; Reynolds and Smith, 1995). Compared to the COADS in-situ climatology, for instance, the global standard deviation and the global mean of the AVHRR SST climatology anomalies are equal to 1.65 °C and 0.01 °C, respectively (Casey and Cornillon, 1999). For the CHL, we used the SeaWiFS Level-3 Standard Mapped Products monthly mean climatology (1997-2004) provided by the NASA/GSFC/DAAC at a resolution of 1/12°. The use of a climatology was necessary because no ocean color sensor was operated in 1994 and 1995. For the MLD, we used the climatology of de Boyer et al. (2004) projected on a 2° grid. This MLD climatology has been computed from 4 490 571 individual profiles obtained between 1941 and 2002 from the National Oceanographic Data Center (NODC) and from the World Ocean Circulation Experiment (WOCE) database. The seasonal mean of each parameter is shown in Fig. 1.

Multiple linear regression method
Every in-situ pCO2 measurement was associated with the following coincident information: the month, longitude and latitude that come with the measurement, and the SST, CHL and MLD extracted from the global products described in Sect. 2.2. To co-locate the pCO2 with SST, CHL and MLD values, we chose the SST (or CHL or MLD) value corresponding to the month and the position (longitude and latitude) of the in-situ values: for each grid box of the selected product (SST, CHL or MLD), we looked for all pCO2 values falling inside that box at the resolution of the product. The process of co-location is straightforward.
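The co-location step can be sketched as follows. This is only an illustration under stated assumptions: the gridded product is taken to be a regular longitude-latitude grid starting at 180°W, 90°S, missing grid values are marked with None, and the function names and the record layout are hypothetical rather than taken from the study.

```python
import math

def cell_index(lon, lat, res_deg, lon0=-180.0, lat0=-90.0):
    """Return (row, col) of the regular grid box that contains (lon, lat)."""
    row = math.floor((lat - lat0) / res_deg)
    col = math.floor((lon - lon0) / res_deg)
    return row, col

def colocate(observations, monthly_grids, res_deg):
    """Attach the gridded value of the same month and box to each observation.

    observations: list of dicts with keys "month", "lon", "lat" (assumed layout)
    monthly_grids: dict mapping month -> 2-D list indexed [row][col]
    """
    matched = []
    for obs in observations:
        row, col = cell_index(obs["lon"], obs["lat"], res_deg)
        value = monthly_grids[obs["month"]][row][col]
        if value is not None:  # None marks a missing grid value (assumption)
            matched.append({**obs, "value": value})
    return matched

# Hypothetical 2-degree global grid with a single filled box.
grid = [[None] * 180 for _ in range(90)]
grid[67][74] = 15.0
obs = [
    {"month": 6, "lon": -30.5, "lat": 45.2, "pco2": 340.0},
    {"month": 6, "lon": 10.0, "lat": 0.0, "pco2": 360.0},
]
matched = colocate(obs, {6: grid}, 2.0)
```

Only the first observation is kept here, because the second falls in a box with no gridded value.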
We applied a multiple linear regression on a seasonal basis to the database presented in the previous section, in order to quantify the impact of adding CHL and MLD to SST on the accuracy of the parameterization of the oceanic pCO2. The four seasons are defined as follows: winter from January to March, spring from April to June, summer from July to September and fall from October to December.
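For reference, the season definition above maps onto calendar months as in this small sketch (the function name is hypothetical):

```python
def season_of(month: int) -> str:
    """Map a calendar month (1-12) to the season definition used in the text."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # Months 1-3 -> winter, 4-6 -> spring, 7-9 -> summer, 10-12 -> fall.
    return ("winter", "spring", "summer", "fall")[(month - 1) // 3]
```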
The first step was to separate the whole data set into two independent subsets, one for the development of the parameterization and one for the validation. This was done randomly by selecting 80% of the data for the parameterization subset (training data set) and 20% for the validation data set.
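A random 80/20 split of this kind can be sketched as below. The fixed seed and the function name are assumptions made for reproducibility of the example; the text does not say how the random selection was implemented.

```python
import random

def split_train_validation(records, train_fraction=0.8, seed=0):
    """Randomly split records into a training and a validation subset."""
    rng = random.Random(seed)      # fixed seed: an assumption, for repeatability
    shuffled = list(records)       # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

train, validation = split_train_validation(range(100))
```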
The second step was to pre-process the data. Since CHL values cover several orders of magnitude, we preferred to use log10(CHL) as an input parameter of the regression. Moreover, since the different parameters involved in the regressions do not vary over the same range of values, we also normalized the various input parameters. For each parameter x_i we computed the monthly mean \bar{x}_i and the variance σ(x_i)^2 over the season, and we obtained the normalized value using:

$$x_i^N = \frac{2}{3}\,\frac{x_i - \bar{x}_i}{\sigma(x_i)} \qquad (1)$$

where x_i^N has a mean value equal to zero and a standard deviation equal to 2/3 over the transformed data set. The same normalization was applied to pCO2 values.
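A minimal sketch of this pre-processing, assuming the scaling is (2/3)(x − mean)/σ, which is what produces a zero mean and a standard deviation of 2/3. The use of the population standard deviation and the sample CHL values are assumptions for illustration.

```python
import math
import statistics

def normalize(values):
    """Scale values to zero mean and a standard deviation of 2/3."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std: an assumption
    scaled = [(2.0 / 3.0) * (v - mean) / sigma for v in values]
    return scaled, mean, sigma

# Hypothetical chlorophyll values; the log10 transform is applied first.
chl = [0.05, 0.1, 0.4, 1.2]
log_chl = [math.log10(c) for c in chl]
chl_n, chl_mean, chl_sigma = normalize(log_chl)
```

The mean and standard deviation are returned alongside the scaled values because they are needed later to convert normalized estimates back to physical units.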
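The three following regressions were applied season-by-season to the training set:

$$\mathrm{pCO_2}^N = A\,\mathrm{SST}^N \qquad (2)$$

$$\mathrm{pCO_2}^N = A\,\mathrm{SST}^N + B\,\log_{10}(\mathrm{CHL})^N + C\,\mathrm{MLD}^N \qquad (3)$$

$$\mathrm{pCO_2}^N = A\,\mathrm{SST}^N + B\,\mathrm{LON}^N + C\,\mathrm{LAT}^N \qquad (4)$$

where the superscript N denotes normalized values, LON and LAT are the longitude and latitude, and the coefficients A, B and C are fitted separately for each equation and each season. Note that the approach of Eq. (4), which relies on both SST and geographical coordinates, has often been used in previous works (e.g. Stephens et al., 1995; Lefèvre and Taylor, 2002). Even if taking into account explicitly the location of the measurement in the regression has no physical justification, these studies have shown that the fit is generally improved, at least for relatively small regions. Note also that there is no constant term in the three equations, as we have normalized the data as mentioned previously.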
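A regression without a constant term on normalized inputs can be sketched with the normal equations, as below. This is an illustrative standard-library implementation, not the authors' code; in practice a library routine such as numpy.linalg.lstsq would serve the same purpose. The function name and the test data are hypothetical.

```python
def fit_no_intercept(X, y):
    """Least-squares coefficients for y ~ X . coef, with no constant term.

    X: list of rows, one per observation (e.g. [SST_N, CHL_N, MLD_N])
    y: list of normalized pCO2 values
    """
    n = len(X[0])
    # Normal equations: (X^T X) coef = X^T y
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        b[k], b[pivot] = b[pivot], b[k]
        for r in range(k + 1, n):
            factor = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= factor * A[k][c]
            b[r] -= factor * b[k]
    # Back substitution
    coef = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(A[k][c] * coef[c] for c in range(k + 1, n))
        coef[k] = (b[k] - tail) / A[k][k]
    return coef

# Synthetic check: recover known coefficients from noise-free data.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0], [2.0, -1.0, 0.5]]
true_coef = [0.5, -0.2, 0.3]
y = [sum(a * b for a, b in zip(row, true_coef)) for row in X]
coef = fit_no_intercept(X, y)
```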
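Once the regression coefficients are calculated, it is straightforward to recalculate the unscaled pCO2 values with the following equation:

$$\mathrm{pCO_2} = \overline{\mathrm{pCO_2}} + \frac{3}{2}\,\sigma(\mathrm{pCO_2})\,\mathrm{pCO_2}^N \qquad (5)$$

where \overline{pCO2} and σ(pCO2) are the seasonal mean and standard deviation of pCO2 used in the normalization.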
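The conversion back to physical units is simply the inverse of the normalization, assuming that normalization scales by 2/3 of the inverse standard deviation. A short sketch with hypothetical values:

```python
def denormalize(scaled, mean, sigma):
    """Invert x_N = (2/3) * (x - mean) / sigma for each scaled value."""
    return [mean + 1.5 * sigma * s for s in scaled]

# Hypothetical seasonal mean (350 uatm) and standard deviation (10 uatm).
pco2 = denormalize([0.0, 2.0 / 3.0, -2.0 / 3.0], mean=350.0, sigma=10.0)
```

A normalized value of 2/3 corresponds to one standard deviation above the seasonal mean.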

Comparison of the parameterizations
For each season, we assessed how accurately the three multiple linear regressions can reproduce the observed pCO2 by comparing their correlation coefficients r and their root-mean squared (RMS) errors in Table 2 (the regression coefficients and their uncertainties are also given in the table). Seasonal scatterplots (winter, spring, summer and fall) highlight the main differences between the three multiple linear regressions (Fig. 2). All regressions lead to RMS values lower than the spread of the data, given by their standard deviation in Table 1. Equation (2) leads to poor estimates of pCO2 in winter and fall but works fairly well in spring and summer, with correlation coefficients of 0.05 in winter and 0.73 in summer. Equations (3) and (4) yield significant and similar improvements, in terms of both correlation coefficients and RMS error, essentially in winter and fall. These two equations also lead to improvements in summer, especially for Eq. (3). Using this equation, the RMS decreases from 14.27 to 11.44 µatm and the correlation coefficient increases from 0.73 to 0.84. The high pCO2 values, greater than 330 µatm, are better estimated: with Eq. (2) the estimates are clearly capped at a threshold, whereas Eq. (3) estimates these values more accurately. Finally, winter is the only season for which none of the three proposed parameterizations is really successful in predicting pCO2 values, even if Eq. (4) leads to the best results.
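The two scores used in this comparison, the RMS error and the correlation coefficient r, can be computed as in the following sketch. The pCO2 values are invented for illustration and the function names are hypothetical.

```python
import math

def rms_error(observed, estimated):
    """Root-mean squared difference between estimates and observations."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented in-situ and estimated pCO2 values (uatm).
observed = [340.0, 350.0, 360.0, 370.0]
estimated = [342.0, 348.0, 362.0, 368.0]
rms = rms_error(observed, estimated)
r = pearson_r(observed, estimated)
```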
These results are consistent with those of Lüger et al. (2004), except in summer and spring. Using observations in the North Atlantic, they found that SST could explain the variability of pCO2 for summer only, when stratification prevents the upward mixing of nutrients and limits the biological production. Excluding summer, these authors found a weak, negative correlation between SST and pCO2 over the entire basin. In fall, the deepening of the mixed layer supplies CO2 to the surface layer. This explains why the multilinear regression, which takes into account the CHL and the MLD, gives better estimations for these seasons. In winter, the most important secondary parameter is the CHL while it is the MLD in fall. The regression coefficients can be directly compared to assess the relevance of the various parameters because they have been computed using normalized values of the three parameters (SST, CHL, MLD). These values show that SST is always the main parameter, which is not surprising since the thermodynamic effect usually prevails. Among the two secondary parameters, CHL dominates in winter (B=0.85, C=0.34) and MLD in summer (B=−0.17 and C=−0.41). Both CHL and MLD contribute significantly to the parameterization during fall (B=0.80 and C=0.89), whereas these two parameters are not relevant during spring, a period during which SST alone is enough to parameterize the marine pCO2 (B=−0.07 and C=−0.06). The parameters behave differently depending on the season, but we decided to keep them for all seasons in the parameterization. If one parameter is left out of the multi-linear regression (Eq. 3), the accuracy of the regression is slightly affected. For instance, in winter, when the MLD does not play a significant role, leaving it out increases the RMS from 15.00 µatm to 15.36 µatm and decreases r from 0.45 to 0.40. Conversely, in summer, the MLD is the main secondary parameter: leaving out CHL increases the RMS from 11.44 µatm to 11.50 µatm and decreases r from 0.84 to 0.83. These results are consistent with the results obtained by Olsen et al. (2007) in the subpolar North Atlantic. The authors studied the individual relationships between pCO2 and SST, CHL or MLD. They identified basin-wide relationships between pCO2 and CHL, valid on nearly annual scales. Concerning the relationship between pCO2 and MLD, they stated that there is a linear relationship between the parameters from the beginning of the bloom until the MLD deepening in fall. This result is linked to our results showing that the MLD is the primary or secondary parameter in summer and fall in the North Atlantic. Finally, these comparisons show that it is better to include all three parameters in the regression to obtain the best results, even if one of them is not dominant for a given season.
Fig. 2. Scatterplots of the in-situ pCO2 vs. estimated pCO2 obtained by three linear regressions based on the training data set: left column: pCO2 = f(SST) (Eq. 2); middle column: pCO2 = f(SST, longitude, latitude) (Eq. 4); right column: pCO2 = f(SST, CHL, MLD) (Eq. 3), for winter, spring, summer and fall.
There is a negligible offset in the seasonal residuals. For instance, in fall, the mean is equal to −0.00003 and the standard deviation of the residual is ±8.92 µatm. Figure 3 shows the seasonal residuals (fitted minus observed data) as a function of the day of the year. The residuals are neither correlated with the day of the year nor biased with respect to it. Moreover, their distribution shows neither a spatial nor a temporal trend (not shown) and their values do not depend on the values of the in-situ pCO2.
Figure 4 presents the scatterplots of the pCO2 estimated with the validation data set versus the in-situ pCO2 for the parameterizations defined in Eqs. (3) and (4). As stated previously, the validation data set has been chosen randomly in the entire data set. Splitting the entire data set into two independent data sets (training and validation) is a common way to validate estimations from regression analysis (Saporta, 2006). Using this test data set allows us to verify the accuracy of the regression, at least for interpolation along the paths of the cruises. Compared with the results obtained with the training data set (Fig. 2 and Table 2), the multi-linear regression defined by Eq. (3) leads to good validation results. For each season, the RMS and the correlation coefficient are similar to those obtained with the training data set. This means that this regression can be used with a high level of confidence along the area of the cruises. The same conclusions are reached with the regression defined by Eq. (4). Using only SST to parameterize the oceanic pCO2 is not enough: it is necessary to take into account the geographical coordinates (Eq. 4) or physical parameters, such as CHL and MLD (Eq. 3), to better estimate pCO2 in local areas.

Seasonal pCO2 maps
The two parameterizations based on Eqs. (3) and (4) were applied season-by-season to global maps of SST, CHL and MLD, to generate seasonal pCO2 maps of the entire North Atlantic (85°W-6°W; 10°N-58°N), as shown in Fig. 5. In order to avoid extrapolation of pCO2 values outside the validity range of each individual input parameter (SST, longitude and latitude, CHL, MLD), the pCO2 was mapped only where all input parameters are within the range of variability explored by the database (see Table 1 for minimum and maximum values of the various parameters). This explains the difference in spatial coverage between the two parameterizations in Fig. 5, notably in fall. For the same reason, there is no pCO2 estimation along the northern east coast of Canada (zone encompassing 60°W-50°W and north of 42°N). In this zone, the SST is always below the minimum value of SST used to calibrate the regression. This zone was therefore flagged and excluded from the computation of the pCO2.
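The range check described above can be sketched as a simple mask applied before the regression is evaluated. The ranges, the field names and the placeholder regression below are hypothetical and are not the values of the study.

```python
def within_training_range(cell, ranges):
    """True if every input of this grid cell lies inside the sampled range."""
    return all(lo <= cell[name] <= hi for name, (lo, hi) in ranges.items())

def map_pco2(cells, ranges, predict):
    """Apply `predict` only where all inputs are in range; flag the rest as None."""
    return [predict(c) if within_training_range(c, ranges) else None
            for c in cells]

# Hypothetical minimum and maximum values of each input in the database.
ranges = {"sst": (8.0, 28.0), "chl": (0.05, 1.5), "mld": (10.0, 200.0)}
cells = [
    {"sst": 15.0, "chl": 0.3, "mld": 50.0},  # inside all ranges
    {"sst": 3.0, "chl": 0.3, "mld": 50.0},   # SST below the sampled minimum
]
# Placeholder regression, for illustration only.
estimates = map_pco2(cells, ranges, predict=lambda c: 300.0 + 2.0 * c["sst"])
```

The second cell is flagged because its SST is below the sampled minimum, which mirrors the treatment of the cold zone off the Canadian coast.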
As the SST, CHL and MLD data do not have the same resolution, it was necessary to re-grid the data. As a reminder, the SST data has a resolution of 9-by-9 km, the CHL a resolution of (1/12°)-by-(1/12°) and the MLD a 2°×2° resolution. To create 2°-by-2° maps, the mean values of the SST and CHL data are calculated over each 2°-by-2° box.
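The re-gridding step amounts to averaging all fine-resolution values that fall inside each 2°-by-2° box. The sketch below assumes an unweighted mean, boxes aligned on 180°W and 90°S, and None for missing values; none of these details are specified in the text, and the function name is hypothetical.

```python
import math
from collections import defaultdict

def bin_to_boxes(points, box_deg=2.0):
    """Average (lon, lat, value) points over box_deg x box_deg boxes.

    Returns a dict mapping (row, col) of each box to the mean of its values.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, value in points:
        if value is None:  # skip missing data (assumed convention)
            continue
        key = (math.floor((lat + 90.0) / box_deg),
               math.floor((lon + 180.0) / box_deg))
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical SST pixels: two fall in one box, one in the neighbouring box.
pixels = [(-30.5, 45.2, 10.0), (-31.0, 45.9, 12.0),
          (-29.9, 45.0, 20.0), (-29.5, 45.5, None)]
coarse = bin_to_boxes(pixels)
```

Binning by coordinates rather than by a fixed block size avoids requiring that the fine grid spacing divide 2° exactly.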
The fitted seasonal cycles of seawater pCO2, obtained by Eqs. (3) and (4), are compared to the pCO2 cycles of Takahashi et al. (2002). Both data sets are created for the year 1995.
In Fig. 5, the maps obtained using longitude and latitude (left column) are clearly unrealistic, especially in winter and spring. The estimation of the seawater pCO2 is strongly dependent on the longitude and latitude. This is not a surprising result, as the regression coefficient for the latitude is equal to −3.94 in winter (compared to −3.58 for the SST coefficient) and the coefficient for the longitude is equal to 0.85 in spring (compared to 1.09 for the SST coefficient). This is also due to the fact that these two parameters do not exert any actual control on the pCO2. Moreover, even though both latitude and longitude are within their individual validity range, this application represents a de facto extrapolation outside the space spanned by the observations. Thus, whereas this approach can be sufficient to work with regions of small size, it is definitely not applicable to basin-wide regions.
The parameterization with SST, CHL and MLD exhibits more features (middle column). Highest chlorophyll concentrations are located north of 40°N and are associated with low pCO2 (middle column in Fig. 1). The pattern of the SST maps (Fig. 1) is also visible with a North-East to South-West warming associated with increasing pCO2 values. From November the mixed layer deepens and the vertical mixing supplies subsurface CO2-rich waters to the surface, thus explaining the relatively high pCO2. In March, the mixed layer is quite deep and increases the surface pCO2 but other processes are also present: biological activity mainly located on the edge of the U.S. and African coasts and the cold temperatures both contribute to lower pCO2.
In addition, the parameterization based on SST, CHL and MLD leads to results that are in reasonably good agreement with the climatology of Takahashi et al. (2002), shown in the right column of Fig. 5. At low latitudes (20-35°N), the pCO2 cycle is marked by strong seasonal variations, with low values in winter, high values in summer and intermediate values in fall. These variations show a strong relationship with SST in this zone, especially during spring and summer. There is a discrepancy between June and August along the east coast of the US and the west coast of Africa, with higher values of pCO2 in the climatology of Takahashi et al. (2002). At high latitudes (>40°N), the seasonal cycle is less pronounced due to a more active role of CHL and MLD.
The mean values of the seawater pCO2 over the entire basin are calculated and the seasonal cycle is compared to the seasonal cycle of Takahashi et al. (2002). The seasonal cycle of the mean value obtained by Eq. (3) shows a correlation coefficient of 53.87%, compared to 14.89% with the parameterization using the longitude and the latitude (and SST).
Taking into account CHL and MLD, in addition to SST, leads to a much better interpolation of pCO2 at the scale of a basin like the North Atlantic than a regression based on SST and some geographical information, at least in spring, summer and fall.

Conclusions
Using 15 401 measurements for the years 1994 and 1995, we have developed a new parameterization of pCO2, based on SST, CHL and MLD for the North Atlantic (10°N-58°N). This new parameterization has been applied to satellite data, to provide pCO2 maps of the North Atlantic. Comparison with a linear regression based on SST only shows a general improvement in reproducing pCO2, except during winter. Our basin-scale results are also more realistic than those obtained using a parameterization based on SST, longitude and latitude, which cannot be used for extrapolation to basin scale. This multi-linear regression is not valid for other basins and cannot be extended to the equatorial and southern Atlantic Ocean. This study demonstrates the importance of taking into account the CHL and MLD in the parameterization of the oceanic pCO2.

Fig. 1. Seasonal variations of the SST (left column), CHL (middle column) and MLD (right column) from winter (top row) to fall (bottom row). The tracks of in-situ pCO2 measurements are shown on the figures as black crosses.

Fig. 3. Residual values obtained with the parameterization pCO2 = f(SST, CHL, MLD), as a function of the day of the year.

Fig. 4. Scatterplots of the in-situ pCO2 vs. estimated pCO2 obtained by the linear regressions defined by Eq. (3) (right column) and Eq. (4) (left column), based on the validation data set, for winter, spring, summer and fall.

Table 1. Number of pCO2 data per month, minimum and maximum of the longitude, latitude, SST, CHL and MLD for the paths of the cruises for the period June 1994-August 1995.

Table 2. Regression coefficients, correlation coefficients and root-mean squared errors of the three regressions (see Eqs. 2, 3 and 4).