www.ann-geophys.net/29/283/2011/ doi:10.5194/angeo-29-283-2011 © Author(s) 2011. CC Attribution 3.0 License.

Abstract. The hourly values of the geomagnetic field from 1911 to 1931 derived from measurements made at Eskdalemuir observatory in the UK, and available online from the World Data Centre for Geomagnetism at http://www.wdc.bgs.ac.uk/, have now been corrected. Previously they were 2-point averaged and transformed from the original north, east and vertical down values in the tables in the observatory yearbooks. This paper documents the course of events from discovering the post-processing done to the data to the final resolution of the problem. As it was through the development of a new index, the Inter-Hour Variability index, that this post-processing came to light, we provide a revised series of this index for Eskdalemuir and compare it with that from another European observatory. Conclusions of studies concerning long-term magnetic field variability and inferred solar variability, whilst not necessarily consistent with one another, are not obviously invalidated by the incorrect hourly values from Eskdalemuir. This series of events illustrates the challenges that lie ahead in removing any remaining errors and inconsistencies in the data holdings of different World Data Centres.


Introduction
For the past few years it has been known that the Eskdalemuir hourly values from 1911 to 1931 were post-processed with unfortunate consequences. These data were formerly available online from the World Data Centre (WDC) for Geomagnetism at the Danish Meteorological Institute until the start of 2007 and then at the authors' institute. The post-processing was the application of a 2-point running mean and transformation to declination, horizontal and vertical intensities (DHZ) from the original northerly, easterly and vertical intensities (XYZ) recorded in the tables of the observatory yearbooks. The reason for the post-processing was presumably to make these earlier hourly values compatible with the later values, which were hourly mean DHZ values centred on the half hour (Eskdalemuir yearbooks, 1932 onwards). There is no record of when this post-processing was done. The 1911-1917 data were XYZ spot values on the hour (according to yearbook table headings but more on this later) and the 1918-1931 data were XYZ mean values centred on the hour (Eskdalemuir yearbooks, 1911-1931). This post-processing had the effect of lowering the spectral content for high frequency studies performed on the hourly value time series. However, the significance of this only really came to light with the development of the Inter-Hour Variability index in the 2000s.

Correspondence to: S. Macmillan (smac@bgs.ac.uk)

The discovery
The creation of the IHV index was motivated by issues concerning the aa index (Svalgaard et al., 2003). The aa index has been critical for inferring long-term changes in the Sun (e.g. Lockwood et al., 1999) as it is the longest running index of magnetic activity available, dating back to 1868. However, some records of observatory hourly values are also very long and this led to the development and analyses of IHV indices for inferring long-term changes in the Sun and near-Earth space (Svalgaard et al., 2004; Mursula et al., 2004; Mursula and Martini, 2006; Svalgaard and Cliver, 2007a, b). The problem with the Eskdalemuir IHV series was known at this stage and attempts were made to correct it (Martini and Mursula, 2006). Another point to note at this stage was that the World Data Centres were acknowledged collectively for the supply of the hourly values and the individual WDC supplying the data was not identified. This turns out to be important. In the meantime the long-term solution to the problem seemed to lie with either undoing the post-processing or redigitising the original hourly value tables in the observatory yearbooks.

Published by Copernicus Publications on behalf of the European Geosciences Union.

Undoing the post-processing
In theory, it should be possible to undo such relatively simple post-processing by obtaining an initial value from the yearbooks and applying it to each series of post-processed data using

$$E_i = 2\bar{E}_i - E_{i-1}$$

where $E_i$ is an original hourly value of an element of the Earth's magnetic field for hour $i$ and $\bar{E}_i$ is a 2-point averaged hourly value of an element of the Earth's magnetic field for hour $i$. Knowledge of $E_0$ (the value at the start of the time series) should be all that is required to regenerate the original series of data. However, it turns out that the transformation from XYZ to DHZ and the limitations of the WDC format significantly complicate matters. We explain this now in more detail.
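The inversion can be illustrated with a minimal sketch. It assumes the running mean pairs each hourly value with its predecessor, so that the recursion is E_i = 2 * Ebar_i - E_(i-1); the function names and the sample values in nT are made up for illustration and are not taken from the yearbooks.

```python
import math


def two_point_average(values):
    """Return the means of consecutive pairs: (values[i-1] + values[i]) / 2."""
    return [(a + b) / 2 for a, b in zip(values, values[1:])]


def undo_two_point_average(e0, averaged):
    """Rebuild the original series from its first value and the averaged series.

    Applies E_i = 2 * Ebar_i - E_(i-1), starting from the known E_0.
    """
    recovered = [e0]
    for mean in averaged:
        recovered.append(2 * mean - recovered[-1])
    return recovered


# Made-up hourly values in nT (illustrative only).
original = [17012, 17015, 17009, 17020, 17018]
averaged = two_point_average(original)

# With unrounded averages the inversion is exact.
exact = undo_two_point_average(original[0], averaged)

# Stored averages are rounded to 1 nT (here: halves rounded up, an assumption),
# and the rounding errors then propagate through the recursion.
rounded = [math.floor(m + 0.5) for m in averaged]
drifted = undo_two_point_average(original[0], rounded)

print(exact)    # matches the original series
print(drifted)  # departs from the original by up to 2 nT in this example
```

In this example the recovered values depart from the originals by up to 2 nT once the averages have been rounded, which suggests why rounding alone makes the inversion unreliable even before the XYZ to DHZ transformation is considered.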
The digital storage and exchange format that is widely used for hourly values is the WDC format, described at http://www.wdc.bgs.ac.uk/catalog/master.html. This is the natural extension of the way that hourly values were presented in the older yearbooks. The use of a baseline value, one line of data per element per day and 1 nT resolution is common to both the yearbook tables and the WDC-format files. The unit for declination of the magnetic field in the WDC-format file is tenth arc-minute, and the resolution is 1 tenth arc-minute.
There are 4 options of how the post-processing was done:
1. The XYZ values were transformed to DHZ, then 2-point averaged and rounded to 1 nT.
2. The XYZ values were transformed to DHZ, rounded to nearest 1 tenth arc-minute and 1 nT (the resolution of the WDC format), then 2-point averaged and rounded to 1 nT. This option is possible if WDC-format files of unaveraged DHZ existed at an intermediate stage.
3. The XYZ values were 2-point averaged, then transformed to DHZ and rounded to nearest 1 tenth arc-minute and 1 nT.
4. The XYZ values were 2-point averaged, rounded to nearest 1 nT, then transformed to DHZ and rounded (again) to nearest 1 tenth arc-minute and 1 nT. This option is possible if WDC-format files of averaged XYZ existed at an intermediate stage.
Within each of these options are choices for how a fractional part of exactly 0.5 is rounded to the resolution of the WDC-format files, which is 1 unit. Whilst the default rounding in the IEEE 754-2008 floating point arithmetic standard is to round to even, its implementation is dependent on computer platform and programming language. Rounding towards zero and rounding up are other possibilities. Also unknown is whether the same rounding was applied to each component. A further complication is that the Y component in the yearbooks assumes west is positive, but it is not known whether this was the case in any intermediate WDC-format files.
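The ingredients of these options can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the function names are hypothetical, declination is taken as east-positive, Y is taken as west-positive on input (as in the yearbooks), and Python's `decimal` module stands in for whatever rounding the original software applied.

```python
import math
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP


def xyz_to_dhz(x_nt, y_west_nt, z_nt):
    """Convert X (north), Y (west-positive) and Z (down) to D, H and Z.

    D is returned in tenths of an arc-minute, east-positive (assumed
    convention); H and Z are in nT. No rounding is applied here.
    """
    y_east_nt = -y_west_nt
    h_nt = math.hypot(x_nt, y_east_nt)
    d_tenth_arcmin = math.degrees(math.atan2(y_east_nt, x_nt)) * 600.0
    return d_tenth_arcmin, h_nt, z_nt


def round_to_unit(value, mode):
    """Round to the nearest whole unit using the given tie-breaking mode."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=mode))


MODES = {
    "half to even": ROUND_HALF_EVEN,
    "half up": ROUND_HALF_UP,
    "half towards zero": ROUND_HALF_DOWN,
}

# Ties occur whenever two integer values with an odd sum are averaged.
for tie in (16800.5, 16801.5):
    for name, mode in MODES.items():
        print(tie, name, round_to_unit(tie, mode))

# Made-up field values in nT, with Y west-positive (illustrative only).
d, h, z = xyz_to_dhz(16000.0, 5000.0, 45000.0)
print(round_to_unit(d, ROUND_HALF_UP), round_to_unit(h, ROUND_HALF_UP), z)
```

For positive values the three modes differ only on exact ties, but ties are frequent when integer hourly values are 2-point averaged, so the choice of mode and the order of averaging, transforming and rounding can each change a stored value by one unit.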
Because of these uncertainties in the post-processing, and also because any resulting values from attempts to undo the post-processing would require comparison with the original data sources, the yearbook tables were scanned and digitised. It should be noted that the original data are the analogue magnetograms and tables of absolute observations and whilst there is a programme underway in BGS to digitise these for long-term preservation, it was not deemed practical to use these to obtain new series of absolute hourly values within a reasonable timescale.

Experiences with scanning and digitising yearbook tables
In support of a study by Martini and Mursula (2006), work began to redigitise the original hourly values directly from the monthly tables in the Eskdalemuir observatory yearbooks. Originally this involved making scanned copies of only the January tables for each year, as described in Martini and Mursula (2006); however, this work continued to complete the full set of digital images for all months. The yearbooks were located and transported to Edinburgh, some from the BGS archive at Hartland observatory and others from Eskdalemuir. Following tests on and selection of scanning hardware and Optical Character Recognition (OCR) software, the monthly tables for each component for 23 years (1932 and 1933 were also done, resulting in a total of over 800 tables) were scanned and saved as tagged image file format (TIFF) files. Some of the yearbooks were in degraded condition and thus the clarity of the tabulated data within the images obtained was often poor (Fig. 2). Using OCR software, attempts were made to extract the data from the TIFF files. However, too many errors or unrecognised characters were obtained for each image and the process was abandoned as uneconomical in time and effort. That OCR was too much of a challenge in this case was confirmed when example files were sent to three different specialist companies. They all recommended direct data entry rather than OCR. One company was commissioned to carry out the data entry work.
The digitised XYZ data were returned in Microsoft Excel spreadsheet format, with each table on a single worksheet as per its original layout in the yearbook. The tables included the daily averages as well as the averages over each UT hour in the month. These averages would prove useful for quality control (QC) purposes. The data were transformed into DHZ and two-point averaging applied. This enabled comparison with the existing post-processed WDC data. It was clear at this stage that there were many errors in the digitised data (e.g. Fig. 3 top and middle panels) and that the data entry company had not carried out the QC that had been assured.
Further work was then required to identify and correct the data entry errors. Firstly, the Excel spreadsheets could be used to compute daily and monthly UT averages, which were compared against the corresponding values entered in the spreadsheets. Any errors identified at this stage were corrected. Secondly, these revised data were compared again with the post-processed WDC data, as in the previous paragraph, and all differences greater than 10 nT were identified and corrected: a task that proved to be very labour intensive. Although a large number of smaller errors remained (e.g. Fig. 3 bottom panel), it was clear that the Eskdalemuir hourly values from this procedure were a better match to the original data in the yearbooks than the post-processed data available from the WDC. These data were not released, however, as it was the intention to complete the processing by removing all the remaining errors. This task has now been superseded by the discoveries and work carried out in the remainder of this paper.
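The first QC step, checking keyed hourly values against the averages printed in the same table, can be sketched as below. The row layout, the function name and the 0.5 nT tolerance (chosen on the assumption that the printed averages were rounded to 1 nT) are illustrative assumptions rather than a description of the spreadsheets actually used.

```python
def flag_inconsistent_rows(rows, tolerance=0.5):
    """Return rows whose keyed average disagrees with the recomputed one.

    rows: iterable of (label, hourly_values, entered_mean) tuples.
    A row is flagged when |recomputed mean - entered mean| > tolerance.
    """
    flagged = []
    for label, hourly_values, entered_mean in rows:
        recomputed = sum(hourly_values) / len(hourly_values)
        if abs(recomputed - entered_mean) > tolerance:
            flagged.append((label, recomputed, entered_mean))
    return flagged


# Made-up tabular values in nT (illustrative only).
table = [
    ("day 1", [100] * 24, 100),
    # One value keyed as 190 instead of 109: the mean shifts by 90 / 24.
    ("day 2", [100] * 23 + [190], 100),
    # One value keyed as 101 instead of 100: the mean shifts by only 1 / 24.
    ("day 3", [100] * 23 + [101], 100),
]

for label, recomputed, entered in flag_inconsistent_rows(table):
    print(label, recomputed, entered)
```

A check of this kind catches only errors large enough to move a 24-value average beyond the tolerance, which is consistent with small errors surviving this step and needing the second comparison against the WDC data.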

Comparison of WDC holdings
Until recently it had been widely assumed that the WDC-format files for Eskdalemuir held at different WDCs contained the same data. However, it was noted in Love et al. (2010) that the files held at Edinburgh and Kyoto WDCs were different and that the Kyoto files appeared to contain unaveraged XYZ values. A detailed comparison of the files was made by the authors, including monthly plots of differences, using the 4 options of post-processing listed in Sect. 3.
It was found that option 1, with rounding up for 0.5, resulted in the smallest differences between the 2 data series, particularly for Z. Table 1 lists the annual means and standard deviations of the differences in DHZ for this option. Although Table 1 shows that the overall differences are not always zero with this option, this, along with the monthly plots of differences, was strong support for the claim of Love et al. (2010) that the data in the Kyoto WDC-format files were not post-processed. Before replacing the Edinburgh files with those from Kyoto, the cause of the large differences in 1917 and 1925 (Table 1) had to be found. For this the monthly plots of differences, yearbook scans and digitised values were used. For 1917 it was the July values in Y and Z in the Kyoto files that were wrong. For 1925 it was the January values in Y in the Kyoto files that were wrong and the December values in Y and Z in the Edinburgh files that contained additional errors over and above the post-processing.
Further checking of the Kyoto files revealed missing chunks of data in 1915 and 1921. Again the yearbook scans and digitised values were used to complete the files. A final quality-control measure was to select one day at random for each year from 1911 to 1931 and to check that the values in the WDC-format files agreed with those from the yearbooks. The corrected Kyoto files were transmitted back to Kyoto and the post-processed Edinburgh files available online were replaced with the correct files on 9 August 2010.

Revised IHV indices
As further evidence of the reliability of the updated files, we compute IHV indices for a number of observatories and compare ratios of their annual means. In Fig. 4 it can be seen that the revised Eskdalemuir IHV index series no longer shows a large step at the end of 1931 when compared with that from Niemegk. In order to investigate the anomalous value for 1911 we "zoom in" and use ratios of monthly means of IHV indices. In Fig. 5 we can see there is a step at the end of 1911. This is when the hourly values at Eskdalemuir changed from being spot values to mean values (as noted by Martini and Mursula, 2006, the headings of the yearbook tables of hourly values are not clear in this respect, but a note in the text of the yearbook for 1912 spotted by Love et al., 2010, states that the values are mean values centred on the hour). Changing from spot to mean values also happens in other observatory series (in Niemegk it happened before the start of the Eskdalemuir series) and whilst an adjustment to the resulting IHV series can be estimated for each observatory (Svalgaard and Cliver, 2007a), the ultimate remedy may lie in making high-quality scans of the original magnetograms and yearbooks and producing hourly mean values where previously only hourly spot values were available. A programme to do this is currently underway for the UK observatories. In the meantime extra information has been added to the metadata for Eskdalemuir held at http://www.wdc.bgs.ac.uk/. This states that 1911 hourly values are spot values centred on the hour and that 1912-1931 are mean values centred on the hour.

Conclusions
The significance of this work is that Eskdalemuir data are widely used in studies concerning long-term trends in magnetic field variability and inferred solar variability (for example Mursula et al., 2004; Svalgaard and Cliver, 2007b). For that reason inconsistencies and errors in the Eskdalemuir time series need to be identified and carefully corrected. Fortunately, because Eskdalemuir is but one of several observatories used in these studies, the conclusions on solar variability are unlikely to need reconsidering. However, it was the detailed analyses undertaken in these solar variability studies that brought the problem reported here to light. For this we are grateful and we believe that future research using long series of hourly mean values can now be made with high confidence following the extensive analysis described here. Work is underway to improve the metadata, including time-dependent metadata, held for each observatory at the World Data Centres.
The first indication of something unexpected in the early Eskdalemuir hourly values was from correspondence with Leif Svalgaard in 2004 where he presented ratios of the newly defined Inter-Hour Variability indices for the UK observatories. The IHV index is the sum of absolute differences of hourly means (or values) for a geomagnetic component from one hour to the next over the six-hour interval around local midnight. The ratios for the annual means of the IHV indices for the horizontal component as recorded at the UK observatories at Eskdalemuir, Lerwick, Abinger and Hartland revealed a sharp change in the Eskdalemuir record at the end of 1931 (Fig. 1).
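The definition of the IHV index given here can be made concrete with a short sketch. It assumes that the six-hour interval is represented by seven consecutive hourly values (six hour-to-hour differences) and that 2-point averaging pairs each value with its predecessor; the function name and the sample values in nT are illustrative and not taken from the Eskdalemuir record.

```python
def ihv(hourly_values):
    """Sum of absolute hour-to-hour differences over the supplied window."""
    return sum(abs(b - a) for a, b in zip(hourly_values, hourly_values[1:]))


# Made-up H values in nT for eight consecutive hours around local midnight.
hours = [16803, 16805, 16809, 16802, 16802, 16810, 16807, 16806]

# Seven unaveraged values give six differences.
ihv_original = ihv(hours[1:])

# 2-point averaging of the eight values also leaves seven values.
smoothed = [(a + b) / 2 for a, b in zip(hours, hours[1:])]
ihv_smoothed = ihv(smoothed)

print(ihv_original)  # 23
print(ihv_smoothed)  # 16.5
```

In this example the averaged series yields a noticeably lower index than the unaveraged one, which is the kind of reduction that would appear as a step in the ratio to another observatory at the point where the post-processing stops.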

Fig. 1. Ratios of annual means of UK observatory IHV indices based on data available at the Edinburgh WDC before 9 August 2010. Green is ratio of Eskdalemuir to Abinger, red is ratio of Eskdalemuir to Hartland and blue is ratio of Eskdalemuir to Lerwick.

Fig. 2. Example of a TIFF file of a monthly table of hourly values scanned from the ESK yearbooks.This illustrates the typical poor quality of the original tables.

Fig. 3. Errors in the hourly mean values for 1917 from manual data entry carried out in 2007.The top panel shows full range of errors, the middle panel reduces the range to 100 nT to show more detail and the bottom panel shows the remaining errors following the corrections described in the text.

Fig. 4. Ratios of annual means of Eskdalemuir and Niemegk IHV indices based on data available at the Edinburgh WDC before (blue) and after (red) 9 August 2010.

Fig. 5. Ratios of monthly means of Eskdalemuir and Niemegk IHV indices based on data available at the Edinburgh WDC after 9 August 2010.