Missing data and the accuracy of magnetic-observatory hour means

Analysis is made of the accuracy of magnetic-observatory hourly means constructed from definitive minute data having missing values (gaps). Bootstrap sampling from different data-gap distributions is used to estimate average errors on hourly means as a function of the number of missing data. Absolute and relative error results are calculated for horizontal-intensity, declination, and vertical-component data collected at high, medium, and low magnetic latitudes. For 90% complete coverage (10% missing data), average (RMS) absolute errors on hourly means are generally less than errors permitted by Intermagnet for minute data. As a rule of thumb, the average relative error for hourly means with 10% missing minute data is approximately equal to 10% of the hourly standard deviation of the source minute data.


Introduction
Hourly data have been an important product of magnetic observatories since the early 19th century.Their time series are used for a wide variety of applications, including estimating long-term solar-terrestrial interaction (e.g.Svalgaard and Cliver, 2007), production of magnetic indices for monitoring the Earth's magnetosphere (e.g.McPherron, 1995), mapping electric currents in the Earth's ionosphere (e.g.Winch, 1981), exploring the electrical conductivity of the Earth's mantle (e.g.Olsen, 1998), and studying geomagnetic secular variation originating in the Earth's core (e.g.Lesur et al., 2008).When the routine production of hourly data was first started, it was labor-intensive work, with direct measurements being made visually by on-site workers.Then, with the inven-Correspondence to: J. J. Love (jlove@usgs.gov)tion of photographic recording systems (Brooke, 1847), the production of hourly data became more automated.Today, formal hourly means are constructed by averaging digital minute data.The advancement of data-acquisition technology has brought improved data quality and greater quantities of data, but it has also introduced some difficulties.Digital systems sometimes give time series having artificial spikes and data dropouts, and with the desire on the part of observatory institutes to meet higher and higher operational standards, data judged to be defective or inadequate in some way are often removed during computer-based processing.Therefore, the question arises: How many missing minute data might be tolerated before an hourly mean becomes unacceptably erroneous?
Over the past several years, a number of researchers have investigated the accuracy of hourly means having missing minute data (Mandea, 2002;Schott and Linthe, 2007;Herzog, 2009;Newitt, 2009;Hejda et al., 2009).The subject has sparked spirited discussion at recent Intermagnet and International Association of Geomagnetism and Aeronomy (IAGA) meetings.Some observatory-institute representatives have, at times, advocated a stringent rule of not permitting the reporting of an hourly mean if even one minute of source data is missing.Other representatives have advocated a more forgiving rule under which an hourly mean could be reported if, for example, 90% (10%) of the minute data are present (missing).More complicated rules have also been suggested, but so far, few, if any, have been put into practice.
At some observatories, workers avoid the issue of missing minute data, to some extent, by backfilling with data taken from redundant acquisition systems (e.g.p. 2 of Hitchman et al., 2008).Workers at other observatories, preferring a more hands-off treatment of the data, simply report gaps as they are.Of course, users of observatory data can, and often do, choose to fill data gaps according to phenomenological models (e.g.Barkhatov et al., 2002), but what is acceptable to one user might not be acceptable to another, and so observatory institutes tend to avoid data manipulations that might introduce bias.It is also worth recognizing that some modern sensor systems have internal cyclings that result in momentary loss of temporal continuity; so, for example, the one-second, absolute-vector sensor of Pulz et al. (2009) sequentially combines 180 ms averages of magnetic signals every 200 ms -using what is, in effect, a 90% rule, albeit for the production of average data that correspond to a much higher frequency than the hourly means considered here.In any case, the effect of missing data and their possible imputation on longer timescale averages needs to be investigated and understood.For general review of the statistics of data with missing values see Little and Rubin (2002).
In estimating the accuracy of hourly means, focus is made on (1) the number of missing minutes of data and the distribution of data gaps within each hour, (2) the magnetic-vector component having missing data, and (3) the level of local magnetic activity and its dependence on observatory location and solar-terrestrial conditions.In the search for useful insight, the effects of these several variables need to be distilled down to a compact set of results.Lessons learned can then contribute to the discussion of possible standards for hourly means, and they might also be applied to the production of other observatory data types, such as minute means constructed from second samples, second means constructed from sub-second samples, etc.Our hope is that knowledge of the factors affecting data-product accuracy would inform wise decisions that strike a balance between that which is practical for the observatories to produce and that which is actually needed by the wider scientific community.

Data and gaps
The data used in this analysis are summarized in Table 1.They are definitive data collected at standard magnetic observatories (Love, 2008) for the years 1991-2008.The observatories are situated at high (Sodankylä SOD, Dumont d'Urville DRV), medium (Hartland HAD, Tihany THY), and low (Hermanus HER, Honolulu HON) magnetic latitudes.
We use the (horizontal intensity H , declination D, vertical component Z) magnetic-vector ingredients; where these are reported, then we use them without any modification, otherwise, if and when cartesian horizontal components (north X, east Y ) are reported, then we calculate (D, H ) appropriately.A simple algorithm is used to remove a few obvious data spikes and these are replaced by short data gaps.When referring to an observatory, we use its name, when referring to data from the observatory, we use its IAGA-designated three letter code.The data were obtained from the Intermagnet (Kerridge, 2001;Rasson, 2007) website and the Kyoto World Data Center website (for HER 1991(for HER -1992)).
For each observatory latitude-pair, one of (SOD, HAD, HER) was picked because it has few missing data; the other of the latitude-pair (DRV, THY, HON) was picked because it has relatively many missing data1 .In Fig. 1 we show the cumulative distributions of data gaps over time from observatories having significant numbers of missing data (DRV, THY, HON).The data-gap histories reveal more or less continuous time series that are punctuated by occasional and intermittent data gaps caused by operational difficulties.Obviously, these histories are different at different observatories.
In Fig. 2 we show the data-gap and data-sequence occurrence statistics within each Universal-Time hour (00:00-00:59, 01:00-01:59, etc.).For an individual observatory, if each hour of time has only one continuous data gap, then the gap length and number of missing data will be identical, as they nearly are for DRV and THY (a,b).Sometimes, however, an individual hour will have multiple short gaps causing the two distributions to be somewhat different, as they are (slightly) for HON (c).Despite this mild qualification, we can reasonably infer that most data gaps occur as continuous sequences of missing data.Figure 2 also reveals some of the systematic ways in which some observatory data gaps are realized.Note, for example, the tendency for HON data gaps to occur in multiples of 12 min (c), or the tendency for stretches of THY data to occur in multiples of 10 min (e).With respect to the position of the missing data within each hour, for DRV if only 1 min of data is missing, then this tends to occur relatively uniformly over the hour (g), but for THY 1 min of missing data tends to occur on minutes 10, 20, 30, and 40 (h).If (say) 6 min of data are missing, then occurrence is not uniform for any of the observatories (DRV, THY, HON).Apparently, observatory data gaps occur in a variety of ways -in some respects, at some observatories, they occur randomly, and in other respects, at some observatories, they do not.This difficult fact needs to be accommodated in an analysis of the accuracy of hourly means, otherwise biased results might be obtained.

Implications of possible standards
It is important to consider the implications of establishing a standard for hourly means formed from incomplete minute data.If a strict standard were adopted, such as choosing not to calculate a mean when even just one minute of data is missing, then every time an hour of observatory data is missing one minute, the result would be a 60-min gap in the hourly time series.This is an extreme and very obvious example, but it illustrates the trade-off between a stringent standard for constructing hourly means and the overall continuity of hourly-mean time series.In Fig. 3a we show cumulative missing minutes as a function of missing minutes per hour, and in panel (b) cumulative hours that contain missing minutes.For specificity, consider the 90% rule, where hourly means are not calculated if more than 6 min of data are missing within an hour.For the 18-year time span considered for (DRV Z, THY D, HON H ), there are a total of (3053, 1253, 7502) missing minutes in hours having 10% (6 min) or fewer missing data.This corresponds to (2.12, 0.87, 5.21) total days of missing minute data, and these missing minutes fall within (1193, 600, 4439) separate hours.In other words, if a 90% rule were adopted, then (49.71, 25.00, 184.96) days of hourly means would be accepted into the time series that otherwise, under the strictest criterion, would not have been accepted.With (0.03%, 0.01%, 0.08%) of each time series missing in terms of minute data, (0.76%, 0.38%, 2.81%) of the hourly time series are affected.For HON this represents over 6 month's worth of data!It is clear that a missing-data rule for calculating hourly means can have a dramatic impact on the continuity of the hourly time series.

Absolute errors
Let i B k represent a minute mean of an arbitrary magneticvector component (H, D, Z, etc.), where i denotes a specific minute within hour k.An hourly mean µ k is obtained by summing over the minute data within each hour, where each of the i a k is a binary (1 or 0) variable.For each hour, the N minute data used in constructing the hourly mean have i a k with value 1.For each hour, the missing minute data have i a k with value 0. If N=60 (M=0), then coverage is complete, no data are missing, no data have been omitted, and a "perfect" hourly mean can be formed using all of the available minute data.On the other hand, if N <60 (M>0), then only an "erroneous" hourly mean can be formed from incomplete coverage.Since the issue we are addressing is errors on hourly means when data coverage is incomplete, we define the absolute error, very simply, as the difference between the incomplete and complete means, which we note is a function of the number of available data N (missing data M) and the distribution of those data (gaps) within the hour.
How might this measure of error be calculated?In a typical observatory minute time series there are usually periods of good data continuity, and there are also some periods with data gaps that are caused by operational problems.As a result, most hourly means are complete µ k (N=60), but a few are incomplete µ k (N<60).Of course, there is no way to calculate a complete hourly mean µ k (60) if there are miss-ing minute data within the hour -if the data are really missing, then they are gone forever.Therefore, in conducting numerical experiments in which Eq. ( 3) is to be estimated, it is common to adopt a bootstrap approach (Efron, 1982) -calculating incomplete means µ k (N ) from complete data through a statistical simulation in which gaps are artificially inserted into the time series.But exactly how the gaps should be distributed has been something of an open question.Here, we consider three different distributions of simulated gaps: (1) binomial, (2) continuous-sequence, and (3) proxy.In addition, (4) we establish upper bounds on the errors on hourly means that can be realized from individual observatory time series.
It is straightforward to devise a binomial-simulation algorithm (Herzog, 2009).For each hour k, N of i a k are selected randomly, independently, and without replacement.As a representation of data present, these are assigned the value 1.The M remaining i a k represent missing data, and these are assigned the value 0. This yields a distribution of data and gaps within each hour that can be described in terms of binomial combinatorics.Using the observatory time series having relatively few actual gaps (SOD, HAD, HER), we calculate absolute errors using Eqs.(1) and (3) for numerous simulated realizations of binomial-distributed data and gaps.There are different ways in which N data (M missing data) can be realized in this type of simulation.The number of possible realizations can be extremely large -so large that for some N (M) an exhaustive tabulation (>10 17 ) is impossible, even on a modern computer.Still, accurate average results can be obtained from an unbiased random sampling of binomialdistributed gaps and data.Root-mean-square average errors, RMS{ }(N, B), where B denotes binomial sampling, are plotted in Fig. 4 as a function of the number of missing data M per hour.As expected, the greater the number of missing data, the larger the average error on the hourly mean.A simpler approach, and one which Fig. 2 demonstrates is more realistic, is to choose missing data sequentially so that they form long, continuous gaps in time having lengths ranging from 1 min up to 59 min (Mandea, 2002).For a gap of length M within an hour, there are N+1 different possible positions within that hour that such a gap can be situated.The i a k corresponding to the M missing minutes within each gap are assigned the value 0, and the i a k corresponding to the N data present are assigned the value 1.Assuming that each gap position has equal probability of being realized, we randomly sample from all possible continuous gaps across the entire (SOD, HAD, HER) time series, and we calculate their corresponding absolute errors using Eqs.( 1) and (3).RMS{ }(N, C), where C denotes continuous-gap sampling, are shown in Fig. 4. It noteworthy that these results are very different from those obtained through binomial sampling.In general, C sampling gives larger average errors than B sampling (Newitt, 2009).
Another approach, one that we assert gives very realistic results, is to take the data-gap histories (Fig. 1) from observatory time series having relatively many missing data and apply those gap histories to time series that are relatively complete.So, for example, whenever there is a missing minute datum from Honolulu, where the i a k have value 0, we introduce a simultaneous missing minute datum to the more continuous time series from Hermanus.With the combination of the original and essentially continuous HER time series and the synthetic, gap-filled HER time series, we can calculate absolute proxy errors using Eqs.( 1) and (3) for HER as if it had data gaps like those found in HON.More generally, we apply, respectively, the gap histories from (DRV, THY, HON) to the time series from (SOD, HAD, HER), and we calculate root-mean-square average errors, RMS{ }(N, P ), where P denotes proxy sampling.Results are shown in Fig. 4. In each case, the proxy P results are similar to the continuous C results, apparently validating the continuous-sampling approach.
It is useful to put upper bounds on the absolute errors.For every hour within each time series (SOD, HAD, HER), we rank the 60 individual residuals (5) We assign 1 to the i a k for the N residuals that give the largest possible absolute errors for all combinations of minutes i and for all hours k over the entire duration of each time series 1991-2008; we assign 0 to all the remaining i a k .In this  case it is possible to conduct an exhaustive search of the observatory time series.The maximum errors are designated MAX, and results are shown in Fig. 4. At first thought, readers might consider the MAX results to be large -for the 90% rule, errors for high-latitude H and Z exceed 100 nT -but it should be recognized that the largest absolute errors occur during periods of extremely high magnetic-field activity, when missing only a few minutes of data can have a significant affect on an hourly mean, and even then, the missing data must be those specific values that, when lost, have the maximum affect on the hourly mean.MAX should be interpreted as the worst case, a hard upper bound on any conceivable error on an hourly mean given many years of magnetic measurement at an observatory.
Figure 4 shows us that average absolute errors and maximum absolute errors are diverse functions of magnetic-vector component and observatory latitude.Therefore, in light of these results, it is not surprising that previous researchers Ann.Geophys., 27, 3601-3610, 2009 www.ann-geophys.net/27/3601/2009/have found it difficult to draw general conclusions about how many missing data might be tolerated for routine production of meaningful hourly means (e.g.Schott and Linthe, 2006).

Relative errors
We next examine how error estimates are related to magneticfield activity.In conducting their research on the errors on hourly means, Mandea (2002) and Herzog (2009) used the K index to group results according to quiet, medium, and disturbed magnetic-activity levels.In this analysis we prefer to use a more direct measure of magnetic activity -the hourly standard deviation σ k of the minute data, With this we define relative error as which we note is dimensionless.In Fig. 5 we show RMS{e} results, each for binomial (B), continuous (C), and proxy (P ) gap distributions, and in the same figure we also show MAX{e} results.In contrast to our previous results for absolute errors, the results for relative errors are much more consistent across the variables of magnetic-vector component and observatory latitude.The simplicity of Fig. 5 indicates that general conclusions can be obtained.
For just a few missing data, M 60, the C and P relative errors are reasonably well fitted by a simple linear proportionality: and this relationship holds pretty well for all magnetic-vector components (H , D, Z) and all observatory latitudes.This scaling relationship breaks down for large M, but that is not of much importance since hourly means do not have much meaning when many minutes of data are missing.The situation with B relative errors is very different.Here, a good fit is obtain by the power-law: Relative errors on means having binomial distributed data and gaps are, on average, a factor of M 1/2 less than relative errors on means having continuously distributed gaps.With respect to the 90% rule, hourly means with 6 missing minutes are typically erroneous by about 10% of σ k , assuming that gaps are well approximated by a C distribution.If data and gaps are B distributed (something, we emphasize, would be unusual), then the typical 90%-rule error is smaller, about 4% of σ k .
The distributions of the P relative errors (7) for the 90% rule with 6 missing minutes, e(N=54) are shown in Fig. 6.They compare favorably with zero-mean normal (Gaussian) distributions having similar variances.While we find this interesting, other than appealing to the central-limit theorem (e.g.Bulmer, 1979), we do not have any a priori reason to expect relative errors to be normally distributed.Indeed, Kolmogorov-Smirnov tests (Press et al., 1992) show probabilities for consistency ranging from 0.2849 to 0.9506.A high (low) probability indicates that the true underlying distribution is consistent with (different from) the data.Where the probability is low, the difference might be very slight, and in this respect statistical tests of significance do not provide useful guidance.From our perspective, what is important is the visually-obvious fact that the relative errors are very close to be normally distributed.Therefore, their dispersions have a straightforward interpretation, with about 68% of the errors being smaller than the quoted RMS values.

Discussion
In terms of current Intermagnet standards, definitive minute data are supposed to have an accuracy of better than 5 nT.This is usually interpreted in an average sense as applying to data collected over a period of about a year, although some clarification on the part of Intermagnet would be useful.In any case, it is unreasonable to expect that a standard for hourly means can exceed the standard set for the source minute data.For the time series considered here, a 5 nT level of accuracy would be satisfied, on average, if the 90% rule were to be adopted (Fig. 4).These results are encouraging, since they appear to permit a reasonable tolerance for missing data, but we acknowledge that during magnetically-active periods the errors will be larger.For this reason we have also presented results in terms of relative errors (Fig. 5).There are no corresponding Intermagnet standards for such "signalto-noise" measures, but they might be worthy of specification, since some observatory hourly means constructed from minute data having gaps can apparently have low absolute accuracy but high relative accuracy.
Generally speaking, for scientists investigating long-term trends, such as secular variation, good long-term absolute accuracy of observatory hourly means is important, and complete temporal continuity is probably less important.On the other hand, for scientists investigating magnetic storms, rela-tive accuracy and continuity are probably the most important qualities a time series can have.We also know that some scientists investigating unusual signals will sometimes have their own unique data requirements.It is difficult for a data provider to satisfy every data user.Scientists having specific and stringent requirements, can, if they choose, construct their own hourly means from the source minute data.But data centers will probably want to continue to provide hourly means as a convenient service to the scientific community, and this will almost certainly require some degree of compromise on standards.We hope that any adopted standards will be simple and well-documented.
Of course, hourly means can be erroneous for many reasons, not just because they might have been formed from incomplete minute data.The source minute data might have been collected from a faulty magnetometer, or the observatory site might have been magnetically-contaminated in some way.A stringent standard on the construction of hourly means from incomplete minute data will not fix such problems.The occasional isolated erroneous hourly mean can often be identified in an observatory time series as a spike, and it is normal practice to remove spikes, either by directly editing a data file or by using a simple computer algorithm.Prolonged periods of problematic data are usually manifest as either anomalous statistical jitter -"noise" -or periods of obvious bias.These problems can be identified through inter-comparison of data from different observatories.We are reminded that users of data need to inspect the data that they use, even if the data have already been declared to meet an established standard.And everyone needs to appreciate the obvious: although we might strive for perfection, in the end, it is impossible to attain.

Fig. 2 .
Fig. 2. Data-gap and data-sequence statistics.(a-c) Number of occurrences of data-gap length (blue) and missing minutes per hour (red) for (DRV Z, THY D, HON H ). (d-f) Number of occurrences of data-sequence length (blue) and minutes present per hour (red).(g-i) Position of missing data within each hour; 1 missing minute in an hour (blue), 6 missing minutes (red), all missing minutes (black).

Fig. 3 .
Fig. 3. Cumulative missing minutes (a) for (DRV Z, THY D, HON H ) and cumulative hours (b) with missing minutes, each as a function of number of missing minutes per hour.The cumulative of the missing-minutes-per-hour density results shown in Fig. 2a-c.

Fig. 4 .
Fig. 4. Root-mean-square RMS and maximum MAX absolute error results (N) for (a-c) H , D, and Z magnetic-vector components for high-latitude observatories SOD (DRV),(d-f) medium-latitude observatories HAD (THY), (g-i) low-latitude observatories HER (HON), each as a function of number of missing data M. (B, C, P ) denotes results for (binomial, continuous, proxy) gap distributions.No proxy result is calculated when 3 or fewer proxy instances are available.

Fig. 5 .
Fig. 5. Root-mean-square RMS and maximum MAX relative error results e(N) for (a-c) H , D, and Z magnetic-vector components for highlatitude observatories SOD (DRV),(d-f) medium-latitude observatories HAD (THY), (g-i) low-latitude observatories HER (HON), each as a function of number of missing data M. (B, C, P ) denotes results for (binomial, continuous, proxy) gap distributions.For comparison, fits proportional to M and M 1/2 are also shown.No proxy result is calculated when 3 or fewer proxy instances are available.

Fig. 6 .
Fig.6.Cumulative distributions of relative error results for the 90% (10%) rule with 6 missing minutes, e(N =54), and comparison distributions for zero-mean normal (Gaussian) distributions having similar variances, for (a-c) H , D, and Z magnetic-vector components for high-latitude observatories SOD (DRV), (d-f) medium-latitude observatories HAD (THY), (g-i) low-latitude observatories HER (HON).Also given, in each case, are the Kolmogorov-Smirnov probabilities of consistency.