A global model of the ionospheric F2 peak height based on

Abstract. The ionospheric F2 peak height hmF2 is an important parameter that is much needed in ionospheric research and practical applications. In this paper, an attempt is made to develop a global model of hmF2. The hmF2 data, used to construct the global model, are converted from the monthly median hourly values of the ionospheric propagation factor M(3000)F2 observed by ionosondes/digisondes distributed globally, based on the strong anti-correlation existed between hmF2 and M(3000)F2. The empirical orthogonal function (EOF) analysis method, combined with harmonic function and regression analysis, is used to construct the model. The technique used in the global modelling involves two layers of EOF analysis of the dataset. The first layer EOF analysis is applied to the hmF2 dataset which decomposed the dataset into a series of orthogonal functions (EOF base functions) Ek and their associated EOF coefficients Pk. The base functions Ek represent the intrinsic characteristic variations of the dataset with the modified dip latitude and local time, the coefficients Pk represents the variations of the dataset with the universal time, season as well as solar cycle activity levels. The second layer EOF analysis is applied to the EOF coefficients Pk obtained in the first layer EOF analysis. The coefficients Ak, obtained in the second layer EOF analysis, are then modelled with the harmonic functions representing the seasonal (annual and semi-annual) and solar cycle variations, with their amplitudes changing with the F10.7 index, a proxy of the solar activity level. Thus, the constructed global model incorporates the geographical location, diurnal, seasonal as well as solar cycle variations of hmF2 through the combination of EOF analysis and the harmonic function expressions of the associated EOF coefficients. Comparisons between the model results and observational data were consistent, indicating that the modelling technique used is very promising when used to construct the global model of hmF2 and it has the potential of being used for the global modelling/mapping of other ionospheric parameters. Statistical analysis on model-data comparison showed that our constructed model of hmF2, based on the EOF expansion method, compares better with the observational data than the model currently used in the International Reference Ionosphere (IRI) model.


Introduction
Ionospheric modelling has been one of the leading ways to study the ionosphere.An ionospheric model can be either a theoretical model (also named physical model or first principle model) which is derived from various laws of physics and based on the numerical solution of the equations describing the spatial and temporal distribution of medium parameters or an empirical (or semi-empirical) model which is derived from the observational results.Empirical ionospheric models are not only important for ionospheric research, but also very useful and much needed in a wide range of practical applications, such as radio and telecommunication, satellite tracking, Earth observation from space, etc.
In many empirical ionospheric models, such as the International Reference Ionosphere (IRI) (Bilitza, 1990;2001) and NeQuick (Radicella and Leitinger, 2001;Leitinger et al., 2005) models, the calculation of the electron density profile usually uses the critical points such as F2, F1 and E layer peaks as anchor points, using parameters of foF2, hmF2, foF1, hmF1, foE and hmE (the critical frequencies and peak heights of the F2, F1 and E layers, respectively) as inputs.They are based on the global or regional "maps" of the ionospheric peak parameters.Since the ionospheric electron Published by Copernicus Publications on behalf of the European Geosciences Union.
M.-L.Zhang et al.: Global model of the ionospheric F2 peak height based on EOF analysis density has its maximum values in the F2 layer, this layer is the most important region of the ionosphere which is primarily responsible for the reflection of radio waves in highfrequency communication and broadcasting.For this reason, F2 peak is the key anchor point to determine the ionospheric electron density profiles.Therefore, the F2 layer peak height hmF2 is one of the most important parameters in ionospheric empirical modelling.Due to the lack of observational data of hmF2, in practice, people usually obtained hmF2 based on its strong anti-correlations with the ionospheric propagation factor M(3000)F2 whose value is provided usually by the CCIR (International Radio Consultative Committee) or now called ITU (International Telecommunication Union) M(3000)F2 model (CCIR report, 1967).The CCIR M(3000)F2 model predicts M(3000)F2 based on the 12-month running average sunspot number Rz12, then hmF2 is calculated based on this CCIR M(3000)F2 model values using Eqs.(1-6) listed in Sect. 2. However, recently some authors (Adeniyi et al., 2003;Obrou et al., 2003;Zhang et al., 2004;2007) found that in the equatorial and low-latitude regions, the values of hmF2 converted from the CCIR M(3000)F2 model value using Eqs.(1-6) have remarkable discrepancies with the observational hmF2.They revealed that the discrepancies stemmed from the inaccuracy of CCIR M(3000)F2 model.Because when the measured M(3000)F2 values are used, the hmF2 converted using Eqs.(1-6) agrees very well with the observational result derived from the manually edited traces of ionograms using ionogram inversion programs.Therefore, there is a necessity to update the existing M(3000)F2 model or construct directly a global model of hmF2.This is an urgent main task of the IRI working group community.To this goal, recently, some new modeling techniques have been proposed to model these parameters.For example, Oyeyemi et al. (2007) proposed a new modelling technique based on the application of neural network to model the M(3000)F2 parameter, whereas Liu et al. (2008) attempted to model the M(3000)F2 parameter based on the empirical orthogonal function (EOF) analysis of the observational dataset.Furthermore, recently Gulyaeva et al. (2008) derived a numerical model of hmF2 from the topside database of about 90 000 electron density profile provided by ISIS1, ISIS2, IK19 and Cosmos-1809 satellites for the period of 1969-1987.In this paper, we pursue constructing a global model of hmF2 using a modelling technique which is based on the empirical orthogonal function (EOF) decomposition of the global hmF2 dataset and the modelling of its associated EOF coefficients.In the following sections, we will first mention the dataset used for our model construction in Sect.2, and then in Sect. 3 we will describe the modelling technique used.The modelling results with some discussions will be shown in Sect. 4 and the last section (Sect.5) is the summary and conclusion.

Data and transformation equations
The fact that hmF2 is a parameter which is not easy to obtain from measurement makes it difficulty to have an observational hmF2 dataset with enough spatial coverage and history length of data accumulation that can be used for global modelling.However, it has been shown (Shimazaki, 1955;Wright and Mcduffie, 1960) that hmF2 is strongly anti-correlated to the ionospheric propagation factor M(3000)F2 and, fortunately, M(3000)F2 can be routinely scaled from ionograms recorded by ionosondes/digisondes distributed globally and its data has already been accumulated for a very long time.This enables us to construct a database of hmF2 from the global M(3000)F2 database for our modelling study.The original empirical formula between hmF2 and M(3000)F2, i.e., hmF2=1490/M(3000)F2-176, was derived by Shimazaki (1955).However, it was found, by later researchers, that a correction term M should be added and the formula now has the format (Wright and Mcduffie, 1960;Bradley and Dudeney,1973;Eyfrig, 1973;Bilitza et al., 1979): The correction term M in Eq. ( 1) accounts for the delayeffect caused by the ionizations in the E layer that is related to the foF2/foE ratio.There are various expressions of M derived by different authors (Bradley and Dudeney, 1973;Eyfrig, 1973;Bilitza et al., 1979).The expression of M we used in our calculation of hmF2 is the one derived by Bilitza et al. (1979), which has being used in the IRI model since its first release in 1978 (Rawer et al., 1978): where Where R 12 is the 12-month running mean of the sunspot number, is the magnetic dip latitude which is related to the magnetic inclination I as 2tg =tgI.
The hmF2 data, used in our present study, is calculated from M(3000)F2 based on Eqs.(1-6).The use of Eqs.(1-6) is justified by many works (e.g., Adeniyi et al., 2003;Obrou et al., 2003;Zhang et al., 2004;2007) that showed when the measured M(3000)F2 values are used as input, the hmF2 value obtained with Eqs.(1-6) agrees very well with the observational ones derived from the manually edited traces of ionograms using ionogram inversion programs.As for the accuracy of Eq. ( 1), it is estimated by Dudeney (1983) that the uncertainty of hmF2 calculated from observed M(3000)F2 is within 4-5%, i.e., about 15-20 km.Thus, the relationship between M(3000)F2 and hmF2 can be taken as true.In the present study, data of M(3000)F2, foF2 and foE used to construct the hmF2 database were downloaded from the Space Physics Interactive Data Resource (SPIDR) website http://spidr.ngdc.noaa.gov/.Figure 1 shows the global distribution of the stations used for the present hmF2 global modelling study.In the present work, we used the monthly median data to do the modelling.Therefore, the magnetic activity effects are ignored in the model.

Fundamental of EOF analysis method
The modelling technique we used to model hmF2 is mainly based on the empirical orthogonal function (EOF) expansion or decomposition of the dataset.EOF analysis method was actually invented by Pearson (1901).This method has been widely and successfully used by meteorologists and oceanographers to analysis the spatial and temporal variations of physical fields since Lorenz (1956) introduced it into meteorology.In the ionospheric field, it was Dvinskikh (1988) who first introduced the EOF analysis into the empirical modelling of the ionospheric parameters.His pioneer work was followed by other works (e.g., Singer and Taubenheim, 1990;Bossy and Rawer, 1990;Singer and Dvinskikh, 1991;Dvinskikh and Naidenova, 1991).It has been shown by many researchers that the EOF analysis is a powerful method in the ionospheric data analysis and empirical modelling (e.g., Dvinskikh, 1988;Singer and Dvinskikh, 1991;Dvinskikh and Naidenova, 1991;Daniell et al., 1995;Marsh et al., 2004;Matsuo et al., 2002Matsuo et al., , 2005;;Mao et al., 2005Mao et al., , 2008;;Materassia and Mitchell, 2005;Zhao et al., 2005;Zapfe et al., 2006, Liu et al., 2008;Wan et al., 2009, and many more).
EOF analysis involves a mathematical procedure that transforms a dataset into a number of uncorrelated components called principal components, with any two components orthogonal to each other.Thus, this method sometimes is also named Principal Component Analysis (PCA) or Natural Orthogonal Component (NOC) algorithm.The underlying physical meaning of EOF analysis is that the variation of a physical field variable is mainly controlled by some independent processes that can be separated.This method decomposes a dataset into a series of eigen function (or base functions) and its associated coefficients, with the base functions orthogonal to each other.As we know, people usually use the Fourier or Spherical Harmonic analysis method to decompose a dataset.However, the base function set used in Fourier or Spherical Harmonic analysis is predesigned artificially.In EOF decomposition, the orthogonal base functions are not artificially designed in advance, but are naturally determined by the experimental dataset to be decomposed.Therefore, they possess the inherent characteristics of the original data, and the eigen series will converge very quickly.This makes it possible to use only a few orders of EOF components to represent most of the variance of the original dataset.Hence, the EOF expansion has advantages in data analysis and representation.In the following, we will briefly describe the fundamentals of the EOF analysis method.For further details, readers are referred to Dvinskikh (1988), Storch and Zwiers (1999) and Xu and Kamide (2004).
Let Y ij =Y (t i , x j ) represent the value of a variable (e.g., hmF2) at the spatial point x j (e.g., longitude and latitude) at time t i , i=1, 2, . . .,m and j =1, 2, . . ., n.Then Y m×n is a matrix of m rows and n columns.Suppose there are totally r independent physical processes affecting the variation of the variable, then Y can be decomposed into a series of base function E k (x j ),(j = 1,2,...,n) and its associated coefficients A k (t i ),(i = 1,2,...,m). Where T .The superscript T over a matrix means the transpose of the matrix.The base functions E k (k = 1,2,...,r) are uncorrelated, that is, they are orthogonal over space: e k j and a k i are obtained by minimizing  with respect to e k i and e k j .In fact, E k (k = 1,2,...,r) is the orthogonal eigenvectors of the symmetric matrix S = Y T Y , which can be obtained by solving the equation Where λ k is the eigen value.Once E k is found, the associated coefficient A k can be computed by It can be proved that The percentage variance of the dataset captured by the first r components is

Data preprocessing and decomposition
Before we can apply the EOF decomposition to the dataset of hmF2, we need to do some data preprocessing work.Since the time periods with available data are different for different stations, we must normalize the data of each station to the same time period.In this study, we choose the years of 1975-1985 covering both ascending and descending phases of the solar cycle activities as the time period used for global modelling.This time period was chosen because data during this period are available for most of the stations used.To fill the missing data for any station during this period, we preprocessed the data as follows.First, for each single station, with its observational values of M(3000)F2, foF2 and foE as input, M(3000)F2 is converted into hmF2 using Eqs.(1-6).When foE is not available, the value obtained from IRI model is used as input.With the converted hmF2 dataset, single station model of hmF2 is constructed for each individual station using the modelling technique described in Liu et al. (2008).
Then, the missing data (if any), for any chosen station during the time period of 1975-1985, are filled with the single station model values.
After normalizing all the chosen stations' data to the time period of 1975-1985 as described above, data at evenly distributed grids (5 • × 5 • in a latitudinal range of 85 • S-85 • N and longitudinal range of 0-360 • E) were then obtained by interpolation using Kriging method.To test the validity of the Kriging method in our data preprocessing, the accuracy of Kriging maps is estimated.Figure 2 is a sample but typical map of the standard deviation obtained when doing the Kriging interpolation.In Fig. 2, the black points represent stations whose data is used in interpolation.As can be seen from the figure, the standard deviations of the Kriging maps, at the large areas where observations are sparse, are less than 10 km, which is a very general result in our data preprocessing.This means the accuracy of the Kriging maps obtained in our data preprocessing is quite acceptable.The prepared gridded data are the dataset to which the EOF decomposition is going to be applied.
The first step in our modelling is to do the EOF analysis on the gridded dataset prepared as described above.Two layers EOF decomposition are applied.In the first layer, the prepared gridded dataset are decomposed into the EOF base functions E k (µ, LT), which represent the variation of hmF2 with the modified dip latitude (short:Modip) µ and the local time (LT), and the associated EOF coefficients P k (UT,m), which represent the variations of hmF2 with the universal time (UT) and seasonal as well as solar cycle variations (denoted all by the variable m) as the following Here the coordinate (µ, LT) instead of geographical latitude/longitude or magnetic latitude/longitude is used since it was found (Rawer, 1963) that features of the ionospheric parameters, in particular those of the F2 layer, are mostly well organized under the coordinate (µ, LT) which is also well confirmed by recent research works (e.g., Azpilicueta et al., 2006) and by our experience when we tried to decompose the dataset using other coordinates such as the geographical latitude/longitude one.This is due to the fact that the ionosphere is controlled by both the orientation of the Earth's rotation axis and the configuration of the geomagnetic field.Therefore, its variation depends on both the geographical and geomagnetic latitudes, which is embedded in the modip µ defined as tgµ = I cos 1/2 ϕ, where I is the magnetic inclination and ϕ the geographical latitude.The ionosphere has also longitudinal variation.In our modelling, the longitudinal variation is embedded in the LT (of E k ) and UT (of P k ) variations since LT=UT+Longitude/15.
In the second layer of EOF decomposition, the coefficients P k (UT,m) obtained in the first layer EOF analysis are decomposed once again into a series of base function F representing the variation with the universal time (UT) and the associated coefficients A j k (m) representing the seasonal as well as solar cycle variations

Modelling the associated EOF coefficients
The second step in our modelling is to model the EOF coefficients A j k (m) with the following harmonic functions representing the seasonal (annual and semi-annual) and solar cycle variations.The reason justifying the usage of the following harmonic functions with their amplitudes changing with the solar flux index F 10.7 to model the coefficients A j k (m) will be given in Sect. 4 when we discuss the results obtained by the EOF decompositions.where m is the month representing the seasonal variation, and F 10.7 index is a proxy most commonly used to represent the solar activity levels which was originally called the Covington index (Covington, 1948).It is the solar radio flux density measured at a wavelength of 10.7 cm and expressed in units of 10 −22 Watts/m 2 /Hertz.For more details on the proxy solar radio flux, please refer to Tobiska (2001)  A j k and the base functions F j k obtained in the second layer EOF decomposition, the modelled first layer EOF coefficients P k are calculated with Eq. ( 14).At last, with the calculated modelled P k and the base functions E k obtained in the first layer EOF decomposition, the modelled value of hmF2 is calculated using Eq. ( 13).

Results and discussion
Figure 3 shows the contour plots of the first four orders of the base functions E k , obtained with the first layer EOF decomposition of hmF2 dataset, versus the modified dip latitude (µ) and the local time (LT).It can be seen that apparently, the base functions E k we obtained showed some typical features that have been observed in many data.For example, the distribution of the first order of base function E 1 manifests mainly a typical phenomenon related to the equatorial ionization anomaly.As is well known, the equatorial ionization anomaly is a phenomenon characterized by a structure with two crests of ionization (best represented by foF2) at about ±17 • dip latitude on each side of the magnetic equator and a trough in between.It is formed as a consequence of the so called "fountain" effect.The influence of this equatorial fountain effects on the F2 peak height hmF2 is that it will produce a latitudinal distribution of hmF2 with higher value near equatorial and low latitudes but with lower value outside.This latitudinal distribution structure of hmF2 is exactly what we see in the distribution of the first order of the base function E 1 .The second base function E 2 mainly reflects the north-south asymmetry which is closely related to the seasonal change of the solar zenith angle.Because we will see, in Fig. 4, the associated EOF coefficient P 2 corresponding to this base function shows mainly an annual variation pattern.As for the third base function E 3 , the most remarkable feature to notice is the evening enhancement of hmF2 in the equatorial and low-latitude region.This feature is apparently a result of the regeneration/enhancement of the fountain effect during the post-sunset hours due to the evening prereversal enhancement (PRE) of the F2 region plasma's electromagnetic E × B drift caused by the enhanced zonal electric field near the magnetic equator (Fejer et al., 1995, and references therein).Another feature worth noticing is the enhancement/abatement of E 3 in the north/south auroral zones.This feature can be explained by combining it with the pattern of P 3 shown in Fig. 4. P 3 is positive/negative during the north winter/summer seasons.Therefore, the contribution of the component E 3 × P 3 is positive/negative during the north winter/summer seasons.This is apparently a phenomena caused by the difference of sunlit hours in the North/South Hemispheres during winter/summer seasons.Therefore, as is demonstrated here, the EOF decomposition analysis method, to some extent, is able to separate the variance of a dataset into components caused by sources due to different physical processes or mechanisms.This demonstrated that EOF analysis is a powerful and advantage method to analyze and organize the ionospheric data.
Figure 4 shows the distribution of the first four orders of the associated coefficients P k obtained in the first layer EOF decomposition of hmF2 dataset.They correspond to the first four base functions shown in Fig. 3.It can be seen that P 1 shows a variation pattern with a very strong dependence on the solar cycle activity.It also shows some seasonal variations.The coefficients (P 2 -P 4 ) corresponding to the other orders of base functions show most obviously the annual and semi-annual variations, besides, they also show dependence on the solar activity levels.
Figure 5 shows the results obtained in the second layer EOF analysis, that is, the EOF decomposition of P k .The left panels are the distributions of the base functions F j k obtained, the right panels are for the corresponding associated coefficients A j k .The different curves are for the first three components (j =1, 2, 3).For A j k , the results modelled with Eqs.(15)(16)(17)(18) are also plotted (thin green curves overlaying on each corresponding decomposed ones).As we can see from the right panels, the second layer EOF coefficients A j k mainly contains the annual and semiannual variation components and their amplitudes also change with the solar cycle activity levels.This justifies the modelling of the coefficients A j k with the harmonic functions representing the seasonal (annual and semi-annual) variations as well as their so- As an example for showing the validity of the constructed global model of hmF2, Fig. 6 shows the scatter plots of the model value versus observational data of hmF2.Please pay attention that the observation presented here is the true observational data from all individual stations available, the kriging and single stations' screen points are not included here.The upper panel is for the case of the low solar activity year (1976) and the lower panel for the case of the high solar activity year (1981).It can be seen that the model values calculated with our constructed model based on the EOF decomposition and the observational data show a high linearity and the correlation coefficients R between the model values and the observational data are very high (R is 0.939 and 0.945 for the years of 1976 and 1981, respectively).The high linearity and correlation coefficients between the modelled hmF2 values and the observational data imply that the constructed model is able to reproduce the observational data quite well.
Figure 7a-b shows some sample plots demonstrating the comparison between the observational data of hmF2 and the model values given by our EOF based hmF2 model as well as those given by the IRI model for the low solar activity year 1965 (Fig. 7a) and the high solar activity year 1970 (Fig. 7b) for 10 stations representative of high, middle and low latitudes from both North and South Hemispheres.Their geographical coordinates are listed in Table 1.In Fig. 7ab, from top to bottom, stations are ordered from north high, middle and low latitudes to the south low, middle and high latitudes.In these plots, the thick green curves represent our EOF hmF2 model results; the thick black curves represent the  Table 1.Geographical coordinates of the stations labeled in Fig. 7a-b.
No Code GLAT( • N) GLON( IRI model results, whereas the blue points are the observational data.As an additional comparison, results provided by the EOF-based M(3000)F2 model (Liu et al., 2008) are also shown (thin red curves).They are obtained by converting the modelled M(3000)F2 to hmF2 using Eqs.(1-6).As can be seen from these plots, the EOF model results (both of hmF2 and M(3000)F2) in general reproduced quite well the variation behaviour of the observational data.Compared with the results given by the IRI model, the EOF model results compare obviously better than the IRI model results with the observational data.As for the EOF model results, it can be seen from Fig. 7a-b that, in general, the results based on the M(3000)F2 model are very similar to those provided by the hmF2 model.However, a careful inspection of the details of the plots indicates that the hmF2 model results are somewhat improved over the M(3000)F2 model results.
To estimate the accuracy of the model, we made a statistical analysis on differences between the model values and the observational data by calculating the root-mean-squarederror (RMSE) of the model: Where N p is the total number of the data points.The calculated RMSEs for our EOF-based hmF2 model are 11.8 km for the low solar activity year (1965) and 13.4 km for the high solar activity year (1970), respectively.Those for the results based on the EOF M(3000)F2 model are 13.3 km and 14.3 km for the years 1965 and 1970, respectively.As a comparison, the RMSEs for the IRI model results are 18.9 km and 22.6 km for the year 1965 and 1970, respectively.Therefore, in general, our constructed global model of hmF2 based on the EOF decomposition has a higher accuracy than the model currently used in IRI.The accuracy of the EOF-based hmF2 model is also slightly higher than that of the global EOF-based M(3000)F2 model (in terms of the converted hmF2).

Summary and conclusion
In the present study, an attempt was made to construct the global model of the ionospheric F2 peak height parameter hmF2 based on the EOF decomposition of the dataset and the modelling of the associated EOF coefficients with harmonic functions representing annual and semi-annual seasonal variations.Solar cycle dependence is also taken into account in the model by including the changes of the harmonic amplitudes with the solar irradiance flux index F 10.7 .Comparisons between the model predictions and the observational data are in agreement.Statistical analysis on the differences between model values and observational data showed the constructed model of hmF2 based on the EOF expansion agrees better with the observational data than the model currently used in IRI.The accuracy of the hmF2 model developed in this study is slightly higher than that of the global EOF-based M(3000)F2 model (in terms of converted hmF2) previously developed by our team (Liu et al., 2008).From the point of view of practical application, the hmF2 model is more preferable than the M(3000)F2 global model, since in many applications it is the parameter hmF2, rather than M(3000)F2, that is really required.It is also due to the fact that the conversion from M(3000)F2 to hmF2 involves the foF2/foE ratio.It would be much more convenient to obtain hmF2 directly from a reliable hmF2 global model than converting to hmF2 from an M(3000)F2 model.

Fig. 1 .
Fig. 1.Global distribution of the stations used for the present modelling study.

Fig. 2 .
Fig. 2. A sample map of the standard deviation obtained when doing the kriging interpolation.

Fig. 3 .
Fig. 3. Distribution of the first four orders of the base functions (E 1 -E 4 ) obtained with the 1st layer EOF decomposition of hmF2.

Fig. 4 .
Fig. 4. Distribution of the coefficients (P 1 -P 4 ) corresponding to the first four orders of base functions (E 1 -E 4 ) shown in Fig. 3 obtained with the first layer EOF decomposition of hmF2.
Fig. 5. Distribution of the base functions (F j 1 -F j 4 ) (left panels) and their corresponding coefficients (A j 1 -A j 4 ) (right panels) obtained with the second layer EOF decomposition of hmF2.

Fig. 7 .
Fig. 7. Sample plots of model values and observational data of hmF2 for the years of (a) 1965 and (b) 1970.From top to bottom panels, stations are ordered from north high, middle and low latitudes to the south low, middle and high latitudes.