Modeling ionospheric fo F 2 by using empirical orthogonal function analysis

A similar-parameters interpolation method and an empirical orthogonal function analysis are used to construct empirical models for the ionospheric foF2 by using the observational data from three ground-based ionosonde stations in Japan which are Wakkanai (Geographic 45.4 ◦ N, 141.7 E), Kokubunji (Geographic 35.7 ◦ N, 140.1 E) and Yamagawa (Geographic 31.2 ◦ N, 130.6 E) during the years of 1971– 1987. The impact of different drivers towards ionospheric foF2 can be well indicated by choosing appropriate proxies. It is shown that the missing data of original foF2 can be optimal refilled using similar-parameters method. The characteristics of base functions and associated coefficients of EOF model are analyzed. The diurnal variation of base functions can reflect the essential nature of ionospheric foF2 while the coefficients represent the long-term alteration tendency. The 1st order EOF coefficient A1 can reflect the feature of the components with solar cycle variation. A1 also contains an evident semi-annual variation component as well as a relatively weak annual fluctuation component. Both of which are not so obvious as the solar cycle variation. The 2nd order coefficientA2 contains mainly annual variation components. The 3rd order coefficient A3 and 4th order coefficient A4 contain both annual and semi-annual variation components. The seasonal variation, solar rotation oscillation and the small-scale irregularities are also included in the 4th order coefficient A4. The amplitude range and developing tendency of all these coefficients depend on the level of solar activity and geomagnetic activity. The reliability and validity of EOF model are verified by comparison with observational data and with International Reference Ionosphere (IRI). The agreement between observations and EOF model is quite well, indicating that the EOF model can reflect the major changes and the temporal distribution characteristics Correspondence to: D.-H. Zhang (zhangdh@pku.edu.cn) of the mid-latitude ionosphere of the Sea of Japan region. The error analysis processes imply that there are seasonal anomaly and semi-annual asymmetry phenomena which are consistent with pre-existing ionosphere theory.


Introduction
Different types of variability in ionosphere are subject to a number of interconnecting drivers which can be broadly characterized as follows: (a) solar ionizing radiation; (b) geomagnetic activity; and (c) meteorological influences (e.g., Forbes et al., 2000;Rishbeth and Mendillo, 2001;Lei et al., 2008b;Zhang et al., 2011).The ability to model and eventually anticipate the solar cycle, annual, semi-annual and seasonal variations as well as irregularities in ionosphere is of great use for both ionospheric research and space weather applications (Tóth et al., 2005).An ionospheric model can be either a first-principles-based physics model which is developed from a rigorous mathematical analysis of laws of physics and based on numerical solution of the spatialtemporal equations, or an empirical model which refers to any kind of modeling based on empirical observations.The empirical ionospheric model, which usually describes the spatial and temporal variation of electron density, critical frequency, electron temperature and other parameters of ionosphere in the form of various types of functions (e.g., harmonic function, Chapman function), played an important part in extensive practical applications.
Among different empirical models, International Reference Ionosphere (IRI) model (Bilitza et al., 1990;Bilitza, 2001) is the most widely used one.The electron density profile given by IRI is described by special anchor points of characteristic ionospheric parameters including the F2, F1 and E layer peak densities which depend on the critical Published by Copernicus Publications on behalf of the European Geosciences Union.frequencies.The relationship of peak density NmF2 (unit: m −3 ) and foF2 (unit: MHz) is as follows.
NmF2 = 1.24 × 10 10 × (foF2) 2 (1) Therefore, the F2 layer critical frequency foF2 is one of the most significant ionospheric parameters from which the morphology of topside density profile can be well characterized.IRI provides two choices to describe the foF2: CCIR (International Radio Consultative Committee) or now known as ITU (International Telecommunication Union) model (CCIR, 1967), and URSI (International Union of Radio Science) model (Rush, 1992).Both cases are based on the observations from the worldwide network of ionosonde stations.The availability of reliable data for the specific region and time determined the accuracy of the model (Bilitza and Reinisch, 2008).The ionosphere in East Asia is an important region where the station density is relatively dense.However, the electro and atmospheric dynamics within middle and low latitude ionosphere of East Asia region which is controlled by the equatorial anomaly phenomena can be very complicated.Several studies have shown that there are relatively large discrepancies between the ionospheric parameters predicted by IRI model and the observational data in equatorial and low latitude regions, especially in East Asia and southern China area (Adeniyi et al., 2003;Liu et al., 2004;Zhang et al., 2004).Zhang et al. (2007) examined that the percentage difference values of foF2 predictions by using URSI coefficients in IRI pattern can reach as large as 30 % around pre-sunrise time, and between −5 % percent and −25 % during most time period of the day.Therefore, it is necessary to update the existing CCIR or URSI foF2 model or build directly a single station or regional model of foF2 among East Asia region.
Several new modeling techniques with respect to different ionospheric parameters have been proposed.Some studies made the temporal and spatial forecasting of ionospheric foF2 and built the model by using neural network analysis (Kumluca et al., 1999;Oyeyemi et al., 2005Oyeyemi et al., , 2006;;McKinnell andOyeyemi, 2009, 2010).Of particular intention is concentrated on modeling the ionospheric parameters such as foE, foF2, hmF2, and M(3000)F2, etc. based on empirical orthogonal function analysis (e.g., Dvinskikh, 1988;Liu et al., 2008;Zhang et al., 2009).In this paper, we will focus on constructing single station model of ionospheric foF2 for Wakkanai (Geographic 45.4 • N, 141.7 • E), Kokubunji (Geographic 35.7 • N, 140.1 • E) and Yamagawa (Geographic 31.2 • N, 130.6 • E) during the years of 1971-1987 by using similar-parameters interpolation method and empirical orthogonal analysis, and the results are compared with the observational data and with IRI model.

Data set for station modeling
The ionospheric F-layer over Japan, which lies near the inner flank of the northern crest of ionospheric equatorial anomaly, is a representative sector of East Asia (geographic longitude range: 130 • E-145 • E; geographic latitude range: 30 • N-45 • N; geomagnetic latitude range: 18 • N-35 • N).Relatively large discrepancies have been measured between IRI and observational values among this sector (Liang, 1990;Adeniyi et al., 2003;Bilitza et al., 2006;Vlasov and Kelley, 2010).
Here we use hourly foF2 data observed at three ground-based ionosonde stations in Japan which are Wakkanai, Kokubunji and Yamagawa.These three stations are the oldest established ionosonde sites with long history of reliable data of ionograms (Xu et al., 2004;Lei et al., 2008a).The time period of 1971-1987 are used in the present study because it covers more than one whole solar cycle as long as possible with the maximum data availability.The ionosonde data for 1980, 1981 (high solar activity years) and 1986, 1987 (low solar activity years) are used for data-model comparison in order to assess to what degree the empirical model can represent the observational results.The geographical coordinates and geomagnetic latitudes are listed in Table 1.

Description of the similar-parameters interpolation method
We use similar-parameters method to refill the missing data for aforementioned three stations during the time period of 1971-1987.Similar-parameters method, which was originally applied to the field of aerodynamics (see NASA website: http://www.grc.nasa.gov/WWW/k-12/rocket/airsim),can be used to interpolate the missing data before constructing the empirical model.In this method, the observational data can be influenced and determined by all possible factors or parameters.If two data sets have the same values for the similarity parameters, the data set which contains the missing value under certain temporal and spatial conditions can be interpolated by using its "control" data set.Here we list several possible drivers of ionospheric variability of F layer in Table 2. Accordingly we will choose appropriate proxies as parameters from which the influence of aforementioned drivers on ionospheric foF2 can be reflected.For the solar ionizing radiation, we choose solar index F 10.7 as proxy because it is an ideal indicator for solar cycle and solar rotation variation.The solar radiation also related to the change in X-ray radiation fluxes which also need to be set as proxy.The seasonal variation is related to change in solar zenith angle which is a function of DOY (day of year), local time and latitude.As the latitude is fixed for single station, here we choose DOY (day of year) and local time as proxies.For the geomagnetic activity, Ap index is a suitable proxy because it is a planetary index for calculating the strength of world-wide geomagnetic disturbances.For the meteorological influences, Mendillo et al. (1998) tried to estimate to what degree the F-layer variability could be attributed to the troposphere and lower stratosphere by using the NCAR (National Center for Atmospheric Research) coupled CCM3/TIME-GCM (Community Climate Model-3/Thermospere-Ionosphere-Mesosphere-Electrodynamics General Circulation Model; Roble and Ridley, 1994).Mikhailov et al. (2007Mikhailov et al. ( , 2009) ) showed that synchronous variation of electron density can be observed during geomagnetic quiet day in both E-and F-region.Such variability is considered to be caused by perturbation originating from lower atmosphere.It is hard to set appropriate proxy for meteorological influences not only because the major influences to the variation of foF2 are due to solar ionizing and geomagnetic activity but also since there are so many uncertainties in the climate change as well as in the generation and transmission of the wave.Therefore, the parameters being used here are F 10.7 , Ap index, local time and DOY based on aforementioned analysis.At any single station, the missing value can be interpolated by choosing the median value from the data set which have the "similar" parameters.The settings of the similar parameters are listed in Table 3.One thing worth noting is that the any interpolation method cannot be a good one for interpolating long-gap missing data.However, the data missing for foF2 at Wakkanai, Kokubunji, and Yamagawa during the time interval 1971-1987 is only scattered, which makes it possible to implement the interpolation method.Figure 1 displays the comparison of the original and interpolated foF2 values.It can be seen from the figure that the similar-parameters interpolation method is fairly feasible for data preprocessing.

Brief introduction of EOF analysis method
Empirical Orthogonal Function (EOF) analysis method, which was invented by Pearson (1901), has been widely used to analysis the temporal and spatial variation of the research objects.Dvinskikh (1988) first introduced EOF analysis into the empirical modeling of the ionospheric parameters.It has been confirmed by many researchers that EOF analysis is a powerful method for ionospheric modeling and data analysis (e.g., Singer and Dvinskikh, 1991;Daniell et al., 1995;Matsuo et al., 2002Matsuo et al., , 2005;;Marsh et al., 2004;Zhao et al., 2005;Zapfe et al., 2006;Liu et al., 2008;Mao et al., 2008;Zhang et al., 2009;Matsuo and Forbes, 2010, and more).According to EOF theory, the variation of the ionospheric parameters are attributed to different factors which can be extracted and separated in terms of their relative "contribution" with respect to ionospheric parameters.The EOF method can be utilized to decompose and express the variation as a summation of the eigen modes which are not pre-specified artificially, but are calculated according to experimental data itself in the decomposition process.The combination of eigen modes can reproduce the substantive characteristics of the data.The eigen series have rapid convergence velocity and high calculation accuracy.This makes EOF analysis method a highly effective way of empirical modeling not only by greatly reducing the modeling parameters but also by considerably saving the computation time compared with other expansion methods.For further details of EOF decomposition, readers may refer to Dvinskikh (1988), Xu and Kamide (2004), and Zhang et al. (2009).

Data processing with EOF analysis method
The hourly values of foF2 at three stations are decomposed into the EOF base functions E k , multiplied by the corresponding EOF coefficients A k using the EOF analysis method: Where foF2(d, h) is the combination of hourly values of the observational data expressed as a 6209 × 24 array with the rows corresponding to the days (d = 1,2,3...,6209) which is calculated from 1 January 1971; The column corresponding to the local time LT (h = 0,1,2...,23).E k (h) is the k-th base function of foF2 reflecting the diurnal variation, A k (d) is the coefficients of E k (h) which represents the long-term variation (solar cycle, annual and seasonal, etc.).Theoretically, all of those 24 order base functions and the associated EOF coefficients are needed to reproduce the variation of the original matrix.However, the EOF decomposition converges fairly quick, which makes it possible to use only a limited number of base functions and the corresponding coefficients to reconstruct the matrix and reflect the principal components of the variation of original data set.In this paper, the first four EOF series, which can reflect the 99.7 % information of the fluctuation of the original data matrix, are used to reconstruct the foF2 and build the model.So it is feasible to reduce the number of modeling parameters and to simplify the calculating process to a great extent while the accuracy of data reconstruction being considerably high.
Figure 2 shows the diurnal variation of the first four order EOF base functions and the foF2 at three stations respectively.It can be clearly seen from the left panels that the diurnal variation of the 1st order base function E 1 and foF2 are quite similar to each other, the correlation coefficients between E 1 and foF2 are greater than 0.97 among all those three stations.Therefore E 1 can represent the average diurnal variation trend of foF2.The ionospheric foF2 is also influenced by other factors which including interhemispheric flow, neutral winds and diffusion.Thus one would expect there are small scale disturbances and irregularities superimposed on the diurnal variation due to above influences, which are well represented from the variation of the 2nd, 3rd and 4th order base functions in the right panels.Take the 3rd order base function E 3 as an example.E 3 be-gins to increase at around 05:00 LT and then has a decreasing trend at around 11:00 LT and increases again from 17:00 LT.These phenomena can be explained by ionospheric sunrise enhancement, bite out phenomena and sunset enhancement respectively (Schunk and Nagy, 2000;Liu et al., 2004).
Figure 3 displays the long-term variations of F 10.7 and the first EOF coefficient A 1 from 1971 to 1987. Figure 4 shows the variations of Ap index and EOF coefficients from the 2nd to the 4th order.By comparing these two figures, we can see that the 1st EOF coefficient A 1 has a much larger value than A 2 , A 3 and A 4 which are the 2nd, 3rd, and 4th order EOF coefficients respectively.The amplitude range of the 1st order coefficient A 1 is about two to three times larger than that of the 2nd order coefficient A 2 , and A 2 is also two times larger than the 3rd order coefficient A 3 while the 4th order coefficient A 4 is the smallest.Referring to the analysis of base function in previous paragraphs, we can deduce that the principal components of variation of foF2 are reflected by the first-order EOF series which is represented in the form of  The synchronous solar cycle variation trends both in A 1 and in F 10.7 for all three ionosonde stations can reflect a highly positive-correlation relationship.The correlation coefficients for Wakkanai, Kokubunji and Yamagawa are 0.937, 0.945 and 0.928, respectively.This phenomenon illuminated that A 1 can represent the components of solar cycle variation in original foF2 data set.And it can also be seen that there is an evident semi-annual variation and a relatively weak annual-variation component in A 1 .Though they are not as prominent as the solar cycle variation because they are superimposed on the latter one.Figure 4 shows that there are quite obvious annual variation components in A 2 .A 2 also contains relatively weak semi-annual variation components.Both A 3 and A 4 contain mainly the semi-annual variation elements as well as more subtle variations, such as seasonal variations and small-scale irregularities.The semi-annual variation components can be attributed to periodic wave in geomagnetic Ap indices with maxima near equinoxes (Petrukovich and Zakharov, 2007).The amplitudes of the EOF coefficients are influenced by the solar activity represented in the form of F 10.7 and by geomagnetic activities represented in the form of Ap index.
From the above analysis, we know that the first four EOF coefficients can reflect the solar cycle variation, annual fluctuation, semi-annual oscillation, and short-term irregularities.So we can use the formal Fourier series to model the first four EOF coefficients A n (n = 1, 2, 3, 4).
Where n is the n-th EOF coefficient of base function.B n1 (d), B n2 (d), and B n3 (d) correspond to the solar cycle, annual, and semi-annual variation components in EOF coefficients respectively, ε is the residual error.F 10.7p = (F 10.7 + F 10.7A )/2, which was established based on the value of daily F 10.7 and its 81-day moving average F 10.7A .F 10.7p has been used in solar irradiance empirical models as a solar EUV proxy (e.g., Hinteregger et al., 1973;Richards et al., 1994).It has been validated that F 10.7p represents quite well the intensity of solar EUV flux, which is considered as a better solar proxy for common use (Liu et al., 2006(Liu et al., , 2011)).Here we use a linear function of F 10. in EOF coefficients.The amplitudes of those trigonometric functions can also be expressed in the form of linear functions of F 10.7p and Ap index to show their dependence on the level of solar activity as well as geomagnetic activity.C, D, E, F , G and H are coefficients of various parts in above equations.Those coefficients can be calculated by using linear regression analysis method, and thus the EOF coefficients A n (d) can be acquired by using Eqs.( 3)-( 6) with those determined coefficients.We can still add shorter periods variation components into the EOF coefficients (e.g., the seasonal variation components, the solar rotational components with the period of 27-days, and 16-days solar oscillation, etc.) for the accuracy of the fitted coefficients.The modeled values of foF2 at single stations can be acquired using Eq. ( 2) with the EOF base functions multiplied by the coefficients calculated with aforementioned linear regression method.
Figure 5 shows the solar cycle, annual, and semi-annual variation components of the fitted EOF coefficients at Kokubunji respectively.We can see from the figure that: (1) The contribution to solar cycle variation is mainly made by the 1st order component which has a much larger value than the others.(2) Annual variation can be traced in all of those 4-order components, but the major contribution comes from the 2nd order component.(3) The semi-annual variation is mainly concentrated in the 1st, 3rd and 4th order components whereas contribution from the 2nd order components is smaller.(4) The influence due to solar activity and geomagnetic activity can be indicated from these components.
Figures 6 and 7 show the comparison of the daily variation of the observational foF2 values, the values given by IRI model, and the EOF modeled values during years with high solar activity (1980,1981) as well as years with low solar activity (1986,1987).It can be seen from the figure that the IRI modeled values have a relatively smooth boundary from which the details of variation in original data set can not be well expressed.This illustrated that some variation with short periods and small scale irregularities/disturbances are beyond the level that can be well represented by IRI.On the contrary, the EOF modeled values can reflect quite well different scales of variation in original foF2, which is mainly attributed to that the EOF coefficients are directly obtained from the decomposition of original data set.The quick convergence of EOF expansion made it possible to use limited number of base functions and the corresponding fitted coefficients generated by linear regression analysis to reproduce the observational data set.
Figure 8 shows the scatter plots of IRI and EOF model versus the observational foF2.The left two panels are for high solar activity years (1980,1981), and the right two panels are for low solar activity years (1986,1987).One phenomenon worth noting is that the correlation coefficients between the modeled values and the observational values are larger in high solar activity years for both IRI and EOF model.For IRI model, some studies have noted major shortcoming for low electron densities conditions during which the discrepancies between the model and measurement can be consistently  the accumulated percentages of first 4-order coefficients of EOF base function.
Figure 10 shows the long-term variation of relative errors and relative mean square errors between the model and the observational data in day (6 a.m.-6 p.m.) and night (6 p.m.-6 a.m.).The IRI model results are obviously inferior to that of EOF model results, and there are some other features worth noting.First, there are distinct semi-annual variation components in the relative errors of EOF model.Second, the wave crest and the wave trough of the variation of relative errors are asymmetric for day and night.In the daytime, the relative errors have positive values (near wave crest) or increasing trends around solstice and the negative values (near wave trough) or decreasing trends around equinox.However, the night time conditions are to the contrary.Positive error appears near equinox and negative error appears near solstice.For the seasonal asymmetry, three different explanations are suggested: (1) neutral wind hypothesis: the large-scale interhemispheric circulation induced by asymmetric heat distribution can cause asymmetric density variation (Johnson and Gottlieb, 1970;Fuller-Rowell, 1998).The circulation may also exert influence on foF2 and even on deviation with small scales due to the proportional relation between foF2 and electron density.(2) Equinoctial hypothesis: the variation of ionospheric parameters are related to the angle between solar wind flow and geomagnetic dipole field and to the solar asymmetric illumination (McIntosh, 1959;Lyatsky et al., 2001).(3) Axial hypothesis: the increase of heliographic latitude around equinoxes can place Earth closer to fast solar wind streams from coronal holes (Bohlin, 1977).

Summary and conclusions
Ionospheric foF2 measurements in three ionosonde stations of Japan during 1971-1987 with maximum data availability offer the opportunity to construct the empirical model in order to provide the basis for updating the existing CCIR or URSI foF2 model.In the present paper, the original foF2 data is interpolated with similar-parameters method, then an empirical model is constructed with EOF decomposition combined with regression analysis.We made the following observations and conclusions: First, ionospheric foF2 is subject to a number of drivers which can be broadly divided into three categories: solar ionizing radiation, geomagnetic activity and meteorological influences.It is feasible to use F 10.7 , X-ray fluxes, solar zenith angle and Ap index as appropriate proxies to reflect the aforementioned influences and to refill the missing value of original foF2 with similar-parameters method.
Second, the EOF model can reproduce quite well the original data sets of foF2 by utilizing only the first 4-order base functions as well as corresponding coefficients.The base functions can express the diurnal variation of the original foF2 data while the corresponding coefficients can represent the long-term variation (solar cycle, annual, semi-annual, seasonal, solar rotation, and irregularities, etc.).
Finally, the error analysis reveals that there are seasonal anomaly and semi-annual variation phenomena.These results can be attributed to neutral wind hypothesis, equinoctial hypothesis, and axial hypothesis.

Fig. 1 .
Fig. 1.Comparison among the original foF2 (left panels) and interpolated foF2 using similar-parameters method during the interval 1971at three ionosonde stations.

Fig. 2 .
Fig. 2. Diurnal variation of the first EOF base functions and the foF2 (left panels) and 2nd to 4th order base functions (right panels) at three stations.

Fig. 5 .
Fig. 5. Variation components of fitted EOF coefficients with different period.

Fig. 8 .
Fig. 8. Scatter plots of IRI and EOF model values versus observational data of foF2.

Fig. 9 .
Fig. 9.Diurnal variation of relative errors and relative mean square errors between model and the observational data.

Fig. 10 .
Fig. 10.Long-term variation of relative errors and relative mean square errors between model and the observatioanl data.

Table 1 .
Geographic and geomagnetic positions of the ionosondes in Japan.

Table 2 .
Possible drivers of ionospheric variability of F layer.

Table 3 .
The settings of the similar parameters.