Annales Geophysicae Prediction of SYM-H index during large storms by NARX neural network from IMF and solar wind data

Abstract. Similar to the Dst index, the SYM-H index may also serve as an indicator of magnetic storm intensity, but having distinct advantage of higher time-resolution. In this study the NARX neural network has been used for the first time to predict SYM-H index from solar wind (SW) and IMF parameters. In total 73 time intervals of great storm events with IMF/SW data available from ACE satellite during 1998 to 2006 are used to establish the ANN model. Out of them, 67 are used to train the network and the other 6 samples for test. Additionally, the NARX prediction model is also validated using IMF/SW data from WIND satellite for 7 great storms during 1995–1997 and 2005, as well as for the July 2000 Bastille day storm and November 2001 superstorm using Geotail and OMNI data at 1 AU, respectively. Five interplanetary parameters of IMF Bz, By and totalB components along with proton density and velocity of solar wind are used as the original external inputs of the neural network to predict the SYM-H index about one hour ahead. For the 6 test storms registered by ACE including two super-storms of min. SYM-H< −200 nT, the correlation coefficient between observed and NARX network predicted SYM-H is 0.95 as a whole, even as high as 0.95 and 0.98 with average relative variance of 13.2% and 7.4%, respectively, for the two super-storms. The prediction for the 7 storms with WIND data is also satisfactory, showing averaged correlation coefficient about 0.91 and RMSE of 14.2 nT. The newly developed NARX model shows much better capability than Elman network for SYM-H prediction, which can partly be attributed to a key feedback to the input layer from the output neuron with a suitable length (about 120 min). This feedback means that nearly real information of the ring current status is effectively directed to take part in the prediction of SYM-H index by ANN. The proper history length of the output-feedback may


Introduction
Magnetic storm is a multi-faceted phenomenon that originates from the sun and interplanetary disturbances and manifests in the geospace of magnetosphere-ionospherethermosphere (M-I-T) coupling system.Long-lasting large southward IMF (Interplanetary Magnetic Field) enables energy transfer from the solar wind to the magnetosphere to enhance greatly through magnetic reconnection at dayside magnetopause.Large interplanetary electric fields (IEF) break into geospace, leading to dramatically enhanced and earthward shifted ring currents which cause a rapid decrease in the horizontal component of the geomagnetic field on the ground at low latitudes and characterize a geomagnetic storm development (Gonzalez et al., 1994;Gosling et al., 1991;Tsurutani et al., 1988;Daglis, 2006).Satellite observations show that storm time ring current mainly consists of ions with energy from tens to hundreds keV which drift westward over the equatorial area within 2-7 R E height.During the main phase of large storm, the content of heavy ionospheric ions (especially O + ) in the ring current (L < 4) is extremely increased.The larger the magnetic storm is, the greater the O + contributes to the ring current (Daglis, 1997).When southward IMF decreases or disappears, the rate of magnetic reconnection reduces, convection electric field turns weak, Published by Copernicus Publications on behalf of the European Geosciences Union.particles newly input into ring current decreases, the convection boundary moves outside, cold ionospheric plasma refills the exhausted inner magnetosphere, then encounters with energy particles, and thus gives rise to plasma waves.
Ring current ions loss is caused through 4 kinds of processes such as charge exchanges with neutral atoms, Coulomb collisions with plasma, pitch angle diffusion caused by waveparticle interaction, as well as drifting out of the dayside magnetopause (Daglis, 2006), and thus the ring current decays.This is the recovery phase of a magnetic storm.Among the loss processes cited above, the charge exchange is dominant for mono-charged ions (e.g., H + , O + ) with moderate energy (10-100 keV).For ions with higher energy, the loss caused by the wave-particle interaction becomes more significant due to the decrease of charge exchange section.For low energy heavy ions (less than 10 keV), the loss caused by Coulomb collision cannot be neglected.The fast decay of ring current during the recovery phase of intense storm is caused by the rapid loss of O + and He + .The hourly Dst index is a measure of the total energy content of the ring current, being used the most widely so far to quantify the geomagnetic storm activity (Sugiura and Poros, 1964;Mayaud, 1980;Rangarajan, 1989).SYM-H is another geomagnetic storm index to measure the intensity of the storm-time ring current proposed in recent more than two decades.In contrast to 1 h time-resolution of Dst, SYM-H has the advantage of much higher time resolution of 1 min.Essentially, the SYM-H index is the same as Dst (Wanliss and Showalter, 2006) except that it is calculated from a different set of stations and in a slightly different coordinate system.
In the past several decades, many studies have been devoted to the relationship between the solar wind and the geomagnetic storm (e.g., Rostoker and Fälthammar, 1967;Akasofu, 1981;McPherron et al., 1988).Various methods have been applied to predict geomagnetic disturbances as indicated by Dst index from solar wind and IMF observations, such as differential equations (Burton et al., 1975;Wang et al., 2003), statistical correlative analysis (Baker, 1986;Temerin and Li, 2002;Yermolaev et al., 2005), and artificial neural networks (ANNs) (Wu and Lundstedt, 1996;Gleisner et al., 1996;Wu and Lundstedt, 1997;Lundstedt et al., 2002;Pallocchia et al., 2006;Amata et al., 2008).Of those, artificial neural networks have shown their great ability of nonlinear mapping in Dst prediction.However, there are not any reports yet on the prediction of SYM-H index by ANN in the publications.The prediction of SYM-H index can be applied not only in magnetic storm forecasting, but also in predicting thermosphere and ionosphere parameters through empirical models where the SYM-H or Dst index plays a dominant role (e.g.Zhou et al., 2009;Ridley and Liemohn, 2002).The higher time-resolution of SYM-H index makes its prediction somewhat different from and harder than the Dst index.
In this study, an artificial neural network (ANN) of Nonlinear Auto Regressive with eXogenous inputs (NARX) has been developed for the first time to predict SYM-H index about one hour in advance from solar wind and IMF parameters.The newly developed NARX model shows much better capability than Elman network in SYM-H prediction.In the following sections we will firstly present the data used to establish the SYM-H prediction model of NARX NN and the parameters selected as the external inputs.Then a brief description is given on the architectures and training methods for Elman recurrent neural network (ERNN) that we have ever tried, as well as the newly developed NARX network.Afterwards, the performance and prediction results of the two ANN models are shown and compared with each other.Finally, a brief discussion and summary is given along with ideas for further worthy studies.

Data sets and external inputs
The solar wind and IMF data observed from ACE satellite at Lagrange point L1 are used to establish the neural network model to predict SYM-H index.The ACE data are obtained from CDAWeb site of NASA and the SYM-H index data are provided by World Data Center for Geomagnetism, Kyoto.The solar wind and IMF data are sampled at rates of every 64 and 16 s, respectively, while SYM-H index at 1 min.To unify the temporal resolution, they are all averaged every 5 min.Therefore, all the data we used to establish the ANN model have the same time resolution of 5 min.In this study, we collected 73 time intervals that contain storm events with the minimum SYM-H value less than −85 nT during 1998-2006, of which 67 intervals comprising 66 800 samples are used to train the neural network and the other 6 intervals are used for testing.The selection of the interval is constrained by the data availability and quality of ACE observations in addition to the threshold of −80 nT for minimum The selection of inputs has an important impact on the result of ANN prediction.In previous studies of Dst prediction by ANNs, different combinations of solar wind and IMF parameters with different time-length (i.e., the correlation time between each input and Dst) have ever been used as the ANNs' inputs, such as group of n, V and IMF B z with 1 h correlation length (Lundstedt et al., 2002) where n is the proton number density of solar wind, and V the solar wind velocity, and group of only IMF of B z , B 2 y and ( Pallocchia et al., 2006) and others.In the present study we use the parameter combination of n, V , B z , B y and B with 90-min history length for each parameter as input to predict SYM-H index 60 min ahead.Thus, our prediction model has totally 90 input nodes, each 18 for B y , B z , B, n and V .The input I at moment t and the corresponding output O are: In order to make the neural network converge more easily, all the input parameters are normalized to [−0.8, 0.8] before used in ANN.

ANN models
In this study, we introduce for the first time the NARX neuron network to the prediction of magnetic storm index SYM-H with time resolution of a few minutes and compare it with usual Elman network which enable already very good prediction for Dst index with one hour resolution.In this section we present first the architectures of the two kinds of neural networks.

Architectures
Both of the Elman and NARX artificial neural network are based on the two-layer BP network, having different layout for global feedback.So, we begin with a brief presentation of the so-called BP network.

BP neural network
BP (back-propagation) neural network is a kind of classic feed-forward network using error back-propagating algorithm for its training.It is usually a two-layer network with one hidden layer when used in the Dst prediction.   is only one neural in output layer.The activation function is the hyperbolic tangent for the hidden layer and linear one for the output layer (Haykin, 1999).For the input I = {I 1 ,I 2 ,...,I M }, the output of the j -th hidden unit, H j , is where I i is the value of input node i, M is the number of input nodes, w j i is the connecting weight between the input node i and the hidden neuron j , and b j is bias of the hidden neuron j .And the output O is where S is the number of hidden neurons, w oj is the connecting weight between the hidden neuron j and the output neuron, and b o is bias of the output neuron.

Elman neural network
An Elman neural network is a two-layer BP network with feedbacks from hidden neurons to input (Elman, 1990), being called also Elman recurrent network (ERN).Figure 3 shows its architecture.The input of an ERN consists of two parts, one is true external input I = {I 1 ,I 2 ,...,I M }, and the other is feedback input C = {C 1 ,C 2 ,...,C S }.The feedback input nodes are called context nodes or context units.The value of context unit l at time step t is simply a copy of the hidden output at time step t − 1, i.e.
For the Elman network, the j -th hidden unit output H j is  Context units are considered containing history information of the network status.And it is proved that the Elman neural network can give very good performance of Dst prediction (Wu and Lundstedt, 1997).
The Elman ANN used in this study has various numbers of original external input nodes, 12 hidden neurons and 1 output neuron.

NARX neural network
We introduce a new kind of feedback neural network of Nonlinear Auto Regressive with eXogenous inputs (NARX) model (Haykin, 1999) to predict the SYM-H index from IMF and solar wind.The NARX network is also a two-layer BP network with a time-delay feedback connection as shown in Fig. 4. The input of network consists of two parts as well: true input and context input.In contrast to ERN, the context input for NARX is not from the hidden layer but from the output layer with some certain time-delay.Assuming that I(t) and O(t) are the true input and output at time step t and the length of feedback time delay line is L, the context input is The context nodes fed back from the output contain the being predicted parameter's information with much more steps of history than the Elman network.The determination of the optimum history length for feedback, i.e., the proper number of the context units, will be investigated in detail in Sect. 4. The NARX ANN used in this study has 90 original external input nodes, and 12 hidden neurons as well as 1 output neuron.

The training method
The Momentum Back-Propagation algorithm is used to train the 2 kinds of networks.The course of learning all the samples for once is called an epoch.For each epoch, the cost function is defined as where T k and O k are, respectively, the target output and the actual output of the output neuron, N is the number of training samples.For each connecting weight w at the n-th epoch, it is updated according to where η is the learning rate and α is the momentum parameter.For connecting weights between the hidden and output neurons, they are adjusted following For connecting weights between the input nodes and the hidden neurons, the adjusting is as follows where the input I = {I i } includes the context input for Elman and NARX network.While learning, η and α are adjusted dynamically.According to the change of cost function, E, the adjusting rule is where a and b are positive constants and a > 1 and b < 1, α 0 is the original momentum parameter, ε is a pre-defined scale value that is usually less than 5%.In this study, we set the training parameters as: a = 1.04, b = 0.7, η 0 = 1, α 0 = 0.7 and ε = 0.03.
www.ann-geophys.net/28/381/2010/Ann.Geophys., 28, 381-393, 2010 When the neural networks are training, the accuracy parameters of correlation coefficient ρ and root mean square error (RMSE) as well as the averaged relative variance (ARV) are calculated for the training samples at the end of every training epoch.The accuracy parameters of ρ, RMSE and ARV are defined as (Wu and Lundstedt, 1997) where N is the number of examples, T and O are the average values of the observed and the predicted SYM-H, respectively.
At the beginning of training, RMSE decreases quickly.With the epoch number increasing, the error decreases more and more slowly, converging gradually to an asymptotic value.Assuming that RMSE min (n) is the minimum of RMSE till epoch n, we define a parameter of γ as to judge if the training should finish at epoch n.When γ (1000) < 0.001, the training is thought to be close to the limit and should be stopped.

NARX ANN training with different outputfeedback lengths
A sequence of NARX networks with different length L of the output feedback time-delay line (i.e.number of the context nodes) has been trained using the 67 storm intervals in order to find the most proper value of L and its relationship with the time scale of the ring current decay.The value of L ranges from one (corresponding to 5 min) to 1152 (4 days) with different time resolution, being taken as: 1, 2, 4, 6, 8, 10, . . ., 24,26,64,144,288,576,1152.

Elman ANN training with different history length of input
All the 67 storm intervals are also trained using a series of Elman network with different history length of the external input to see the performance of the Elman network for SYM-H prediction.The history length of the input was taken as 90 min, 240 min, 480 min, and 720 min, separately.

Prediction testing results
We use the parameters of ρ, RMSE and ARV cited in Sect.3.2.1 to characterize the network prediction performance.It can be seen from Fig. 5 that the optimum value of L is 24 (about 120 min) when the prediction performance gets a maximum correlation coefficient of 0.95 between the observed and predicted SYM-H and a minimum RMSE of 17.12 nT for all the 6 testing events as a whole.It seems that the delay time length of the output feedback would be at least larger than 60 min so as to get fair prediction performance.If the length is less than 40 min, the prediction performance is poor.On the other hand, larger length about 1-2 days seems still to provide not so bad prediction performance.The prediction model is still the NARX network with L = 24 which is trained using IMF/SW data from ACE satellite.

Prediction results by using ACE data for 6 testing events
Because WIND is nearer to the earth than ACE and the distance varies in a large range, the preact time of prediction is corrected for each event according to the distance between WIND and the earth at the event occurrence.Table 3 lists the prediction results of the 7 events.These test results show again that the NARX model has good performance in prediction of SYM-H.As a whole of the 7 events, the three performance parameters have the values of ρ = 0.91, RMSE= 14.6 nT, ARV= 18.0%.The best one has the performance parameters of ρ = 0.966, RMSE= 8.58 nT and ARV= 7.3%, as seen in Table 3. Figure 7 shows the SYM-H index predicted by NARX neural network with 24 context units in comparison with the observed SYM-H for the 7 testing storm events whose IMF/SW data are from WIND satellite measurements.
Besides, in order to validate our NARX model, we have also tried to predict the SYM-H index for the superstorms of July 2000 and 24 November 2001 using IMF/SW data from Geotail and OMNI at 1 AU, respectively.Data from Geotail and OMNI can be treated as from dayside magnetospause, so the prediction using these data should be of short time ahead.In the testing, the prediction is one point (5 min) ahead.Figure 8 gives the prediction results, which also show acceptable good performance as: ρ = 0.89, RMSE= 37.3 nT, ARV= 23.5% for July 2000 storm, while ρ = 0.91, RMSE= 30.2 nT, ARV= 30.2% for 24 November 2001 storm.
The prediction results for the 9 events using IMF/SW data from rather than ACE satellite indicate that the NARX prediction model we established has potential capability to predict SYM-H index using input data of IMF/SW from other sources.

Comparison with Elman neural network
To make clear if the Elman network is also effective in SYM-H prediction as it is in Dst prediction, we have made a series training of the Elman networks with different input time delay lines of 90, 240, 480, 720 min (i.e., 1.5 h, 4.0 h, 8.0 h, 12 h), using ACE IMF/SW data for the 67 storms.And, a series of tests are then carried out for the 6 great storms.
Table 4 gives the prediction performance of Elman network with different input time delay lines.When using 90 min length of IMF/SW wind data as input, the testing result shows that the correlation coefficient between the predicted and the observed SYM-H is 0. (e) (f) Fig. 6.Comparison of observed SYM-H index with that predicted by NARX neural network with output-feedback delay time length of 120 min for testing storm events using ACE data.as a whole increased up to: ρ = 0.86, RMSE= 26.2 nT, ARV= 26.2%, which is still not better than NARX model (ρ = 0.95, RMSE= 17.1 nT, ARV= 10.7%).Besides, longer input time length degraded greatly the applicability when only short time qualified IMF/SW data are available (see Fig. 9e).

Comparison with previous Dst prediction by ANN
A practical comparison between the SYM-H prediction performances of our NARX model with previous ones is indeed hard to be made, since the prediction of SYM-H index has ever not been reported before.However, studies of Dst prediction by neural network has been conducted for more than ten years, among which, a detailed study on the prediction of Dst by Elman network is made by Wu and Lundstedt (1997).A comparison between their Dst prediction results and our SYM-H prediction results by NARX model will be made as follows.
In the study by Wu and Lundstedt (1997), IMF/SW data with history length of 1, 4, 8 and 12 h are taken as input to predict the variation of Dst index and a quite accurate prediction is achieved.Table 5 shows the comparison between their Dst prediction result and our SYM-H prediction result by NARX model, from which we can find that the accuracy of SYM-H index prediction by NARX model is comparable with that of the prediction of Dst by Elman neural network.

Discussion and conclusions
In this study, a new kind of neural network has been introduced for the first time to predict storm time SYM-H index.Its prediction capability is examined and compared with Elman network and previous work for Dst prediction.The NARX model introduced in this paper shows much better capability than Elman network for SYM-H prediction, which can partly be attributed to a key feedback to the input layer from the output neuron with a suitable time delay length.An extensive work of training and testing shows that when the length of the output feedback sequence (i.e., the context unit number) L is 24 corresponding to a time length of 120 min, the NARX network gets the best performance.
It shows that although Elman network can make very excellent prediction of Dst, it is not able to predict the variation of SYM-H index with satisfied high accuracy at least in its present frame.The prediction errors are particularly large www.ann-geophys.net/28/381/2010/Ann.Geophys., 28, 381-393, 2010 near the maximum of the main phase (minimum SYM-H) and for the recovery phase.This may be attributed to a lack of the inner status information of the ring current in the network input.The input of the BP network is only the parameters of outer space containing nothing on the information of inner status of the magnetosphere, while the context units of the Elman network fed back from the hidden neurons contain the history information of inner status only 1 step, i.e. 5 min in this study.This length is too short to catch the characteristic time scale of RC decay that is at least tens of minutes.
The development, evolution and decay of the ring current are controlled by both aspects of solar wind driving and magnetospheric internal physical status.The solar wind driving dominates the initial and main phase of the storm, which can be somewhat easily implemented by taking their key parameters with proper history length as the external input of the network.The internal status of the magnetosphere affect all the course of RC growth, evolution and decay, especially dominate the recovery phase of the storm.How to effectively bring the internal status information of the ring current to take part in the prediction is the key point to promote the prediction performance.The global feedback in NARX network developed in this study directs the predicted SYM-H which contains the nearly real ring current information into the external input, thus provides an effective way to get this point.Further, how to shape the feedback to contain necessary and sufficient information for a good prediction becomes an important problem.Here the time scale of the feedback should match the characteristic time of the ring current decay.The time scale of the ring current decay ranges from a few tens of minutes to more than tens of hours (Gonzalez et al., 1989;Daglis, 2001).Too short delay time of the feedback, during which the ring currents do not get change significantly, would contribute little to the prediction.An amount of trials (taking the L from 5 min to 24 h or more) indicates that the output-feedback length seems to be at least larger than about 60 min so as to get fair prediction performance, while it does also using feedback length as long as 1-2 days.The proper history length of the output-feedback in our NARX model is about 120 min.This length may mainly reflect the characteristic time of ring current decay on average and weighted especially for the early recovery phase, which involves various decay mechanisms with ion lifetimes from tens of minutes for pitch angle scattering by EMIC wave-particle interactions (Jordanova et al., 1997(Jordanova et al., , 2001) ) to tens of hours or more for charge-exchange and Coulomb collision loss etc.This proper output-feedback delay time length may somewhat change with the collected storm samples, because different type of individual magnetic storm would have different characteristic evolution and decay time.
Different combinations of solar wind and IMF parameters for input and different time length of input parameters would affect the prediction performance of the NARX network, which have not been verified yet in this study.So the present NARX network and its prediction result are not necessarily the most optimum and the best.They will be examined in an accompanying investigation.However, our study has proved the great advantage of the NARX neural network in prediction of storm-time SYM-H index.
The Elman neural network has potential capability to catch more complex processes due to the connections from the hidden layer to the output.It might be true in principle that the Elman neural network could make storm prediction at least as well as NARX even for higher resolution data.But it needs major improvement by using new techniques like pruning method etc.It is worthy of optimizing Elman network and other ANNs in SYM-H prediction with some new techniques in the further study.
optimum length of the output feedback timedelayAs described in Sect.3.2.2, a sequence of NARX networks with different length L of the output feedback time-delay line (i.e.number of the context nodes) has been trained in order to find the most proper value of L. All the trained NARX networks with different L are tested for the 6 storm events using IMF/SW data from ACE satellite.The test results are shown in Fig.5, which gives the variation of the prediction performance (characterized by the three parameters of ρ, RMSE and ARV) with output-feedback delay time length of L.

Fig. 5 .
Fig. 5. Prediction performance by NARX network versus feedback delay time length of L. (a) For L value from 0 to 24 hours; (b) A zoom-in view of (a) for L value from 5 to 130 minutes.

Fig. 5 .
Fig. 5. Prediction performance by NARX network versus feedback delay time length of L. (a) For L-value from 0 to 24 h; (b) A zoom-in view of (a) for L-value from 5 to 130 min.

Fig. 6 .
Fig. 6.Comparison of observed SYM-H index with that predicted by NARX neural network with output-feedback delay time length of min for testing storm events using ACE data.

Fig. 7 .
Fig. 7.Comparison of observed SYM-H index with that predicted by NARX neural network with 24 context units for each testing storm event whose IMF/SW data are from WIND.

Fig. 10 .
Fig. 10.Correlation plots of the SYM-H indices observed and predicted by Elman and NARX neural networks.(a) For the Elman model with input history length of 90 min; (b) For the Elman model with input history length of 720 min; (c) For NARX model with feedback time delay length of 120 min

Fig. 10 .
Fig. 10.Correlation plots of the SYM-H indices observed and predicted by Elman and NARX neural networks.(a) For the Elman model with input history length of 90 min; (b) For the Elman model with input history length of 720 min; (c) For NARX model with feedback time delay length of 120 min.

Table 1 .
Storm intervals used to develop the ANN prediction model.

Table 2
we have made further tests using IMF/SW data available from WIND satellite for another 7 great storms occurring in1995-1997 and 2005, in which the time ahead of prediction is adjusted according to the position of WIND relative to ACE for each event.Ann.Geophys., 28, 381-393, 2010www.ann-geophys.net/28/381/2010/

Table 2 .
Prediction performance by using ACE data for 6 test events.

Table 3 .
Prediction result of each test event whose IMF/SW data are from WIND.4% for 6 test examples as a whole, which is much worse than the prediction result of NARX model.It is also shown from Table4that when the input time length increases from 90 min to 240 min, the SYM-H prediction performance of the Elman ANN improves obviously from ρ = 0.68, RMSE= 38.5 nT, ARV= 54.4% to ρ = 0.86, RMSE= 26.6 nT, ARV= 26.9%.If the input time length continues to increase the SYM-H prediction performance changes very slowly.When it reaches 720 min long, the SYM-H prediction performance parameters for the 6 storms Ann.Geophys., 28, 381-393, 2010www.ann-geophys.net/28/381/2010/Comparison of observed SYM-H index with that predicted by NARX neural network with 24 context units for each testing storm event whose IMF/SW data are from WIND.
gives the spot plots of the model predicted SYM-H versus observed ones for NARX model and two Elman models with different input time lengths.Figures 9 and 10 show clearly that the NARX model does exceed the Elman network in SYM-H prediction.

Table 1 .
Storm intervals used to develop the ANN prediction model

Table 4 .
Prediction result by Elman ANN with different length of input history.

Table 5 .
Comparison of Dst prediction by Elman network with SYM-H by NARX ANN.