the Creative Commons Attribution 4.0 License.
Predicting Geomagnetic Indices for Space Weather Applications in Solar Cycle 25
Abstract. This study investigated the relationship between geomagnetic indices (Ap and Dst) and solar activity with non-metric multidimensional scaling (NMDS) and the novel LSTM+ forecasting model, which builds on long short-term memory (LSTM) networks. The NMDS analysis revealed a stronger association of Ap with overall solar activity and solar wind conditions compared to Dst, highlighting the influence of elevated plasma flow speed and proton temperature on geomagnetic disturbances. The LSTM+ model, incorporating a dynamic reforecast procedure, demonstrated high accuracy in predicting Ap and Dst, achieving strong performance metrics for solar cycle 24 (SC-24). Based on the model and historical trends, the peak Ap and trough Dst for SC-25 are projected to occur between May 2026 and January 2027, aligning with the observed lag between sunspot number and geomagnetic indices. These findings enhance our understanding of solar-terrestrial interactions and provide a valuable tool for space weather prediction, crucial for mitigating potential impacts on technological infrastructure.
Status: open (until 29 Nov 2024)
RC1: 'Comment on angeo-2024-20', Ana G. Elias, 31 Oct 2024
This work presents a very interesting analysis of the Ap index and Dst, which are measures of geomagnetic activity or geomagnetic storms, in connection with solar activity indices and solar wind parameters. The Nonmetric Multidimensional Scaling (NMDS) statistical method is applied to the whole set of parameters, and then a machine learning method, Long Short-Term Memory (LSTM) networks, is applied to forecast Ap and Dst annual mean values along solar cycle 25 (from 2019 to ~2029).
I consider this work acceptable for publication, though it needs some clarification based on the comments I outline below. I also detail some minor errors at the end.
Major comments:
(1) I think it is important to state the time scale of the analyzed data series in the introduction, or somewhere at the beginning of the work. Analyzing prediction methods for geomagnetic activity indices on interannual time scales is not usual, since their forecast matters for space weather purposes mainly on much shorter time scales, such as hourly or daily, which are the time scales of geomagnetic storms and solar disturbances.
(2) Line 19: I don't agree that "heightened solar activity", which I think refers to a high level of solar activity, is synonymous with "solar storm". You can have high solar activity levels with no solar storms, and also solar storms during low solar activity levels.
(3) Line 76: You write: "The horizontal axis (Dimension 1) appears to represent the overall level of solar activity, with higher values corresponding to increased activity. The vertical axis (Dimension 2) seemingly captures the nature of solar wind disturbances, with positive values associated with enhanced plasma flow speed and proton temperature, and negative values linked to reduced solar wind pressure and geomagnetic storm intensity."
Why does Dimension 1 only "appear to represent" this? Is it not certain? Also, as I understand it, the interpretation you give for negative values would not agree with Dst being located at negative values, since the more negative Dst is, the stronger the storm. Maybe I am missing something here.
(4) Line 80: Everything mentioned in this paragraph can also be deduced from the correlation of Dst and Ap with each of the indices that you analyze, that is, the direct association between Ap and all solar indices and solar wind parameters, and an inverse correlation with Dst that is lower than with Ap. I list the values of the squared correlation coefficient in the following table, based on annual mean values of each parameter:
Parameter | r² with Ap | r² with Dst
---|---|---
B | 0.79 | 0.73
T | 0.82 | 0.53
V | 0.54 | 0.30
A/P | 0.63 | 0.60
Pressure | 0.75 | 0.61
Rz | 0.33 | 0.40
F10.7 | 0.32 | 0.41
Sunspot Area | 0.40 | 0.47
Dst | 0.77 | 1.00
Ap | 1.00 | 0.77
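The r² values in this table are described as squared Pearson correlations of annual means. A minimal sketch of that computation follows, assuming (year, value) samples are averaged per calendar year before correlating; the helper names and toy numbers are hypothetical, not the reviewer's code or real Ap/Dst data. Squaring r discards the sign, which is why the anticorrelated Ap–Dst pair still shows r² = 0.77.

```python
from collections import defaultdict
from math import sqrt


def annual_means(samples):
    """Average (year, value) samples into one mean per year, sorted by year."""
    sums, counts = defaultdict(float), defaultdict(int)
    for year, value in samples:
        sums[year] += value
        counts[year] += 1
    return [sums[y] / counts[y] for y in sorted(sums)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy samples (year, value); not real Ap or Dst data.
ap_samples = [(2001, 12), (2001, 16), (2002, 20), (2002, 24), (2003, 9), (2003, 11)]
dst_samples = [(2001, -15), (2001, -19), (2002, -30), (2002, -26), (2003, -9), (2003, -13)]

ap_annual = annual_means(ap_samples)    # [14.0, 22.0, 10.0]
dst_annual = annual_means(dst_samples)  # [-17.0, -28.0, -11.0]
r = pearson_r(ap_annual, dst_annual)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")  # r < 0, but r^2 > 0
```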
(5) In Figure 1, Dst appears far from Ap, even though they share 77% of their variance, as can be noticed in the table above. The only difference is that they vary in antiphase. Why are they so far apart? Maybe I do not understand the method correctly.
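A short derivation may help with this question. Assuming the NMDS input treats each variable as an object and uses the Euclidean distance between z-scored series of length n (population standard deviation), two series with Pearson correlation r are separated by d² = Σ(z_x − z_y)² = 2n(1 − r). With r ≈ −√0.77 ≈ −0.88 for Ap and Dst, d² ≈ 3.75n, close to the maximum of 4n, whereas Ap and −Dst would be only about 0.24n apart. The sketch below checks this identity on hypothetical toy series; it illustrates the stated assumption and is not the authors' NMDS implementation.

```python
from math import sqrt


def zscore(values):
    """Standardize with the population standard deviation (ddof = 0)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def euclidean(x, y):
    """Euclidean distance between two equal-length series."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical annual means, anticorrelated like Ap and Dst (not real data).
ap = [8.0, 12.0, 20.0, 15.0, 9.0, 7.0]
dst = [-8.0, -14.0, -25.0, -17.0, -10.0, -6.0]

z_ap, z_dst = zscore(ap), zscore(dst)
n = len(ap)
r = sum(a * b for a, b in zip(z_ap, z_dst)) / n  # Pearson r from z-scores

d_dst = euclidean(z_ap, z_dst)                          # Ap vs Dst
d_neg_dst = euclidean(z_ap, zscore([-v for v in dst]))  # Ap vs -Dst
print(f"r = {r:.3f}, d(Ap, Dst) = {d_dst:.2f}, d(Ap, -Dst) = {d_neg_dst:.2f}")
```

Under this assumption, flipping the sign of Dst (or using |Dst|, which equals −Dst whenever Dst is negative) would move it next to Ap without changing any correlation magnitude.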
(6) Line 42: I would start a new paragraph with the sentence "Analysis of Ap index forecasting performance ...". And what is the source of the data in this sentence, that is, of the correlation coefficient values mentioned? I understand they come from the work of Paouris et al. (2021), but here they lack context. Maybe you can add more explanation.
(7) Lines 134-136: Is the normalization within each solar cycle necessary? Have you repeated your analysis with a normalization over the whole period, or without normalization at all?
In doing this you are losing an important aspect of geomagnetic activity, namely its intensity. I see, however, that Ap along solar cycle 24 is significantly lower than in previous cycles (as can be seen in the figure I attached), and Dst is correspondingly closer to zero along this cycle.
I tried to reproduce your data and plotted them together with the Ap and Dst data prior to normalization, and they really look different. I think that you should discuss this procedure more, or at least its consequences. One of them is that the peak value you obtain bears no relation to the true value expected at the peak you detect. This should be highlighted, unless I am misunderstanding the analysis. I am attaching the figure I made for Ap, where you can clearly see what I mean.
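To illustrate the consequence described in this comment, here is a minimal sketch assuming the per-cycle normalization is a z-score computed separately within each solar cycle (the exact normalization used in the manuscript is not given here, so this is an assumption). With toy values in which the second cycle has half the amplitude of the first, per-cycle z-scores give both cycles the same peak, while a single whole-period z-score preserves the difference. The helper `zscore_per_cycle` is hypothetical.

```python
from math import sqrt


def zscore(values):
    """Standardize with the population standard deviation (ddof = 0)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def zscore_per_cycle(values, cycles):
    """Standardize each solar cycle separately (hypothetical helper)."""
    out = [0.0] * len(values)
    for cycle in sorted(set(cycles)):
        idx = [i for i, c in enumerate(cycles) if c == cycle]
        for i, z in zip(idx, zscore([values[i] for i in idx])):
            out[i] = z
    return out


# Toy annual Ap-like values (not real data): the second cycle has half the amplitude.
ap = [10.0, 20.0, 30.0, 5.0, 10.0, 15.0]
cycles = [23, 23, 23, 24, 24, 24]

whole = zscore(ap)
per_cycle = zscore_per_cycle(ap, cycles)
print("whole-period peaks:", max(whole[:3]), max(whole[3:]))
print("per-cycle peaks:   ", max(per_cycle[:3]), max(per_cycle[3:]))
```

After per-cycle scaling, the weaker cycle's peak is indistinguishable from the stronger one, so a predicted peak in those units says when the maximum occurs but not how large it is.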
(8) Line 168: Regarding the sentence "Furthermore, the LSTM+ model successfully identifies the timing of the AP maximum and the DST minimum, both projected to occur in September 2026.": how do you know that the prediction is successful when September 2026 has not yet occurred?
Also, can you explain the following sentence further: "This prediction aligns with historical observations across multiple solar cycles"? Or is the explanation in the following paragraph? If so, I think that the sentence should go in that paragraph.
(9) In Figure 3, I notice that the gray lines (the observed values) depart from the predicted values more than in Figure 2 (for both Ap and Dst) along SC 24. Of course, the years 2019-2023 shown in Figure 3 are not included in Figure 2 (both panels), but for the period with data I expected agreement as good as in your previous cases. What happened here? Please check.
(10) I guess that in the LSTM+ method you included all the solar and solar wind indices. What, then, is the purpose of the NMDS analysis? Again, maybe I am not fully understanding the methodology.
(11) Regarding the sentence "However, the development of geomagnetic storms, as indicated by DST, appeared to be influenced by a confluence of factors rather than being solely attributable to individual solar parameters.":
Doesn't NMDS consider all the parameters combined? If not, a simple regression analysis would have sufficed to reach this conclusion. In fact, the squared correlation coefficient (R²) of a multiple linear regression of Ap on all the parameters you consider (except SFI, since I was not able to find the concatenated series for the period) is 0.96, and for Dst it is 0.83. That is, 96% of the variance of Ap is explained by the set of 8 out of the 9 solar activity and solar wind parameters you considered, and 83% of the variance of Dst.
How does the NMDS method improve the understanding of a high correlation in this case?
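The multiple-regression check described in this comment can be sketched as an ordinary least-squares (OLS) fit with an intercept, reporting R² = 1 − SS_res/SS_tot. The pure-Python solver and function names below are illustrative assumptions, not the reviewer's actual tool, and the series are toy values. One property helps when reading such numbers: adding predictors to an OLS fit with an intercept never lowers R², so a multiple-regression R² on the same data is always at least as large as the largest single-predictor r² in the table above.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_r2(y, predictors):
    """R^2 of an OLS fit y ~ 1 + predictors (each predictor is one series)."""
    rows = [[1.0] + list(obs) for obs in zip(*predictors)]  # design matrix with intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations (X^T X) beta = X^T y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy usage with hypothetical series (not real solar or geomagnetic data).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [3.0, 8.0, 6.0, 12.0, 10.0]
print(multiple_r2(y, [x1]), multiple_r2(y, [x1, x2]))  # second value >= first
```

With a single predictor, the function reduces to the squared Pearson correlation, which links it directly to the values in the table.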
(12) In the conclusion, I do not really see how "These findings contribute significantly to our understanding of the intricate relationship between solar activity and geomagnetic fluctuations." I see that they confirm an intricate relationship, but I do not see what understanding this analysis adds.
Minor comments:
Line 29: I think that instead of "diminishment", "decrease" is better.
Line 34: In "Zhang et al." the year is missing.
Line 37: " stormss" should be " storms"
Line 37: In "Nilam and Ram" the year is missing.
Line 39: "(Nilam and Ram, 2022)" could be deleted here, since the sentence deals with their results, mentioned at the beginning.
Line 39: In "Abduallah et al." the year of this reference is missing.
Line 80: I think that "DST" should be "Dst". Please check this throughout the manuscript, and check whether it is more correct to write "Dst" instead of "DST".
RC2: 'Comment on angeo-2024-20', Anonymous Referee #2, 11 Nov 2024
General comments:
The preprint investigates annual averages of the geomagnetic activity indices Ap and Dst in two respects. First, their relationship with various solar activity proxies and solar wind parameters for past solar cycles 21-24 (1976-2019) is analyzed using an ordination technique (nonmetric multidimensional scaling – NMDS). Second, the timing of their maxima in solar cycle 25 (2019-2029) is predicted using a type of recurrent neural network (long short-term memory – LSTM).
In my opinion the manuscript contains new ideas that are presented using fluent language. However, I think that there are fundamental shortcomings concerning the acknowledgement/incorporation of the international state of research, the substance of the conclusions, the pertinence of the title and abstract, and the overall clarity of the presentation. Hence, I believe that the manuscript may be eligible for publication after additional work and resubmission.
Specific comments:
- The title doesn't specify what is predicted and for what purpose (see also comment 3). I suggest clarifying these points in a revised title.
- The abstract doesn't report the manuscript's contents and findings in sufficient detail. I suggest clarifying/specifying the following points:
  - Lines 7-8: You work on an annual time scale. Without this information, one would expect the native time scales of the chosen indices (1-hour Dst, 1-day Ap).
  - Line 8: As far as I understand, the LSTM+ model is not entirely new, but it is an adaptation of your previously published LSTM model where the "+" represents a new forecasting procedure (see lines 96-99). This information is important to accurately delineate the contribution of this study.
  - Lines 11-13: I suggest that you quantify the expressions "high accuracy", "strong performance metrics", "between May 2026 and January 2027" (you state "September 2026" in line 169) and "observed lag between sunspot number and geomagnetic indices" in order to strengthen the expressiveness of the abstract.
  - Lines 13-15: I fail to follow your claims on your study's relevance in the space weather context (see also lines 56-57, 200-203). First, different dependencies between solar activity proxies / solar wind parameters and Ap on the one hand and Dst on the other are known from previous work. I suggest that you explain in detail what elements of the NMDS results are new (or unexpected) with respect to the current status of knowledge (see also comments 8, 16c). Second, given the existing predictions of sunspot number and radio flux (e.g., https://www.swpc.noaa.gov/products/solar-cycle-progression) and that the delay between those and annual geomagnetic activity appears to be known (see comment 23), I suggest that you explain for which specific use cases your results offer additional merit. Those discussions could be placed fittingly into the "Results and Discussion" section (see comment 24).
- Lines 17-18: Co-rotating interaction regions (CIRs) are also among the most important solar wind structures that drive geomagnetic storms (see, e.g., Richardson & Cane, 2012, JSWSC) and should be mentioned in this context.
- Line 19: "Periods of heightened solar activity" are not "known as solar storms". I suggest that you distinguish more clearly between (give definitions for) the terms "solar/geomagnetic activity" and "solar/geomagnetic storms".
- Line 25: The reasons for choosing Ap and Dst specifically from among the various existing indices (6 IAGA-endorsed ones, https://isgi.unistra.fr/, excluding their multiple derivatives) are not explained convincingly (see also lines 31-32). First, which index (or combination of indices) is "key" depends on the specific use case, so I suggest that you give examples of relevant use cases. Second, I suggest that you explain why you pick a derivative (Ap) over its parent (ap, and ultimately Kp).
- Lines 28 & 30: I suggest replacing the citation "Mayaud, 1980" (book) with the original works in which Kp (Bartels, 1949) and Dst (Sugiura and Kamei, 1991) were first introduced. Wherever else you decide to cite books (e.g., lines 18, 72), I suggest adding chapter numbers marking the location of the relevant information.
- Line 33: I suggest adding a paragraph here summarizing the current status of knowledge on solar wind – magnetosphere coupling functions, specifically with respect to Ap and Dst on annual time scales, that motivates the NMDS analysis (a starting point could be, e.g., Lockwood, 2022, Space Weather; Finch & Lockwood, 2007, Annales Geophysicae; and the references suggested in comment 16c). This should address some relevant questions that currently remain unanswered: What outstanding research question do you tackle in the first part of the paper? Why do you choose the NMDS method over other methods (e.g., a principal component analysis)? In what respect do you need/use the outcome in the second part?
- Lines 43-44: I don't understand what "Day-0" (and "Day-2 forecast") mean exactly and suggest adding an explanation.
- Line 61: Why don't you use the full vector information (or at least Bz) in addition to |B|?
- Lines 63-64: There are multiple data set options on the linked pages. I suggest specifying the exact ones you are using.
- Lines 65-67: I can't find the "SFI database version 25" online, and the reference "Balch, 2009" is missing.
- Line 71: How did you deal with gaps in the OMNI data set?
- Lines 72-73: I suggest that you expand your description of the NMDS methodology such that all readers can follow how Fig. 1 comes about. This should include how the dissimilarity matrix is calculated exactly (Why choose Euclidean distance as dissimilarity measure?) and how the optimization process works (Why choose "Young's S-stress formula" as goodness-of-fit measure?), including relevant equations.
- Line 75: Add a legend to Fig. 1 indicating what the different colors mean.
- Lines 74-91: I am having trouble understanding your interpretations of Fig. 1 (partly repeated in the abstract and the conclusions) and suggest adding explanations for the following points:
  - Lines 76-79: How do you deduce from Fig. 1 what the axes could represent physically? With "nature of solar wind disturbances" do you refer to distinct solar wind categories? If so, what are they and how can I imagine them to be aligned along the vertical axis?
  - Lines 80-81: Could the fact that Dst is singled out on the horizontal axis (all other quantities between about -0.75 and 0.5) be simply due to its reversed sign with respect to Ap? Have you tried using |Dst|? Would that change your interpretation of the axes (and Fig. 1 as a whole)?
  - It is well known that "Dst is influenced by a combination of factors" (line 82; e.g., Burton et al., 1975, JGR) and that "plasma flow speed [...][is] linked to higher Ap values" (lines 84-85; e.g., Crooker et al., 1977, JGR) and that "high-speed solar wind streams […] can drive geomagnetic disturbances" (lines 86-87; this is not a hypothesis, see also comment 4). I suggest highlighting those findings that are new and relevant as input for the LSTM+ model.
- Section 3.1: I can't extract from this description exactly how the LSTM+ model is set up and suggest adding a figure (perhaps a diagram specifying the gates, etc.) to aid comprehension.
- Line 113: I don't think that the Nash-Sutcliffe model efficiency coefficient is a standard performance metric that readers of this journal can be expected to be familiar with (at least I don't know it). I read that it is equivalent to the "coefficient of determination (R²)" in certain regression settings; is that the case here? If so, I suggest calling it the "coefficient of determination", as this is more widely known. Otherwise, I suggest adding an explanation of why you chose this specific metric.
- Line 126: I think that Eq. 3 should look similar to Eq. 2, given that it is supposed to be the absolute percentage error.
- Lines 129-144: These paragraphs don't present or discuss results but refer to the methodology, so I suggest moving them elsewhere:
  - Lines 129-133: I suggest moving this to section 3.1 and clarifying exactly which data set the LSTM+ model is trained on (Is it the same as in section 2.2?) and which parameter combinations were chosen for the prediction of Ap and Dst in SC-25.
  - Lines 134-144: You mentioned the standardization using z-scores in line 71. If you use the data set you prepared in section 2.2 (see above), then I suggest moving this paragraph there, and otherwise to section 3.1.
- Line 150: I find it noteworthy that both offsets you report here are multiples of 27 days (the synodic Carrington rotation period). This suggests to me that they could have a common physical cause. Perhaps this can be traced back to the way you define your input? I suggest adding a discussion of plausible causes in the "Results and Discussion" section (see comment 24).
- Regarding Fig. 3:
  - Increase the quality by making its style comparable to Fig. 2 (incl. panel names "a" and "b").
  - What is the temporal resolution of these plots? It looks like you have more than one value per year here. If so, why don't you update the curves to show the most recent available observations?
  - Why do you get a notably greater deviation between observed and predicted values for SC-25 than for SC-24 (Fig. 2)?
  - Line 169: How can your predictions "align with historical observations"?
- Lines 171-172: Where does the stated lag time (1-3 years) come from? If this refers to your own (unpublished) work, I suggest describing it in more detail; otherwise, please give a reference.
- The section "Results and Discussion" only refers to the second part of the study. I suggest restructuring the text such that subsections 2.1, 2.2, 3.1, 3.2 are put into one section (on methods) and all results (from both parts) are reported and discussed together.
- Lines 203-204: I suggest being more specific about which additional solar and geophysical parameters could be incorporated into the LSTM+ to enhance its predictive accuracy.
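For readers unfamiliar with the Nash-Sutcliffe efficiency raised above, a sketch of its standard definition, NSE = 1 − Σ(O_i − P_i)² / Σ(O_i − Ō)², is given below together with the usual absolute percentage error, APE_i = 100 |O_i − P_i| / |O_i|. These are the textbook forms; whether they match the manuscript's Eqs. 2 and 3 exactly is an assumption. NSE has the form of a coefficient of determination computed from predictions, but it is not the squared Pearson correlation between observations and predictions: a biased but perfectly correlated forecast has r² = 1 yet can have a negative NSE.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SS_res / SS_tot around the observed mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def absolute_percentage_errors(observed, predicted):
    """Per-sample APE in percent; raises ZeroDivisionError if an observation is 0."""
    return [100.0 * abs(o - p) / abs(o) for o, p in zip(observed, predicted)]


# Toy example: predictions offset by +1 are perfectly correlated with the
# observations (r^2 = 1), yet the Nash-Sutcliffe efficiency is negative.
obs = [1.0, 2.0, 3.0]
pred = [2.0, 3.0, 4.0]
print(nash_sutcliffe(obs, pred))              # -0.5
print(absolute_percentage_errors(obs, pred))  # about [100.0, 50.0, 33.3]
```

The APE is undefined when an observed value is zero, so it behaves poorly for z-scored or near-zero Dst values.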
Technical corrections:
- Line 7 ff.: "Dst" instead of DST.
- Line 8: Explain what abbreviation "LSTM" means here (you introduce it in line 34).
- Line 11: Explain what abbreviation "SC" means here (you introduce it in line 111).
- Line 34: Add missing year in citation "Zhang et al."
- Line 37: Remove erroneous "s" in "stormss".
- Line 37: Add missing year in citation "Nilam and Ram"
- Line 39: Add missing year in citation "Abduallah et al."
- Line 41: Remove erroneous space in "inde x"?
- Lines 46-47: "are promising candidates" instead of "have shown promise"?
- Line 61: Explain what abbreviation "Na/Np" means.
- Line 104: Choose different letters to abbreviate the number of hidden layers (N) and batch size (B) to avoid confusion with density and magnetic field.
- Line 110: "Performance metrics" instead of "evaluation indices"?
- Line 117: Add missing brackets around "E_T" and remove "were employed".
- Line 130 ff.: "Ap" instead of "AP".
- Fig. 2: Add "a", "b" to the two panels of Fig. 2 (similar for Fig. 3) and refer to them in the text (e.g., lines 146, 147).
- Caption of Tab. 1: "[…] for the prediction of Ap and Dst indices in SC-24 from the LSTM+ model".
- Line 163: Add missing "s" in "illustrates".
- Caption of Fig. 3: "[...] actual values of Ap [...] in SC-25 and the predicted [...].
- Lines 210-237: The formatting of the references should be revised so that the reader can find specific citations more easily (e.g., lines 223-227 is actually just one reference).
- Line 236: Add missing "l" in "Model".
Citation: https://doi.org/10.5194/angeo-2024-20-RC2