General Comments
Compared to the first version of the manuscript, the authors have made additions and changes and have attempted to address most of the comments made by the two reviewers. Understanding of the article is still hampered by poor language and syntax, but the authors mention their willingness to hire an external service provider to help them with this, which would be beneficial. I would like to start by thanking the authors for their efforts.
From my point of view, the article still has both major and minor flaws that could have been addressed in another round of major revisions. However, some critical points from this first round of revision have not been addressed by the authors. Worse still, some of their responses and revisions raise new concerns that are, in my opinion, serious enough to lead me to recommend that the article be rejected for publication in its current form. In particular, the authors make unsubstantiated claims (both in the article and in their response) and claim to have corrected certain aspects of the article when this is clearly not the case. Such methodological and scientific lapses are unacceptable in a scientific article.
However, this does not mean that their conclusions are necessarily incorrect, nor that the core of their work, if revised and adapted accordingly, cannot be published in the future. I therefore recommend that the authors take the time to redo the necessary experiments and rewrite the relevant parts of the article before resubmitting, so as to strengthen results that may well be of interest to the community.
Below, I detail the reasons that, in my view, justify rejection of the article, as well as other specific comments that should also be considered in the event of a new submission.
Critical Comments
1) In my first review, I suggested that the authors split their dataset into three subsets: training, validation, and test (instead of their simple two-way split, which could lead to hyperparameter overfitting). In their response (AR:6) and in the revised manuscript, the authors claim to have followed this suggestion and now use solar cycle 23 for validation and solar cycle 24 for testing (the remaining solar cycles being used for training, see Figure 1). This implies that they re-ran their pipeline, performed a new hyperparameter search on the new validation set, and trained at least one new model with this split. It appears, however, that their hyperparameters are still exactly the same (see Table 2), which is surprising but not impossible. More troubling, the numerical values of their results did not change at all, down to the hundredth, so that Table 3 and Figures 7 and 8 are identical to those in the first version. This is extremely implausible given the stochastic nature of neural network training and the fact that an entire solar cycle was supposedly removed from the training set. I therefore assume that the authors simply changed Figure 1 and the corresponding text in response to my comment, but did not actually do the work required to take it into account. Their results thus most likely come from a model and a pipeline that do not correspond to what they describe, which is a serious misrepresentation.
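To avoid any ambiguity about what I am asking for, here is a minimal sketch of the kind of chronological three-way split I have in mind (the cycle boundary dates and all variable names are purely illustrative and are not taken from the manuscript):

    import numpy as np

    # Illustrative daily index with dates (placeholder values, not real F10.7 data).
    dates = np.arange("1957-01-01", "2020-01-01", dtype="datetime64[D]")
    f107 = np.random.default_rng(0).uniform(65.0, 300.0, size=dates.size)

    # Approximate cycle boundaries, given here only to illustrate the idea.
    cycle23_start = np.datetime64("1996-08-01")
    cycle24_start = np.datetime64("2008-12-01")

    train_mask = dates < cycle23_start                             # earlier cycles -> training
    val_mask = (dates >= cycle23_start) & (dates < cycle24_start)  # cycle 23 -> validation
    test_mask = dates >= cycle24_start                             # cycle 24 -> test

    # Hyperparameters are tuned on the validation split only; the test split
    # (cycle 24) is evaluated once, after all tuning is finished.

Re-running the pipeline with such a split would almost certainly change the reported metrics at least slightly, which is precisely why numbers identical to the hundredth are implausible.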
2) In Section 3, the authors compare the performance of their TCN model to the SWPC model. They make the following statement (L240 – 244):
“The main reason the TCN model outperforms the SWPC forecast results, in predicting the F10.7 values for 2 and 3 days ahead, is that the TCN model effectively captures the long term dependencies in the time series data by its structure of convolutional layers and residual connections. The structure of the TCN model could solve the non linearities in the F10.7 sequence more effectively, to improve stability and prediction accuracy (Bai et al.2017).”
However, when asked to give details about the SWPC model during the first revision, the authors replied (AR16-(1)):
“SWPC is the US Weather Prediction Centre, I'm sorry I don't know what model they used to predict F10.7.”
How, then, can the authors explain why their model outperforms the SWPC model if they do not even know what the SWPC model is? For all we know, SWPC could itself use RNNs or even a TCN. The authors' assertion reflects the conclusion they would like to reach rather than resting on actual arguments, which is a serious methodological error.
3) As mentioned in the general comments, the manuscript is still rather poorly written. This is not only a matter of grammatical and syntactical errors: the way the sentences are constructed and, more generally, the authors' arguments often come across as unclear and confused. I believe this should be addressed in depth before any new submission.
4) I still find the critical discussion in Section 3, and its reiteration in Section 4, weak. The authors mostly limit themselves to describing their figures and concluding that their model is very good; in my opinion, this lacks the necessary critical perspective. For example, in the first review I asked the authors to comment on the surprising fact that their model performs equally well at forecast horizons of 1, 2, and 3 days (and, for that matter, why stop at 3 days rather than, say, one week?). In their reply, the authors simply repeat that their model is good at these three horizons, unlike the reference models, without adding anything to the critical discussion in the article. I find this disappointing; it is one of the things that separates a simple technical report from a genuine scientific article.
Specific Comments
1) L.32: The authors state: “The correlation between F10.7 at the current and previous moments decreases as the time interval increases”. If I am not mistaken, F10.7 is actually highly autocorrelated with a period of 27 days, so this statement is incorrect and should be revised. For the same reason, why did the authors use 20-day input sequences for their forecasts rather than sequences of 27 days (or more), which would allow them to take advantage of this autocorrelation?
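As an illustrative check (with synthetic placeholder data, since I do not have the authors' exact series), the lagged autocorrelation can be computed along the following lines; on the real F10.7 record I would expect a clear secondary peak near a lag of 27 days:

    import numpy as np

    def autocorr(x, lag):
        # Pearson autocorrelation of a 1-D series at a given lag.
        x = np.asarray(x, dtype=float)
        return np.corrcoef(x[:-lag], x[lag:])[0, 1]

    # Synthetic stand-in with an artificial 27-day modulation; replace with the real record.
    rng = np.random.default_rng(1)
    t = np.arange(5000)
    f107 = 100.0 + 50.0 * np.sin(2.0 * np.pi * t / 27.0) + rng.normal(0.0, 5.0, t.size)

    for lag in (1, 5, 10, 20, 27, 54):
        print(f"lag {lag:3d} d: r = {autocorr(f107, lag):+.3f}")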
2) Section 2.2: The text added by the authors is useful, but it should be better worded, and a reference should be given for each framework.
3) Section 2.3, L. 101: “Since its introduction, TCN has caused a huge response”: this sentence is exaggerated and misleading. A Google Scholar search for the exact term "temporal convolutional network" (within quotes) yields about 9,030 results, whereas “convolutional neural network” yields about 852,000 and “recurrent neural network” about 455,000. Since the sentence also adds nothing to the paper, it should be removed.
4) Table 2: The authors state that their “batch size” is None. According to their letter (AC12-(1)), they “don't set the value of batch size, [they] directly take an entire training set for training. So, the batch size is none.” This is very surprising, as mini-batch training is one of the fundamental techniques of neural network training, and I cannot think of any other contemporary study performed without it (see any deep learning reference textbook, e.g. “Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Illustrated edition. Cambridge, Massachusetts: The MIT Press.” or “Chollet, Francois. 2021. Deep Learning with Python. Simon and Schuster”). I am not closely familiar with the TensorFlow framework used by the authors, so I suggest they double-check that setting this value to None really makes their model train on the whole training set at once and, if so, that they try varying this value, as it would probably further improve their results.
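Should the authors wish to test this, a minimal sketch of what I mean by varying the batch size is given below (the data and the small stand-in model are placeholders, not the authors' TCN or dataset):

    import numpy as np
    import tensorflow as tf

    # Placeholder data standing in for 20-day input windows and next-day targets.
    x_train = np.random.rand(1000, 20, 1).astype("float32")
    y_train = np.random.rand(1000, 1).astype("float32")

    def build_model():
        # Simple stand-in model, not the authors' TCN architecture.
        return tf.keras.Sequential([
            tf.keras.Input(shape=(20, 1)),
            tf.keras.layers.Conv1D(32, kernel_size=3, padding="causal", activation="relu"),
            tf.keras.layers.GlobalAveragePooling1D(),
            tf.keras.layers.Dense(1),
        ])

    # Build a fresh model for each candidate batch size and compare validation losses.
    for batch_size in (16, 32, 64, 128):
        model = build_model()
        model.compile(optimizer="adam", loss="mse")
        history = model.fit(x_train, y_train, batch_size=batch_size,
                            epochs=5, validation_split=0.2, verbose=0)
        print(batch_size, history.history["val_loss"][-1])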
5) Section 2.4 and Figure 5: The authors say they use the relative error to measure the bias of their model, but the formula they give is that of a normalized absolute error. This should be corrected, especially since Figure 5 appears to show a non-normalized error.
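For clarity, and using my own (standard) notation rather than the authors', the distinction I have in mind is between the signed relative error, which can reveal a bias, and the normalized absolute error, which cannot:

    \mathrm{RE}  = \frac{F_{10.7}^{\,\mathrm{pred}} - F_{10.7}^{\,\mathrm{obs}}}{F_{10.7}^{\,\mathrm{obs}}} \quad \text{(signed; reveals the direction of the bias)}
    \mathrm{NAE} = \frac{\bigl| F_{10.7}^{\,\mathrm{pred}} - F_{10.7}^{\,\mathrm{obs}} \bigr|}{F_{10.7}^{\,\mathrm{obs}}} \quad \text{(non-negative; hides the direction of the bias)}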
6) Section 3, L.182: The authors state that “Zhang et al. (2020) showed that the variation in error follows the same trend as the sunspot number” and use this statement to explain why the performance of their TCN model varies over the years. However, as I understand it, the conclusions of Zhang et al. (2020) apply only to the LSTM model developed in that article. The authors should therefore state that they reach the same conclusion as Zhang et al. (2020), rather than citing Zhang et al. (2020) as an explanation of their own results. This may simply be a writing problem (see critical comment 3), but it should not be overlooked.
7) Section 3, L.196 – 197 and Section 4, L.287 – 288: The authors repeat that their model "does not affect the final F10.7 forecasts due to specific properties of the data". This sentence is incomprehensible to me, since their model, which produces the forecasts, obviously does affect them. As the sentence appears both in Section 3 and in the conclusions, it seems important to the authors and should be rephrased so that it makes sense.
I hope that these comments will help authors to improve their manuscript should they wish to resubmit it. |