General Comments
Compared to the first version of the manuscript, the authors have made additions and changes and have attempted to address most of the comments made by the two reviewers. Understanding of the article is still hampered by poor language and syntax, but the authors mention their willingness to hire an external service provider to help them with this, which would be beneficial. I would like to start by thanking the authors for their efforts.
From my point of view, the article still has both major and minor flaws that could have been addressed in another round of major revisions. However, some critical points from this first round of revision have not been addressed by the authors. Worse still, some of their responses and revisions raise new concerns that are, in my opinion, serious enough to lead me to recommend that the article be rejected for publication in its current form. In particular, the authors make unsubstantiated claims (both in the article and in their response) and claim to have corrected certain aspects of the article when this is clearly not the case. Such methodological and scientific lapses are unacceptable in a scientific article.
However, this does not mean that their conclusions are necessarily incorrect, nor that the core of their work, if revised and adapted accordingly, cannot be published in the future. I therefore recommend that the authors take the time to redo the necessary experiments and rewrite the relevant parts of the article before resubmitting, so as to strengthen results that may well be of interest to the community.
Below, I detail the reasons that, in my view, justify rejection of the article, as well as other specific comments that should also be considered in the event of a new submission.
Critical Comments
1) In my first review, I suggested that the authors split their dataset into three subsets: training, validation, and test (instead of their simple two-way split, which could lead to hyperparameter overfitting). In their response (AR:6) and in the revised manuscript, the authors claim to have followed this suggestion and now use solar cycle 23 for validation and solar cycle 24 for testing (the remaining solar cycles being used for training, see Figure 1). This implies that they re-ran their pipeline, performed a new hyperparameter search on the new validation set, and trained at least one new model with this split. It appears, however, that their hyperparameters are still exactly the same (see Table 2), which is surprising but not impossible. More troubling, the numerical values of their results did not change at all, down to the hundredth, so that Table 3 and Figures 7 and 8 are identical to those in the first version. This is extremely implausible given the stochastic nature of neural network training and the fact that an entire solar cycle was supposedly removed from the training set. I therefore assume that the authors simply changed Figure 1 and the corresponding text in response to my comment, but did not actually do the work required to take it into account. Their results thus most likely come from a model and a pipeline that do not correspond to what they describe, which is a serious misrepresentation.
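To avoid any ambiguity about what I am asking for, here is a minimal sketch of the kind of chronological three-way split I have in mind (the cycle boundary dates and all variable names are purely illustrative and are not taken from the manuscript):

    import numpy as np

    # Illustrative daily index with dates (placeholder values, not real F10.7 data).
    dates = np.arange("1957-01-01", "2020-01-01", dtype="datetime64[D]")
    f107 = np.random.default_rng(0).uniform(65.0, 300.0, size=dates.size)

    # Approximate cycle boundaries, given here only to illustrate the idea.
    cycle23_start = np.datetime64("1996-08-01")
    cycle24_start = np.datetime64("2008-12-01")

    train_mask = dates < cycle23_start                             # earlier cycles -> training
    val_mask = (dates >= cycle23_start) & (dates < cycle24_start)  # cycle 23 -> validation
    test_mask = dates >= cycle24_start                             # cycle 24 -> test

    # Hyperparameters are tuned on the validation split only; the test split
    # (cycle 24) is evaluated once, after all tuning is finished.

Re-running the pipeline with such a split would almost certainly change the reported metrics at least slightly, which is precisely why numbers identical to the hundredth are implausible.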
2) In Section 3, the authors compare the performance of their TCN model to the SWPC model. They make the following statement (L240 – 244):
“The main reason the TCN model outperforms the SWPC forecast results, in predicting the F10.7 values for 2 and 3 days ahead, is that the TCN model effectively captures the long term dependencies in the time series data by its structure of convolutional layers and residual connections. The structure of the TCN model could solve the non linearities in the F10.7 sequence more effectively, to improve stability and prediction accuracy (Bai et al.2017).”
However, when asked to give details about the SWPC model during the first revision, the authors replied (AR16-(1)):
“SWPC is the US Weather Prediction Centre, I'm sorry I don't know what model they used to predict F10.7.”
How, then, can the authors explain why their model outperforms the SWPC model if they do not even know what the SWPC model is? For all we know, SWPC could itself use RNNs or even a TCN. The authors' assertion reflects the conclusion they would like to reach rather than resting on actual arguments, which is a serious methodological error.
3) As mentioned in the general comments, the manuscript is still rather poorly written. This is not only a matter of grammatical and syntactical errors: the way the sentences are constructed and, more generally, the authors' arguments often come across as unclear and confused. I believe this should be addressed in depth before any new submission.
4) I still find the critical discussion in Section 3, and its reiteration in Section 4, weak. The authors mostly limit themselves to describing their figures and concluding that their model is very good; in my opinion, this lacks the necessary critical perspective. For example, in the first review I asked the authors to comment on the surprising fact that their model performs equally well at forecast horizons of 1, 2, and 3 days (and, for that matter, why stop at 3 days rather than, say, one week?). In their reply, the authors simply repeat that their model is good at these three horizons, unlike the reference models, without adding anything to the critical discussion in the article. I find this disappointing; it is one of the things that separates a simple technical report from a genuine scientific article.
Specific Comments
1) L.32: The authors state: “The correlation between F10.7 at the current and previous moments decreases as the time interval increases”. If I am not mistaken, F10.7 is actually highly autocorrelated with a period of 27 days, so this statement is incorrect and should be revised. For the same reason, why did the authors use 20-day input sequences for their forecasts rather than sequences of 27 days (or more), which would allow them to take advantage of this autocorrelation?
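As an illustrative check (with synthetic placeholder data, since I do not have the authors' exact series), the lagged autocorrelation can be computed along the following lines; on the real F10.7 record I would expect a clear secondary peak near a lag of 27 days:

    import numpy as np

    def autocorr(x, lag):
        # Pearson autocorrelation of a 1-D series at a given lag.
        x = np.asarray(x, dtype=float)
        return np.corrcoef(x[:-lag], x[lag:])[0, 1]

    # Synthetic stand-in with an artificial 27-day modulation; replace with the real record.
    rng = np.random.default_rng(1)
    t = np.arange(5000)
    f107 = 100.0 + 50.0 * np.sin(2.0 * np.pi * t / 27.0) + rng.normal(0.0, 5.0, t.size)

    for lag in (1, 5, 10, 20, 27, 54):
        print(f"lag {lag:3d} d: r = {autocorr(f107, lag):+.3f}")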
2) Section 2.2: The text added by the authors is useful, but it should be better worded, and a reference should be given for each framework.
3) Section 2.3, L. 101: “Since its introduction, TCN has caused a huge response”: this sentence is exaggerated and misleading. A Google Scholar search for the exact term "temporal convolutional network" (within quotes) yields about 9,030 results, whereas “convolutional neural network” yields about 852,000 and “recurrent neural network” about 455,000. Since the sentence also adds nothing to the paper, it should be removed.
4) Table 2: The authors state that their “batch size” is None. According to their letter (AC12-(1)), they “don't set the value of batch size, [they] directly take an entire training set for training. So, the batch size is none.” This is very surprising, as mini-batch training is one of the fundamental techniques of neural network training, and I cannot think of any other contemporary study performed without it (see any deep learning reference textbook, e.g. “Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Illustrated edition. Cambridge, Massachusetts: The MIT Press.” or “Chollet, Francois. 2021. Deep Learning with Python. Simon and Schuster”). I am not closely familiar with the TensorFlow framework used by the authors, so I suggest they double-check that setting this value to None really makes their model train on the whole training set at once and, if so, that they try varying this value, as it would probably further improve their results.
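Should the authors wish to test this, a minimal sketch of what I mean by varying the batch size is given below (the data and the small stand-in model are placeholders, not the authors' TCN or dataset):

    import numpy as np
    import tensorflow as tf

    # Placeholder data standing in for 20-day input windows and next-day targets.
    x_train = np.random.rand(1000, 20, 1).astype("float32")
    y_train = np.random.rand(1000, 1).astype("float32")

    def build_model():
        # Simple stand-in model, not the authors' TCN architecture.
        return tf.keras.Sequential([
            tf.keras.Input(shape=(20, 1)),
            tf.keras.layers.Conv1D(32, kernel_size=3, padding="causal", activation="relu"),
            tf.keras.layers.GlobalAveragePooling1D(),
            tf.keras.layers.Dense(1),
        ])

    # Build a fresh model for each candidate batch size and compare validation losses.
    for batch_size in (16, 32, 64, 128):
        model = build_model()
        model.compile(optimizer="adam", loss="mse")
        history = model.fit(x_train, y_train, batch_size=batch_size,
                            epochs=5, validation_split=0.2, verbose=0)
        print(batch_size, history.history["val_loss"][-1])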
5) Section 2.4 and Figure 5: The authors say they use the relative error to measure the bias of their model, but the formula they give is that of a normalized absolute error. This should be corrected, especially since Figure 5 appears to show a non-normalized error.
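For clarity, and using my own (standard) notation rather than the authors', the distinction I have in mind is between the signed relative error, which can reveal a bias, and the normalized absolute error, which cannot:

    \mathrm{RE}  = \frac{F_{10.7}^{\,\mathrm{pred}} - F_{10.7}^{\,\mathrm{obs}}}{F_{10.7}^{\,\mathrm{obs}}} \quad \text{(signed; reveals the direction of the bias)}
    \mathrm{NAE} = \frac{\bigl| F_{10.7}^{\,\mathrm{pred}} - F_{10.7}^{\,\mathrm{obs}} \bigr|}{F_{10.7}^{\,\mathrm{obs}}} \quad \text{(non-negative; hides the direction of the bias)}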
6) Section 3, L.182: The authors state that “Zhang et al. (2020) showed that the variation in error follows the same trend as the sunspot number” and use this statement to explain why the performance of their TCN model varies over the years. However, as I understand it, the conclusions of Zhang et al. (2020) apply only to the LSTM model developed in that article. The authors should therefore state that they reach the same conclusion as Zhang et al. (2020), rather than citing Zhang et al. (2020) as an explanation of their own results. This may simply be a writing problem (see critical comment 3), but it should not be overlooked.
7) Section 3, L.196 – 197 and Section 4, L.287 – 288: The authors repeat that their model "does not affect the final F10.7 forecasts due to specific properties of the data". This sentence is incomprehensible to me, since their model, which produces the forecasts, obviously does affect them. As the sentence appears both in Section 3 and in the conclusions, it seems important to the authors and should be rephrased so that it makes sense.
I hope that these comments will help authors to improve their manuscript should they wish to resubmit it. |