Reply on RC1

The Introduction has been restructured, to provide more clarity about the state of the art and a better understanding of the main ideas of the article. A complete grammar review of the article has been done, resulting in a paper more friendly to the reader. Figure 4 has been discarded, as it duplicated information from Table 1. An extra parameterization has been added, in order to compare its results with those of the Machine Learning parameterization. Several figures have been modified to reflect the inclusion of the additional parameterized model. The conclusions are now supported with the analysis of the comparisons of three models, instead of two.

General comments
The study introduces a new parameterization of the collision-coalescence process that is based on the results of machine-learning procedures, with the aim of eventually using it in weather forecasting models. The authors used 100,000 drop size distributions (including both cloud droplets and raindrops) to obtain the tendencies (time derivatives) of the 0th-5th moments, which were used for training a machine (80%) or evaluating the machine's predictions (20%). Each drop size distribution was assumed to be a composite of two lognormal size distributions, represented by 6 parameters. The paper compares the evolutions of drop size distributions predicted by the machine-learning-based parameterization and explicitly calculated by the method in Bott et al. (1998). The authors concluded that the differences were always less than 10% and that the parameterization therefore has promising potential for future implementation in weather forecasting models. The overall idea of utilizing the machine-learning method is innovative and aligns with what the cloud-modeling community has started working on in recent years. The results of the study are interesting and provide promising suggestions for future model improvements. At the same time, the paper seems to require some improvements in its structure and in providing sufficient information. Most importantly, the conclusions would become much more solid and significant if (i) more than one test simulation is done and/or (ii) a comparison to an existing parameterization is shown.

Regarding (i): although a large number of samples were used for training the machine, the overall evaluation of the new parameterization seems to rely on only one simulation (Table 4), particularly its comparison with the explicit calculation by Bott et al. (1998) under the same condition. The prediction accuracy must be somewhat dependent on each case, and it is not known whether this one test case falls into the "well-" or "badly-" predicted group. Regarding (ii): the prediction will always have some errors, but the magnitude of the errors is important, particularly in comparison to the errors made by other existing parameterizations. Therefore, I think (i) more test simulations to compare the predictions with Bott's calculations and/or (ii) a comparison with an existing two-moment parameterization is necessary to draw a solid conclusion. I would highly suggest (ii). Detailed suggestions are listed below.
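The 80/20 training/evaluation split described above can be sketched as follows. This is a minimal illustration only: the array names, the placeholder random data, and the use of NumPy are our assumptions, not the authors' actual pipeline; in the paper the inputs are the 6 parameters of each double-lognormal distribution and the targets are the precomputed moment tendencies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 100,000 samples, each described by the 6 parameters
# of a double-lognormal drop size distribution, with the corresponding
# precomputed tendencies of the 0th-5th moments as targets (placeholders here).
n_samples = 100_000
X = rng.uniform(size=(n_samples, 6))  # placeholder distribution parameters
y = rng.uniform(size=(n_samples, 6))  # placeholder moment tendencies

# Random 80/20 split: 80% for training, 20% for evaluating the predictions.
idx = rng.permutation(n_samples)
n_train = int(0.8 * n_samples)
train_idx, test_idx = idx[:n_train], idx[n_train:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

assert X_train.shape == (80_000, 6) and X_test.shape == (20_000, 6)
```

Shuffling before splitting keeps the evaluation subset statistically representative of the training data, which is the property the referee's point (i) probes.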
Answer: Regarding (i), the authors agree with the referee on performing more test simulations. However, the objective of the paper is not to show the behavior of the parameterization under several initial conditions, or even under extreme study cases, but to introduce the Machine Learning methodology applied to the series-of-basis-functions modelling philosophy, and to eliminate the need to solve complex integrals as part of the formulation of the parameterization. Further testing will be done to address those and other concerns, including the addition of a condensation module.
Regarding (ii), an additional comparison has been included in the revised version of the manuscript, taking into account an extra parameterization, as suggested by the referee. The popular WDM6 (WRF Double Moment 6-class) parameterization was used in the simulation, with the same initial conditions and simulation parameters. The results and discussion of the comparison have been included in the updated version of the manuscript. It was the intention of the authors to include a second extra parameterization in the paper (Seifert & Beheng, 2001), but because of deadline issues and the extensive work needed, it was not included.

Specific comments
Lines 12-13: It seems very important to clarify what was calculated and what was predicted/estimated. Since it's supervised learning, the machine did not calculate the moments based on equations, but they must have been calculated in advance elsewhere and the results (inputs & output) were fed into the machine to train it. Afterwards, during the testing/validation phase, the total moments were predicted, not calculated by physical equations, by the trained machine. I understand the overall meaning but the readers may be misled that the machine can analytically solve the SCE and calculate the tendencies of the moments. But in reality, the machine simply gives the prediction based on what it learned before. Therefore, the word "predict/estimate" sounds more appropriate than "calculate".
Answer: The authors agree with the referee, and the wording of the abstract has been changed to reflect the fact that the Machine Learning model only predicts the tendencies of the total moments, and does not solve the SCE itself.
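The distinction the referee draws can be made concrete with a toy supervised-regression sketch. Here an exactly linear rule stands in for the explicit SCE solution (the neural network and the real physics are far more complex); the point is only that the targets are computed in advance and the fitted model merely predicts them:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tendencies are computed in advance (here by a known linear rule standing in
# for the explicit SCE solution); the fitted model then only *predicts* them.
X = rng.normal(size=(500, 6))           # distribution parameters
W = rng.normal(size=(6, 6))
y = X @ W                               # precomputed "true" tendencies

coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # supervised fit (stand-in for the NN)
y_pred = X @ coef                       # a prediction, not a physical calculation

assert np.allclose(y_pred, y, atol=1e-6)
```

The model never evaluates the collision integrals at prediction time; it reproduces the mapping it was trained on, which is why "predict/estimate" is the appropriate verb.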
Line 27: Adding a short explanation on a self-preserving form would be helpful (e.g., what it is, why it gets formed, etc.), especially if this is relevant to collision-coalescence.
Answer: Self-preserving size distributions are analyzed in detail in (Swift & Friedlander, 1964), and the term refers to the preservation of the type of distribution function with time. Self-preserving distributions are relevant to collision-coalescence mainly because the evolution of the distribution functions due to this process can be expressed in this mathematical form. However, to avoid further complicating the interpretation of that paragraph, the corresponding sentences have been removed from the manuscript.

6 is not mentioned later in the paper, although I understood/knew them individually. Therefore, I suggest that the authors add a few sentences at the end of Section 2 to summarize the entire section.
Answer: The structure of the section has been modified to better organize the contents, rearranging the subsections as 2.1 and 2.2. The system of equations expressed in Equation 6 is transformed into its matrix form in Eq. 7. Equation 13 represents the way in which the total moment tendencies are calculated in the original parameterization (Clark, 1976), and is the definition of the components of the vector F (the right-hand side of the system of equations).
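For the moments discussed above, the lognormal closure admits the standard identity M_k = N r0^k exp(k^2 ln^2(sigma) / 2), and the moments of a two-mode composite distribution add linearly. The sketch below (our notation and parameter values, purely illustrative, not taken from the manuscript) cross-checks the closed form against direct numerical integration:

```python
import numpy as np

def lognormal_moment(k, N, r0, sigma):
    """k-th radius moment of a lognormal number distribution
    n(r) = N / (sqrt(2*pi) * r * ln(sigma)) * exp(-ln^2(r/r0) / (2 ln^2 sigma));
    analytically M_k = N * r0**k * exp(k**2 * ln(sigma)**2 / 2)."""
    return N * r0**k * np.exp(0.5 * k**2 * np.log(sigma)**2)

def composite_moment(k, params1, params2):
    """Moments of a sum of two lognormal modes add linearly."""
    return lognormal_moment(k, *params1) + lognormal_moment(k, *params2)

# Cross-check the analytic formula against trapezoidal integration for one mode.
N, r0, sigma = 100.0, 10.0, 1.5
r = np.logspace(-2, 4, 200_000)
n_r = N / (np.sqrt(2 * np.pi) * r * np.log(sigma)) * np.exp(
    -np.log(r / r0) ** 2 / (2 * np.log(sigma) ** 2))
for k in range(6):  # 0th-5th moments, as used in the paper
    f = r**k * n_r
    numeric = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r))
    assert np.isclose(numeric, lognormal_moment(k, N, r0, sigma), rtol=1e-3)
```

Note that the 0th moment reduces to the total number concentration N, which is a quick sanity check on any implementation of the closure.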
Lines 211-212: Although mentioned later, it would be better to mention here why the third moment tendency is not calculated.
Answer: An explanation has been added about why the third-order moment is not included, as suggested by the referee.

(e.g., minimum, maximum, mean, median, etc.) can be provided separately for the two lognormal distributions in Table 1, this figure can be omitted, as the information overlaps.
Answer: The authors agree about the redundancy of information between Figure 4 and Table 1. Thus, Figure 4 has been deleted from the article, and the remaining figures have been renumbered.

Answer: Since the values of the total moment tendencies are normalized (scale of 10⁰), MSE values of 10⁻⁴ are considered a good performance. This explanation has been included in the manuscript, for more clarity in the text and in the interpretation of the results. A column has also been added to Table 3, detailing the correlation indexes calculated between the output of the trained neural networks and the solution of the KCE.
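The two goodness-of-fit measures just mentioned can be sketched as follows. The synthetic targets and the ~1% noise level are illustrative assumptions; the sketch only shows why, for targets normalized to order one, an MSE near 10⁻⁴ and a correlation index near 1 indicate good skill:

```python
import numpy as np

def mse(pred, ref):
    """Mean squared error of the predictions against the reference."""
    return np.mean((pred - ref) ** 2)

def correlation_index(pred, ref):
    """Pearson correlation between predictions and the reference solution."""
    return np.corrcoef(pred, ref)[0, 1]

rng = np.random.default_rng(2)
ref = rng.uniform(-1.0, 1.0, size=10_000)           # tendencies normalized to O(1)
pred = ref + rng.normal(scale=0.01, size=ref.size)  # ~1% prediction noise

# With O(1) targets, ~1% errors give an MSE of about 1e-4.
assert mse(pred, ref) < 5e-4
assert correlation_index(pred, ref) > 0.99
```

The MSE is scale-dependent, which is why the normalization matters, while the correlation index is scale-free and complements it.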
Section 4: I think this section can be included as subsection 5.1 of the following Section 5, or even as 2.3 in Section 2.
Answer: The authors agree with the suggestion of the referee, and Section 4 has been relocated as subsection 2.3. All subsequent equations and sections have been renumbered accordingly.

Table 4: I understand that these conditions were chosen based on Clark (1976), but I think it would strengthen the argument that this case (or f1) is a good representation of the training data on which the machine was trained, if the authors mention the mean values in Table 1.
Answer: An explanation was included in the manuscript to reflect the fact that the initial conditions in Table 4 are indeed a good representation of the data used to train the neural networks.

Lines 357-358 and Figures 9 and 10: It is difficult to conclude whether the differences between what is predicted by the new parameterization and what is calculated by Bott's code are small enough or not, only from the figures. However, if you can add predicted values from other existing two-moment parameterizations (one frequently used in weather forecasting models), that would give the readers some insight; in Figure 10, for example, if another parameterization predicts 100 cm⁻³ at t=900 s, then the new machine-learning-based parameterization would be the better predictor. Furthermore, if such a comparison can be done for more than one case, the results would become much more solid and substantial.
Answer: In order to better demonstrate the accuracy of the developed parameterization, a comparison with the results from the collision-coalescence section of the WRF Double Moment 6-class (WDM6) parameterization has been established (Cohard & Pinty, 2000). However, a comparison methodology had to be developed, since the two parameterizations are of different kinds and their formulations follow different modelling philosophies. Despite that, the comparison showed promising results for the Machine Learning parameterization, particularly in the calculation of the individual moments of the drop spectrum. The corresponding figures and comments have been added to the manuscript to incorporate the new findings from the comparison. It was the intention of the authors to compare the results with at least one other parameterization (that of Seifert & Beheng, 2001), but the amount of work needed to establish that comparison exceeded the available time offered by GMD, due to the extensive differences between the formulations of the parameterizations. Such work will be done in future research regarding the series-of-basis-functions parameterization philosophy presented here.

Table 5 and Figure 12: While the authors clearly state the percentage differences between the predictions and the explicit calculations, their physical meaning also needs clarification. For example, what does the -8% error of the M2 tendency prediction physically mean, and why could it be underestimated by the machine? Even more, for instance, how does this magnitude of errors compare to the errors made by other existing parameterizations?
Answer: The percent errors are calculated taking the bin model results as the reference. For example, a -8% error of the M2 tendency means that the predicted value of that specific moment is 8% lower than the reference solution, relative to the reference solution itself. The causes of those differences are still under investigation. However, the comparison with one commonly used parameterization (explained in the previous answer) shows that the Machine Learning model has better skill at predicting the statistical moments of the drop spectra than the added parameterization (WDM6). To reflect this, Table 5 has been modified to include the results of the extra parameterization considered.
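The error convention used above can be written down explicitly; the function name is ours, chosen for illustration, and the sign convention is the one stated in the answer (negative means the prediction falls below the bin-model reference):

```python
import numpy as np

def percent_error(predicted, reference):
    """Signed percent error relative to the bin-model reference value:
    negative values mean the prediction is below the reference."""
    return 100.0 * (predicted - reference) / reference

# Example: a predicted M2 tendency 8% below the reference gives -8%.
assert np.isclose(percent_error(0.92, 1.00), -8.0)
```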
Section 7: The authors conclude that the overall prediction accuracy was high, but additional analyses and/or a comparison with existing parameterizations seems to be necessary to draw the conclusion. Although the errors in Figure 12 remained less than 10%, how about other existing parameterizations? Would they be within 5%, or more than 50%? I think such a comparison would provide the readers more in-depth understanding and better assessments of the presented ML-based parameterization.
Answer: Same as in the two previous answers: the authors understood that a comparison with at least one extra parameterization was needed in order to provide a better assessment of the accuracy of the Machine Learning model.

Technical corrections
The authors thank the referee for the detailed technical review of the manuscript. All technical recommendations have been addressed; below we answer only the ones that required specific comments.
Lines 70-72: As it approximates the droplet size distributions by two lognormal distributions, rather than using bins, I am not sure that "This approach simulates the explicit approach" is an accurate description. The strength of the authors' approach seems to be the time-varying parameters for the two lognormal distributions, in contrast to the conventional bulk schemes, which can be emphasized here.
Answer: As noted by the referee, the strength of the presented parameterization resides in the time-varying parameters of the distributions. However, it is the authors' opinion that this approach can be considered a middle point between bin and bulk models, as it covers the entire size spectrum with continuous, non-truncated distribution functions. Nevertheless, we have followed the recommendation of the referee to emphasize this main characteristic of the parameterization.

Since the values from the explicit calculations are the "goal/right" values, I think they should be plotted on the y axis rather than on x (i.e., I suggest swapping the x and y axes). Also, the plots would look better if the x and y ranges were identical within each plot (e.g., the plots for M1 and M4 seem to have different ranges for the x and y axes).
Answer: The values from the Neural Network model are plotted on the y axis to achieve consistency across all figures in the manuscript. Since all results from the parameterization are plotted on the y axis, the authors consider that Figure 7 (renumbered as Figure 6 in the revised manuscript) should not be the exception. Regarding the ranges of the axes, while it is true that the plots would look better if the axes were identical, it is necessary to reflect that each moment has a different range according to its characteristics. Since the values of the moment tendencies are not normalized, the axes cannot be identical for all plots in Figure 7.

Answer: Following the same logic as the referee, the authors' first intention was to place the figure in the indicated way, prior to submission to the journal. However, after reviewing the manuscript, we noted that that configuration caused the plots to be deformed and the results could not be easily interpreted, so we opted for a left-and-right configuration of the panels.