Reply on RC1


1a. A lot of the "snowmelt days" marked with purple circles in Figure 1 look like rain storms to me. The South Fork of the Tolt mostly gets rain, but also rain on snow. How do diurnal cycles that are identified but aren't really snowmelt impact your results?
We agree that the auto-correlation metric can be confusing and can be better explained. We will add clarifications about its meaning in the context of Figure 1. Regarding the effect of rainstorms, we have set up several checks in our method to limit false-positive snowmelt days. First, we apply a more restricted monthly and site-specific window of lagged correlations based on clear-sky, snowmelt-driven diel cycles only (see lines 140-145). This limits the chance that rainfall arriving at a time different from typical snowmelt (or ET) produces a false-positive melt day. Second, the rainstorm needs to produce a specific diel cycle that correlates highly with solar radiation. On a completely cloudy day, solar radiation will still have a diurnal cycle shaped like that of a clear-sky day, so a rainstorm that produces a snowmelt-like response (depending on the watershed's surface and subsurface connectivity and the rainfall hyetograph) may potentially produce a false positive. On a partly cloudy day, where rainfall occurred but clear-sky conditions prevailed before or after the event, the chances that a rainfall-induced diel cycle correlates highly with solar radiation are likely minimal, as the shape of the solar radiation diel cycle can have several discrete changes.
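To make the classification step more concrete, it can be sketched as follows. This is a simplified illustration with hypothetical lag windows and thresholds; the actual monthly, site-specific lag windows derived from clear-sky days are described at lines 140-145 of the manuscript.

```python
import numpy as np

def is_snowmelt_day(q_hourly, sw_hourly, lags=range(2, 9), r_min=0.8):
    """Classify a day as snowmelt-driven if hourly streamflow is highly
    correlated with solar radiation at a plausible lag.

    q_hourly, sw_hourly : 24-element arrays (one day of hourly data)
    lags  : allowed lags in hours (hypothetical window; the manuscript
            uses monthly, site-specific windows from clear-sky days)
    r_min : correlation threshold (hypothetical value)
    """
    best = max(
        np.corrcoef(np.roll(sw_hourly, lag), q_hourly)[0, 1] for lag in lags
    )
    return best >= r_min

# Toy day: streamflow lags a half-sine radiation curve by 4 hours
hours = np.arange(24)
sw = np.clip(np.sin(2 * np.pi * (hours - 6) / 24.0), 0, None)
q = np.roll(sw, 4) + 0.01 * np.random.default_rng(2).standard_normal(24)
print(is_snowmelt_day(q, sw))  # -> True
```

The lag window is what rejects diel signals that peak at the wrong time of day relative to solar radiation.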
However, our method cannot guarantee that a rainfall-driven diel cycle will never be picked up (though we argue that the chances are small). To address the reviewer's concern, we propose screening the days that our method classifies as snowmelt-driven by whether precipitation occurred that day, using daily NLDAS precipitation. This will allow us to flag snowmelt days that may have been wrongly classified, and we expect this screen to give an unbiased picture of how rainfall affects the streamflow diel signal detected by our method. The Tolt River example is an important one because it has a low number of snowmelt days. We will better highlight in Figure 1E the two examples of diel patterns that are not recognized as snowmelt and are screened out by our method. Figure 1F may be misleading because the diel cycles are not observable in the line graph. We will better discuss this figure and highlight the strengths and weaknesses of the method.
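A minimal sketch of the screening step we propose (function name and wet-day threshold are hypothetical; in practice we would use daily NLDAS precipitation):

```python
import numpy as np

def screen_snowmelt_days(snowmelt_days, daily_precip, wet_threshold_mm=1.0):
    """Flag detected snowmelt days that coincide with rainfall.

    snowmelt_days : day-of-year indices classified as snowmelt-driven
        by the diel-correlation method.
    daily_precip  : daily precipitation totals (mm), indexed by day of
        year (e.g., from NLDAS).
    wet_threshold_mm : precipitation above which a day is considered
        potentially rain-influenced (hypothetical value).
    Returns (clean_days, flagged_days).
    """
    snowmelt_days = np.asarray(snowmelt_days)
    wet = daily_precip[snowmelt_days] > wet_threshold_mm
    return snowmelt_days[~wet], snowmelt_days[wet]

# Toy example: precipitation falls only on day 3
precip = np.zeros(10)
precip[3] = 12.0
clean, flagged = screen_snowmelt_days([1, 3, 5], precip)
# clean -> [1, 5]; flagged -> [3]
```

Flagged days can then be inspected (or excluded) to bound the effect of rainfall on the detected diel signal.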
1b. As an alternate approach to deciding when snowmelt is significant, you could look at the power spectra of your time series. See Figure 6 in Lundquist and Cayan 2002. The days with a sharp increase in power at the once-per-day cycle indicate snowmelt, whereas rain exhibits a much redder spectrum. I know that power spectra are commonly used by oceanographers rather than hydrologists, so your method is likely easier to understand, but it would be nice to have an independent method as a check.
We appreciate the reviewer's recommendation of the power spectra but do not feel that it would be an improvement over our custom method, which adjusts for seasonal and basin-specific lags. We think that a power-spectrum method would also struggle to distinguish between rainfall- and snowmelt-driven diel cycles. Please see the previous comment about Figure 1E and how we propose to check our method for rainy days.
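For reference, the spectral check the reviewer suggests could be sketched as below, using synthetic data for illustration only: a diel snowmelt-like signal shows a sharp periodogram peak at one cycle per day, whereas rain-driven runoff tends toward a red spectrum with no such peak.

```python
import numpy as np

# Synthetic hourly "streamflow": a once-per-day cycle plus weak noise
rng = np.random.default_rng(0)
hours = np.arange(16 * 24)                 # 16 days of hourly data
diel = np.sin(2 * np.pi * hours / 24.0)    # snowmelt-like diel signal
q = 1.0 + 0.5 * diel + 0.05 * rng.standard_normal(hours.size)

# Simple periodogram via FFT; d = 1/24 day -> frequencies in cycles/day
freq = np.fft.rfftfreq(q.size, d=1.0 / 24.0)
power = np.abs(np.fft.rfft(q - q.mean())) ** 2
peak = freq[np.argmax(power)]
print(peak)  # -> 1.0 (cycles per day)
```

A rain-dominated day would instead spread power across low frequencies, so the location of the spectral peak provides the independent check the reviewer describes.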
1c. In particular, I recommend a clearer discussion of the strengths and weaknesses of this approach. It will miss rain-on-snow (signal dominated by rain), as well as early melt into dry soil (no streamflow response). It may also misclassify rain with a diurnal structure as snowmelt. Therefore (and you allude to this multiple times in the manuscript but should make it clearer), the method is best at detecting melt in non-rainy locations with fairly saturated soils. With that in mind, in which of your basins do you trust the signal the most?

The reviewer makes good points about what the method can and cannot do. As detailed in previous answers, we will add an independent method to check for rainstorms. One way that we will strengthen our argument is to subset the basins where we are most confident that rain is not occurring and streamflow is tightly coupled to snowmelt. This additional analysis will be discussed in the text and shown in a supplemental Figure or Table.

1d. Section 3.1 explains how well the DOS_20 is related to simpler magnitude metrics (DOQ_25 and DOQ_50) but doesn't really justify why the DOS_20 is helpful beyond those metrics; can you better explain what we gain by doing this extra analysis? This section also identifies some rain-dominated rivers wherein these metrics appear less correlated. Is this because the method breaks down? Or can we learn important information from this change in relationship?
DOS20 aims to capture snowmelt-streamflow connectivity; however, it does not say anything about the contribution (volume) of snowmelt to streamflow. As such, this metric can potentially serve as a relatively easy way to benchmark hourly hydrological and land-surface models, beyond typical daily streamflow metrics or point-scale continuous SWE measurements. Specifically, we see potential to use this information to validate the snowmelt dynamics of a model. We will be more specific about the value of DOS20 in the discussion.
About the value of Section 3.1, we believe there are two key points. First, the diel method is more uncertain under rainier conditions, as it may misclassify rainfall events as snowmelt (we now propose checking those). Second, under rainier conditions the timing of streamflow volume is likely to be more strongly controlled by the timing of rainfall than by the timing of snowmelt, and thus those sites deviate from the 1:1 line in the DOS20 vs. DOQ25 and DOQ50 comparisons.
2) You need to more explicitly discuss the difference between a stream's climate sensitivity due to snowfall shifting to rainfall and its sensitivity due to earlier snowmelt.
2a. Many of the earlier papers on streamflow sensitivity to climate change highlighted basins in the transitional rain-snow zone as being most sensitive because snowfall shifts to rainfall. From my own experience, the diurnal cycle in streamflow is particularly hard to detect in these basins because rain-induced runoff is a much larger signal than snow-induced runoff, especially when both happen more or less at the same time. Therefore, I imagine that your snowmelt index uniquely does not work well in these basins (e.g., the Tolt example in your paper, or the NF American River example in Lundquist and Cayan 2002 Fig. 6). I could imagine that for these basins you could even get DOS_20 moving later in the season with warming, if early-season events are all rain and only a later, non-rainy period exhibits snowmelt.
We agree that a better discussion of the effects of changes from snow to rain on our results is merited. We trained a simple model to predict the date of DOS20 based only on basic climatological information. This model shows, as the reviewer suggests, a smaller sensitivity of DOS20 to climate variation in warmer and cloudier locations (Figure 5). However, our simple inter-annual regression-based metric shows consistent trends toward an earlier DOS20, even in the warmest and rainiest basins.
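The regression idea can be illustrated schematically. The predictors, coefficients, and data below are synthetic and chosen only for illustration; the actual model and predictor set are described in the manuscript.

```python
import numpy as np

# Synthetic site-year table: mean temperature (degC), humidity (-),
# and DOS20 (day of year). Coefficients are made up for illustration.
rng = np.random.default_rng(1)
n = 200
temp = rng.uniform(-10, 5, n)
humid = rng.uniform(0.3, 0.9, n)
dos20 = 150 - 4.0 * temp - 20.0 * humid + rng.normal(0, 3, n)

# Ordinary least squares: DOS20 ~ intercept + temperature + humidity
X = np.column_stack([np.ones(n), temp, humid])
beta, *_ = np.linalg.lstsq(X, dos20, rcond=None)

# beta[1] approximates the DOS20 sensitivity to warming (days/degC)
print(beta[1])  # close to -4.0 in this synthetic example
```

In a space-for-time framing, the fitted temperature coefficient across sites and years stands in for the expected shift in DOS20 under warming.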
We also agree that this effect could be better discussed with regard to Figure 6. The largest differences between NoahMP and the STS method were in the sunny, cold basins, which are the least likely to see a shift from snow to rain and to be biased by the rainier basins.
2b. I imagine that including rain-on-snow or rain-dominated basins would bias your correlations with humidity because these tend to be more humid basins but also may have spurious results.
We try to incorporate as much site and inter-annual variability in the dataset as possible to increase the predictive power of the space-for-time approach, as historically cold sites will transition into warmer and more humid sites like those with rainier conditions today. That being said, we recognize the challenges in reliably capturing snowmelt events where rainfall is important (as discussed in the previous major comment). It is relevant to highlight that those sites in the Pacific Northwest (#24, 25 and 31) that have low snowfall contributions (as highlighted in Figure 3) are ultimately not used for the sensitivity analysis and thus do not impact the conclusions drawn in the study. Nonetheless, this will be further clarified and discussed in the revised version of the manuscript, which will include an analysis of the importance of rainfall-caused diel fluctuations.

2c. I encourage the authors to think about rainfall vs snowfall and snowmelt sensitivities separately and to decide if they want to address both in this paper or only focus on the latter. Then, be very clear about this decision in the paper discussion.
It is not easy to disentangle the two, but we agree that our method is better suited to answer questions about snowmelt sensitivity and that should be the focus of the paper. However, we recognize our empirical analysis reflects both the effect of changing precipitation partitioning and snowmelt sensitivities.
3) You need to more clearly evaluate how well your NoahMP-WRF model setup simulates streamflow timing in the current climate before examining the results of its climate sensitivity.
3a. It appears that you have a biased simulation of NoahMP-WRF: if the historic runoff date is off by 50 days (see line 260), the model is either simulating too much rain and too little snow or melting snow way too early. It's hard to draw conclusions on sensitivity when using a biased model. Of course, if the model has less snow than the real world, it will be less sensitive to that snow disappearing. The paper would be much more meaningful if you included some evaluation of your NoahMP-WRF simulations: how do they compare to baseline observations and to other models run over the domain (similar western US climate-change papers)?

The reviewer makes a good point, and we will improve and better highlight the description of the model performance. Just to clarify, these simulations, made by the National Center for Atmospheric Research (NCAR) and presented by Liu et al. (2017), have previously been tested in terms of their meteorology and snow components (Liu et al., 2017; Scaff et al., 2020). We do agree with Dr. Lundquist that one should make sure the model reliably represents a particular system before looking at its sensitivity to climate change. Nonetheless, this type of simulation has been used for climate change analyses (Musselman et al., 2017, 2018), although its runoff component has not been tested to our knowledge. Furthermore, the NoahMP model underlies the US National Water Model (https://water.noaa.gov/about/nwm), so its relevance to policy and research is high.

Detailing the exact biases of past NoahMP simulations is beyond the scope of this study, but we will detail previous efforts in this arena. We will improve our discussion and analysis to demonstrate that NoahMP-WRF predicts an earlier historical DOQ25 compared to our STS method and historical observations (current Figure 6A), whereas predictions of DOQ50 are more similar between the methods and observations historically (Figure 6B). A key finding is that NoahMP DOQ50 is less sensitive to change than the STS method in the snowier basins, where the STS method should be more reliable.
3b. Also, if the NoahMP-WRF simulations perform better in certain regions (if I'm correct, these were only carefully vetted for Colorado), you may also want to focus your analysis on those regions separately. Do you get closer agreement in areas where the model represents snow processes more accurately? Might a check of space-for-time sensitivity against model sensitivity be a good check of model fidelity?
For the historical DOQ25, the NoahMP-WRF model actually performed best in the rainier sites (see circled blue symbols in Figure 6a) and a few other sites classified as 'cloudy' and 'partly cloudy', whereas the Rocky Mountain sites, characterized by 'sunny' snowmelt events, were the most biased (see circles in Figure 6a). This suggests that the timing of streamflow volume is better represented in areas where snowmelt processes are less important, though other variables like topographic (and thus climatic) gradients can also be important.
Discussion should be better streamlined and organized. This may be a good place to address major comments 1-3 above.
We will improve the discussion based on Dr. Lundquist's suggestions, which will hopefully address her main concerns.

Minor:
Abstract: 1st sentence, "may cause" - I think the literature is pretty conclusive that warming does cause snow to melt earlier. Abstract should define what you mean by the 20th percentile of snowmelt days - this is meaningless to someone only reading the abstract. What do you mean by colder places are more sensitive than warmer places? In what way? Earlier snowmelt? If there's no snow, of course it wouldn't be sensitive to that.
We will change the abstract to read "climate change will cause …", and provide a more meaningful introduction to DOS20. We will clarify what we mean by "cold sites are more sensitive", which refers to the fact that the timing of early streamflow volume changes the most at cold sites compared to warmer sites.
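For clarity, the DOS20 definition can be illustrated with a toy computation (hypothetical dates, not data from the study): DOS20 is the day of year corresponding to the 20th percentile of the detected snowmelt-day dates.

```python
import numpy as np

# Day-of-year of days classified as snowmelt-driven (hypothetical)
snowmelt_days = np.array([95, 100, 102, 110, 115, 120, 125, 130, 140, 150])

# DOS20: the 20th percentile of the snowmelt-day dates
dos20 = np.percentile(snowmelt_days, 20)
print(dos20)  # -> 101.6
```

Note that, unlike DOQ25 or DOQ50, this date depends only on when snowmelt-driven diel cycles occur, not on streamflow volume.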
Line 120: "DAYMET dataset (daymet.ornl.gov), which in turn is based on ground observations" - it's interpolated from existing ground observations; worth specifying, as sometimes this is far from the truth.
We will change it to read as suggested by the reviewer. We appreciate the references. Stewart is already mentioned in the discussion, and we will add Lundquist et al (2004).