Quantifying the nonlinear dependence of energetic electron fluxes in the Earth's radiation belts with radial diffusion drivers

In this study, we use mutual information to characterise statistical dependencies of seed and relativistic electron fluxes in the Earth's radiation belts on ultra low frequency (ULF) wave power measured on the ground and at geostationary orbit. The benefit of mutual information, in comparison to measures such as the Pearson correlation, lies in its capacity to distinguish nonlinear dependencies from linear ones. After reviewing the properties of mutual information and its relationship with the Pearson correlation for Gaussian bivariates, we present a methodology to quantify and distinguish linear and nonlinear statistical dependencies that can be generalised to a wide range of solar wind drivers and magnetospheric responses. We present an application of the methodology by revisiting the case events studied by Rostoker et al. (1998). Our results corroborate the conclusions of Rostoker et al. (1998) that ULF wave power and relativistic electron fluxes are statistically dependent upon one another. We also estimate that the Pearson correlation is missing between 20% and 30% of the statistical dependency between ULF wave power and relativistic electron fluxes. Thus, the Pearson correlation underestimates the impact of ULF waves on energetic electron fluxes. However, we find that observed enhancements in relativistic electron fluxes correlate modestly, both linearly and nonlinearly, with the ULF power spectrum when compared with values found in previous studies (Simms et al., 2014), and with correlational values found between seed electrons and ULF wave power for the same case events. Our results are indicative of the importance of incorporating data analysis tools that can quantify linear and nonlinear interdependencies of various solar wind drivers.


Introduction
The Earth's radiation belts are nonlinearly driven and weakly collisional plasma environments in which deposited energy and momentum lead to the energisation of electrons to relativistic energies (Van Allen et al., 1958; Walt, 2005). From a fundamental physics perspective, the acceleration of charged particles to supra-thermal energies is ubiquitous in astrophysical plasma environments. As the closest astrophysical accelerator of particles to the Earth, the radiation belts are amenable to detailed in situ measurements of electromagnetic fields and distribution functions. Their study is therefore relevant to other astrophysical environments with comparable thermodynamical properties in which particles are confined by large-scale inhomogeneous magnetic fields (Kulsrud, 2005). From an applied perspective, a wide range of satellites' orbits overlap with the Earth's radiation belts, with the undesirable consequence that the energetic particles can damage the onboard electronics and shorten the lifespan of communication systems (Baker et al., 2018). Thus, the main focus of Earth's radiation belts' studies is to quantify the processes, ranging from electron kinetic scales to planetary scales, that enhance and deplete the plasma (Ukhorskiy and Sitnov, 2012; Thorne et al., 2013; Lejosne and Kollmann, 2020).
It has been known for several decades that the Earth's radiation belts are driven far from thermodynamical equilibrium as a result of variable solar wind conditions (McCormac, 1965). This departure from thermodynamical equilibrium results in kinetic distribution functions that are unstable and in the production of fluctuations that can thermalise the plasma and accelerate particles. A growing number of in situ measurements and observational studies in the last two decades have demonstrated that the Earth's radiation belts' response to solar wind driving and fluctuations can also be nonlinear, and that nonlinearity ought to be accounted for in order to improve prediction capabilities (Wing et al., 2016; Simms et al., 2018). From a theoretical perspective, every self-consistent set of equations describing fluid and kinetic scale plasma physics is inherently nonlinear. The departure from linearity in a dynamically evolving plasma translates into the appearance, and therefore measurement, of non-Gaussian fluctuations (Papoulis and Pillai, 2002). Even if a nonlinear system is initialised with Gaussian fluctuations, non-Gaussian fluctuations would eventually emerge. It is therefore not surprising that non-Gaussian fluctuations are commonly found across a wide range of astrophysical plasma environments (Dudok de Wit and Krasnosel'skikh, 1996; Marsch and Tu, 1997; Stepanova et al., 2003; Osman et al., 2014; Osmane et al., 2015b). Taking into account the above theoretical constraints and observational results, one quickly recognises that in order to quantify nonlinearity in the Earth's radiation belts, one has to use measures that are sensitive to nonlinear dependencies and are capable of distinguishing them from linear ones.
In this study, we present an application of information theory to the search for dependencies between energetic electron fluxes measured in the Earth's radiation belts and ULF wave power measured both at geostationary orbit and on the ground. Unlike more commonly used measures like the Pearson correlation, information-theoretic tools, such as mutual information, have the benefit of distinguishing nonlinear dependencies from linear ones. In order to demonstrate the value of information-theoretic methods, we revisit the highly cited case studies of Rostoker et al. (1998). In their study, it was suggested that ULF pulsations can provide energy for the acceleration of electrons to relativistic energies, based on visual inspection of relativistic electron fluxes at geostationary orbit and ground ULF wave power. It should be stressed that the conclusions of Rostoker et al. (1998) are cautiously stated and that a value for a correlation or any other measure is not provided. Nonetheless, it is not uncommon to find citing authors describing their results as compelling evidence of strong correlation between ULF wave power and relativistic electron fluxes. The impact of ULF fluctuations on the enhancement and loss of energetic electron fluxes also forms the basis of radial diffusion formalisms and is, as of today, understood as one of the two dominant transport mechanisms in planetary radiation belts (see Lejosne and Kollmann (2020) and references therein).

The application of information-theoretic measures to space plasma problems is not new, but it has recently shown its utility for a wide range of methodologies and problems (see De Michelis et al. (2011); Wing et al. (2016); Runge et al. (2018)). Of particular relevance here is the study by Wing et al. (2016) of the dependence of relativistic electron fluxes measured on geostationary orbits on a wide range of solar wind drivers. In their study, Wing et al. (2016) demonstrate that the solar wind speed is the main driver and that the effect of the solar wind density, sometimes suggested as a dominant driver for relativistic electron fluxes (Balikhin et al., 2011), holds 30% less information content and operates on a different timescale. The main departure between the work presented hereafter and the study of Wing et al. (2016) lies in our introduction of a quantity called the information-adjusted correlation and the use of a dataset that has a 1 hour resolution of geostationary-measured seed and relativistic electron fluxes. The information-adjusted correlation is defined as the correlation value that would be obtained from the mutual information under the assumption that the dependence between the two variables can be represented as a Gaussian bivariate. The choice of a Gaussian bivariate to distinguish linear and nonlinear dependences, as hinted above, stems from the fact that nonlinear equations produce non-Gaussian statistics even in the instance where a system is initialised with Gaussian distributed random variables (Papoulis and Pillai, 2002). We therefore present a methodology that allows us to provide clear answers to the following two questions:
- (1) Are the events studied by Rostoker et al. (1998) evidence of statistical dependence between ULF wave power and electron fluxes?
- (2) Are nonlinearities present in the instance where the dependence between ULF wave power and electron fluxes is statistically significant?

Our report is presented as follows. Section 2 presents a brief summary of the tools of information theory used for the analysis. We put particular emphasis on the application of mutual information to the case of Gaussian random variables of arbitrary correlation, which serves as a benchmark for linear dependencies. In Section 3 we describe the dataset and the instrument specificities relevant to our study. In Section 4, we present our results for geostationary-measured seed and relativistic electron fluxes measured during the events presented by Rostoker et al. (1998). In Section 5, we interpret and compare our results in light of previous studies, and then conclude with suggestions for future studies and improvements of our methodologies for instances where statistical dependencies are difficult to extract.

Methodology
In this section we present a definition of mutual information in terms of the Shannon entropy and the specific mutual information of Gaussian bivariate random variables. The Gaussian bivariate case with arbitrary Pearson correlation ρ is used as a toy model to benchmark the numerical estimate of mutual information and to distinguish linear from nonlinear statistical dependencies. A detailed description and derivation of mutual information for Gaussian bivariates is provided in the Appendix for the interested reader.

Mutual Information
It is preferable to introduce mutual information by first defining the Shannon entropy H(X) for a discrete random variable X (Cover, 1999). The Shannon entropy is a measure of the uncertainty contained in a random variable. In communication theory it is the number of bits on average required to describe a message $X \in \mathcal{X}$, in which $\mathcal{X}$ denotes the alphabet, or equivalently the discrete states that can be assigned to the random variable X. Practically speaking, if Nadia wants to send a message to Jorge, the Shannon entropy is the average number of binary questions (e.g., yes or no) one ought to ask in order to accurately decode a message X written in terms of a given alphabet $\mathcal{X}$. Mathematically, it is written in terms of the probability mass function p(x) as:
$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x). \tag{1}$$
The Shannon entropy is a non-negative quantity, H(X) ≥ 0, and is bounded by $H(X) \leq \log(|\mathcal{X}|)$, with equality if and only if the random variable X is distributed uniformly over $\mathcal{X}$. Since the entropy is a measure of uncertainty (or equivalently knowledge), it is convenient to ask what happens to the amount of uncertainty if we are given additional information encoded in terms of $Y \in \mathcal{Y}$. In other words, do we gain or lose information about the likelihood of event X given Y? Intuitively, one can assume that if X and Y are entirely independent, knowing one says nothing about the other. On the other hand, if X and Y are contingent on one another, or share a causal relationship, it can then be shown that conditioning effectively reduces entropy, and therefore uncertainty. In the instance where X and Y are independent, the conditional entropy H(X|Y), which should be read as the entropy of X given Y, reduces to H(X). On the other hand, if X and Y are statistically dependent, the entropy will be reduced, with H(X|Y) < H(X).
For two random variables X and Y, this reduction in uncertainty is quantified by the mutual information:
$$I(X;Y) = H(X) - H(X|Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}. \tag{2}$$
The mutual information is symmetric in X and Y and is a measure of the dependence between two random variables. It is always non-negative and only equal to zero if X and Y are independent, or equivalently if the joint distribution is the product of the marginals, i.e., p(x, y) = p(x)p(y). In our analysis the variables we use (i.e., electron fluxes and ULF wave power) are continuous; however, the use of Equation (2) requires binning and therefore discretisation. Thus, Equation (2) has been used to compute an estimator of mutual information for the dataset described in Section 3.
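As a minimal illustration of the two definitions above, the following Python sketch computes the Shannon entropy and the mutual information directly from a discrete joint probability table. The function names and the two toy tables are illustrative assumptions, not part of the analysis code used in this study, and the natural logarithm is used, so values are in nats.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (in nats) of a probability mass function p."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]  # the term 0 * log(0) is taken to be 0
    return float(-np.sum(p * np.log(p)))

def mutual_information(p_xy):
    """Mutual information (in nats) from a joint probability table p_xy[i, j]."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1)  # marginal distribution of X
    p_y = p_xy.sum(axis=0)  # marginal distribution of Y
    # I(X;Y) = H(X) + H(Y) - H(X,Y), which is equivalent to H(X) - H(X|Y)
    return shannon_entropy(p_x) + shannon_entropy(p_y) - shannon_entropy(p_xy)

# Hypothetical toy tables: an independent pair and a perfectly dependent pair
independent = np.outer([0.5, 0.5], [0.25, 0.75])  # p(x, y) = p(x) p(y)
identical = np.array([[0.5, 0.0], [0.0, 0.5]])    # Y is a copy of X

print(mutual_information(independent))  # zero up to rounding error
print(mutual_information(identical))    # equals H(X) = log(2)
```

The independent table gives vanishing mutual information, while the table in which Y copies X gives I(X;Y) = H(X), the largest value possible for that marginal.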

Even though probability distribution functions of electromagnetic fields and particle velocity distributions in space plasmas often depart from Gaussianity, it is useful to refer to the Gaussian bivariate case to develop an appreciation of mutual information for linear systems and as a benchmark to test numerical estimates. Conveniently, there is an exact analytical relationship between the Pearson correlation ρ and the mutual information of a Gaussian bivariate:
$$I(X;Y) = -\frac{1}{2} \log\left(1 - \rho^{2}\right). \tag{3}$$
The interested reader can find a definition of mutual information for continuous random variables and the derivation of Equation (3) for Gaussian bivariates in the Appendix. Since the mutual information is a measure of how much we know about X given Y, and vice versa, the nonlinear relationship between mutual information and the correlation is an indication that the Pearson correlation should not be interpreted linearly. Indeed, the difference in information between random variables of 0.75 and 0.5 correlation is not of order 50% (0.75/0.5 − 1 = 0.5) but rather 187%. Thus, two random variables with a Pearson correlation of 0.75 carry a much larger amount of information about one another than two with a correlation of 0.5. An additional constraint with the Pearson correlation resides with fat-tailed random variables. For Gaussian bivariates, independence is synonymous with being uncorrelated. However, for fat-tailed random variables, as commonly measured in space and astrophysical plasmas, strongly dependent random variables can have zero correlation (Taleb, 2020). Unlike the Pearson correlation, mutual information is able to quantify the dependence of random variables in the absence of correlation. As a simple example the reader can test for themselves, consider two random variables X and Z. X ∼ N(0, 1) is a Gaussian random variable with zero mean and a standard deviation of 1. Z is the square of X, i.e. Z = X^2.
Thus the relationship between Z and X is nonlinear and there is no doubt that Z and X are statistically dependent on one another. However, computing the Pearson correlation is inconclusive as it gives a value of zero, whereas mutual information computed with the code described below indicates a large statistical dependence with a value of 1.42.
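The Z = X^2 example can be reproduced qualitatively with a few lines of Python. The sketch below is only an illustration under stated assumptions: the helper `binned_mi`, the sample size and the random seed are hypothetical choices, and the histogram estimator is a generic one rather than the code used for this study, so the numerical value it returns depends on sample size and binning and need not match the value quoted above.

```python
import numpy as np

def binned_mi(x, y):
    """Histogram estimate of mutual information (nats), Freedman-Diaconis bins."""
    edges = [np.histogram_bin_edges(v, bins="fd") for v in (x, y)]
    counts, _, _ = np.histogram2d(x, y, bins=edges)
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    occupied = p_xy > 0  # skip empty cells, for which p log p = 0
    return float(np.sum(p_xy[occupied] * np.log(p_xy[occupied] / (p_x @ p_y)[occupied])))

rng = np.random.default_rng(0)       # fixed seed, illustrative only
x = rng.standard_normal(100_000)     # X ~ N(0, 1)
z = x**2                             # Z is a deterministic, nonlinear function of X

pearson = np.corrcoef(x, z)[0, 1]            # linear measure: close to zero
mi = binned_mi(x, z)                         # nonlinear measure: large
mi_shuffled = binned_mi(x, rng.permutation(z))  # dependence destroyed by shuffling

print(f"Pearson correlation: {pearson:.3f}")
print(f"Mutual information:  {mi:.3f} nats (shuffled: {mi_shuffled:.3f})")
```

The Pearson correlation is consistent with zero because the covariance of X and X^2 is the third moment of a symmetric distribution, while the mutual information is well above the value obtained after shuffling Z.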

Numerical computation of mutual information
The procedure we follow to compute the mutual information for two time series consists of binning the data according to the Freedman-Diaconis rule (Freedman and Diaconis, 1981). Thus, even though the electron fluxes and ULF wave power are continuous, our procedure discretises the variables. This discretisation leads to biases in the estimation of mutual information that depend on the number of measurement points N and on the statistical dependence of the two variables.
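A binned estimator of this kind can be sketched as follows. This is a minimal sketch, assuming that the Freedman-Diaconis rule is applied independently to each variable and that the natural logarithm is used; the function name `binned_mutual_information` is hypothetical, and the actual implementation used for this study may differ in its details. NumPy's `histogram_bin_edges` with `bins="fd"` provides the Freedman-Diaconis bin edges.

```python
import numpy as np

def binned_mutual_information(x, y):
    """Histogram estimate of I(X;Y) in nats, with Freedman-Diaconis bin widths."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Freedman-Diaconis rule: bin width = 2 * IQR / N^(1/3), per variable
    x_edges = np.histogram_bin_edges(x, bins="fd")
    y_edges = np.histogram_bin_edges(y, bins="fd")
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    p_xy = counts / counts.sum()            # joint probability mass function
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X, shape (nx, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, ny)
    occupied = p_xy > 0                     # empty cells contribute nothing
    ratio = p_xy[occupied] / (p_x @ p_y)[occupied]
    return float(np.sum(p_xy[occupied] * np.log(ratio)))

# Illustrative synthetic data: an independent pair and a variable paired with itself
rng = np.random.default_rng(0)
x = rng.standard_normal(20_000)
y = rng.standard_normal(20_000)
mi_independent = binned_mutual_information(x, y)
mi_self = binned_mutual_information(x, x)
print(mi_independent, mi_self)
```

For the independent pair the estimate is small but not exactly zero, which is the finite-sample binning bias mentioned above; for a variable paired with itself the estimate equals the entropy of the binned variable.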

For instance, two Gaussian random variables with a high correlation would require fewer measurement points to estimate the mutual information than two Gaussian random variables with a low correlation or two fat-tailed random variables with some arbitrary correlation. Using numerically produced Gaussian bivariates with N points and the analytical relationship between mutual information and correlation in Equation (3), one can therefore test mutual information estimators and quantify the error due to binning.
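The benchmark described above can be sketched in Python as follows. The sketch assumes the standard Gaussian bivariate relation I = -(1/2) log(1 - ρ^2) with the natural logarithm; the helper names, sample size, seed and the set of test correlations are illustrative assumptions rather than the settings used in this study.

```python
import numpy as np

def gaussian_mi(rho):
    """Analytical mutual information (nats) of a Gaussian bivariate with correlation rho."""
    return -0.5 * np.log(1.0 - rho**2)

def binned_mi(x, y):
    """Histogram estimate of mutual information (nats), Freedman-Diaconis bins."""
    edges = [np.histogram_bin_edges(v, bins="fd") for v in (x, y)]
    counts, _, _ = np.histogram2d(x, y, bins=edges)
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    occupied = p_xy > 0
    return float(np.sum(p_xy[occupied] * np.log(p_xy[occupied] / (p_x @ p_y)[occupied])))

def sample_gaussian_bivariate(rho, n, rng):
    """Draw n points from a zero-mean, unit-variance Gaussian bivariate with correlation rho."""
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    return x, y

rng = np.random.default_rng(1)  # fixed seed, illustrative only
estimates = {}
for rho in (0.0, 0.3, 0.6, 0.9):
    x, y = sample_gaussian_bivariate(rho, 100_000, rng)
    estimates[rho] = binned_mi(x, y)
    print(f"rho={rho:.1f}  analytic={gaussian_mi(rho):.3f}  binned={estimates[rho]:.3f}")

# Relative information gain between correlations of 0.75 and 0.5
relative_gain = gaussian_mi(0.75) / gaussian_mi(0.5) - 1.0
print(f"relative gain: {100 * relative_gain:.0f}%")
```

The difference between the analytic and binned columns gives a direct estimate of the error due to binning for a given N, and the last line reproduces the roughly 187% difference in information between correlations of 0.75 and 0.5.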

In Figure 1 we plotted the numerical estimate and analytical solution for 10^6 points extracted from Gaussian bivariates with correlations ranging between −1 and 1. Figure 1 is provided to show the correspondence between the Pearson correlation and mutual information and to give an estimate of what values of mutual information are considered large. Figure 1 shows that greater ... Comparing the theoretical and numerical values of mutual information in Figure 1, we note that our estimator does well for low correlation values, though the discrepancy grows to as much as 10% for absolute correlation values greater than 0.5. In order to estimate the error introduced by discretisation, we apply a shuffle test to the two time series and compute the average value of mutual information and its standard deviation for one hundred shuffles. We find that the error computed with the shuffling procedure is Gaussian distributed and we interpret the average mutual information obtained from shuffling as the zero baseline level. This baseline for each event is plotted as an orange bold line in Figures 4-11 for panels (a) and (c). The shaded orange area represents the 3 standard deviation range from the mean. Estimates of mutual information for electron fluxes and ULF wave power above the shaded area are therefore interpreted as significant at the three standard deviation level. More sophisticated methods to compute mutual information through non-parametric methodologies are possible (Kraskov et al., 2004), but for our dataset, the statistical dependence between variables and the number of points are sufficient for us to answer the questions stated in the introduction.
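The shuffle test can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `shuffle_baseline` and `is_significant`, the synthetic data and the seeds are hypothetical, and only the general recipe (one hundred shuffles, mean as zero baseline, three standard deviations as threshold) is taken from the text.

```python
import numpy as np

def binned_mi(x, y):
    """Histogram estimate of mutual information (nats), Freedman-Diaconis bins."""
    edges = [np.histogram_bin_edges(v, bins="fd") for v in (x, y)]
    counts, _, _ = np.histogram2d(x, y, bins=edges)
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    occupied = p_xy > 0
    return float(np.sum(p_xy[occupied] * np.log(p_xy[occupied] / (p_x @ p_y)[occupied])))

def shuffle_baseline(x, y, n_shuffles=100, seed=0):
    """Mean and standard deviation of the estimate after destroying the pairing of x and y."""
    rng = np.random.default_rng(seed)
    values = np.array([binned_mi(x, rng.permutation(y)) for _ in range(n_shuffles)])
    return values.mean(), values.std(ddof=1)

def is_significant(x, y, n_sigma=3.0, n_shuffles=100, seed=0):
    """True if the estimate lies more than n_sigma standard deviations above the baseline."""
    baseline, spread = shuffle_baseline(x, y, n_shuffles, seed)
    return bool(binned_mi(x, y) > baseline + n_sigma * spread)

# Illustrative synthetic pair with a linear correlation of 0.7
rng = np.random.default_rng(2)
x = rng.standard_normal(5_000)
y = 0.7 * x + np.sqrt(1.0 - 0.7**2) * rng.standard_normal(5_000)

baseline, spread = shuffle_baseline(x, y)
print(f"baseline = {baseline:.3f} +/- {spread:.3f}")
print("significant:", is_significant(x, y))
```

Shuffling one series preserves both marginal distributions, and therefore the binning, while removing any dependence between the two, so the mean of the shuffled estimates isolates the bias introduced by discretisation.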

Dataset
The data used in this study correspond to the two events analysed by Rostoker et al. (1998). The first period extends from March 2nd to May 31st in 1994 (91 days in total) and the second one spans from November 1st to 26th in 1993. During the first period a large geomagnetic storm occurred on April 17th with a minimum Dst of -201 nT, and the period also featured several moderate and intense storms. During the second period an intense storm peaked on November 4th with a minimum Dst of -119 nT. Another significant storm during this latter period was a moderate storm on November 18th with a minimum Dst of -82 nT. Both periods were thus geomagnetically active. Our choice to revisit the work of Rostoker et al. (1998) through mutual information stems from the fact that such a methodology has not been used before, and that their study, highly cited in the literature as evidence that radial diffusion is a leading mechanism for the energisation of relativistic electrons, can serve as a benchmark for more involved methodologies. Additionally, we have access to a comparable dataset with better resolution (1 hour resolution instead of 1 day), so we can not only revisit the results of Rostoker et al. (1998) with information theory but also find a more accurate time lag for the electrons' response to ULF wave power. In Rostoker et al. (1998), the GOES data is the daily average flux and the ULF data is the average over a six hour period from dawn to noon.

ULF power spectrum
The ULF data used in this analysis was obtained from the National Aeronautics and Space Administration's (NASA) Virtual Radiation Belt Observatory (ViRBO). The ULF indices used, S_gr and S_geo, both describe ULF spectral power from which noise has been removed and are derived in Kozyreva et al. (2007). The ULF data covers the Pc5 frequency range of 2-10 mHz. The ULF indices used in this work are the base-10 logarithm of the signal spectral power. The signal spectral power is the integral over the power spectral density above the noise level (Kozyreva et al., 2007).

Seed and relativistic electron fluxes indices
In order to quantify the electron fluxes we use the indices F e1.2 and F e130 described in Borovsky and Yakymenko (2017) for electrons with energies near 1.

Results
Figures (2) and (3) show the 24 hour average of relativistic electron flux indices and ULF power indices as a function of time for the two events studied by Rostoker et al. (1998). In each figure the panel on the left has the geosynchronous ULF index plotted, whereas the panel on the right has the ground ULF index plotted. We remind the reader that our data sets have different time resolutions from those used by Rostoker et al. (1998): they used a 24 hour resolution, whereas we use a 1 hour resolution and 24 hour moving averages. However, the visual comparison of Figures (2) and (3) with Figures (1) and (2) in Rostoker et al. (1998) shows that they are very similar. (Reducing our resolution to 24 hours for a strict comparison with Rostoker et al. (1998) is not useful because the values of mutual information and correlation are low, and reducing the number of points would bring both measures to the noise level.) In the following we will look at each event separately and compare the values obtained for the mutual information and the Pearson correlation. The reader can also skip Sections 4.1 to 4.4 and consult Table 1, which contains a summary of our results. Table 1 is extracted from the information found in Figures 4-11, and while the shapes of the statistical dependencies shown in Figures 4-11 are similar, differences between the events are significant.

The panels (a) and (b) in each figure show the dependence with the ULF ground index S_gr, whereas panels (c) and (d) are for the dependence with the ULF geostationary index S_geo. The orange line in the panels with mutual information represents the zero value on the basis of the shuffling procedure described in the methodology section. The shaded area overlapping the zero curve for mutual information represents the three standard deviation spread. Thus a value above the shaded area represents a measurement of mutual information that has at least six sigma significance. We note that the peaks in mutual information and Pearson correlation occur between 48 and 50 hours time lag and have maximum values of I_max ≈ 0.5 and ρ_max ≈ 0.55−0.6. The mutual information and correlation of electron fluxes with geostationary ULF power S_geo show a prominent 24 hour modulation.

As is typical for an index that measures magnetospheric quantities, ... In the Discussion section we show that the Pearson correlation is missing about 20-30% of the statistical dependence due to its inability to capture nonlinearities, and that differences in peaks between mutual information and Pearson correlation might be at least partially explained by the inability of the latter to measure nonlinear statistical dependencies.

... moving average introduces statistical dependence between points less than 12 hours of lag apart, but is useful to denote long-term trends. The mutual information and correlation in Figure (5) have the same peaks and shape as in Figure (4) for the one hour resolution, but because of the averaging the modulation present in the high resolution data is lost.

We note that the value of the mutual information is once again significantly enhanced since the averaging introduces statistical dependencies between two points less than 12 hours apart, but we also notice that there is a different dependence than for the Pearson correlation. These differences between the two measures and their potential origin in nonlinear phenomena are discussed in the Discussion section.

... and (d) are for the dependence with the ULF geostationary index S_geo. We note that the time lag dependence of mutual information and correlation are comparable and that the peak in both occurs for a lag of τ = 0. The peak in the mutual information between S_gr and F_e130 is I_max ≈ 0.68, which is significantly greater than the mutual information between S_gr and F_e1.2. On the other hand, the peak in the mutual information between S_geo and F_e130 is I_max ≈ 0.4, which is comparable to the peak value we found for the mutual information between S_geo and F_e1.2. As observed in Figure (4), we also note a modulation in the mutual information and correlation of electron fluxes with geostationary ULF power S_geo that is not present in the dependence with the ground power index S_gr. Figure (9) shows the same dependence as in Figure (...) ... Rostoker et al. (1998). Similarly to Event 1, Event 2 shows that the time lag dependence of mutual information and correlation is comparable and that the peak in both occurs around a lag of τ = 0 with values of I_max ≈ 0.6−0.68. Figure (11) looks at the same dependence as in Figure (10) ...

Discussion
We are now ready to answer the two questions stated in the introduction: (1) Are the events studied by Rostoker et al. (1998) examples of strong ULF wave power and energetic electron dependence? (2) Is the statistical dependence between ULF wave power and electron fluxes nonlinear? In order to answer these two questions we have tabulated the values of the maximum Pearson correlation and maximum mutual information for all events in Table 1. The columns denote, from left to right, the event year, the flux index, the ULF index, the maximum Pearson correlation, the maximum mutual information, the information-adjusted correlation, and the lag for the maximum mutual information, respectively. The information-adjusted correlation is defined as the correlation value that would be obtained from the mutual information under the assumption that the dependence between the two variables can be represented as a Gaussian bivariate (cf. Equation 3). The choice of a Gaussian bivariate to distinguish linear and nonlinear dependences stems from the fact that nonlinear equations produce non-Gaussian statistics even in the instance where a system is initialised with Gaussian distributed random variables (Papoulis and Pillai, 2002). Mathematically, the information-adjusted correlation can be defined by applying the inverse of Equation (3), which, for mutual information computed with the natural logarithm, reads:
$$\rho_{\mathrm{adj}} = \sqrt{1 - e^{-2 I(X;Y)}}. \tag{4}$$
The information-adjusted correlation ρ_adj allows us to determine whether the Pearson correlation has underestimated the dependence between the random variables due to the presence of nonlinearity. The instance in which the adjusted correlation is statistically comparable to the Pearson correlation denotes that a linear dependence between the fluxes and ULF power dominates and that nonlinear dependencies are either too weak or non-existent. In the opposite case, an adjusted correlation larger than the Pearson correlation indicates that nonlinear dependencies between fluxes and ULF power are statistically significant.
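The information-adjusted correlation is simple to evaluate numerically. The sketch below assumes that the mutual information is expressed in nats, so that the inverse of the Gaussian bivariate relation I = -(1/2) log(1 - ρ^2) is ρ_adj = sqrt(1 - exp(-2I)); the function name is a hypothetical choice.

```python
import numpy as np

def information_adjusted_correlation(mi_nats):
    """Correlation a Gaussian bivariate would need to carry mi_nats of mutual information."""
    mi = np.asarray(mi_nats, dtype=float)
    return np.sqrt(1.0 - np.exp(-2.0 * mi))

# Round trip: a Gaussian bivariate with rho = 0.6 recovers rho_adj = 0.6
rho = 0.6
mi_gaussian = -0.5 * np.log(1.0 - rho**2)
rho_round_trip = float(information_adjusted_correlation(mi_gaussian))
print(rho_round_trip)

# Illustrative value: mutual information of 0.5 nats
rho_adj = float(information_adjusted_correlation(0.5))
print(f"rho_adj for I = 0.5 nats: {rho_adj:.3f}")
```

For purely Gaussian, linearly dependent data the round trip returns the Pearson correlation, so any excess of ρ_adj over the measured Pearson correlation is attributed to dependence that the linear measure does not capture.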

Are the events evidence of strong ULF wave power and energetic electron dependence? For the two events studied, the Pearson correlation and the mutual information are both statistically significant, and well above the noise level. However ... showing weaker linear and nonlinear statistical dependence, whereas the 1994 event has correlation values on par with events found over 11 years of data (Simms et al., 2014). The methodology of Simms et al. (2014) separates variables in terms of storm phases and defines a predictor variable, e.g., ULF wave power, as an average over an appropriate time period for a given storm phase. Since we are studying case events, the statistical methodology of Simms et al. (2014) cannot be explicitly reproduced, but we find comparisons with our results useful in that they give us a point of reference to judge the strength of the correlation values we found. For another comparison, for all times during the years 1995-2006, Borovsky (2017) found a Pearson correlation coefficient of 0.34 between F_e1.2 and S_gr, whereas they found a higher correlation coefficient of 0.54 between F_e1.2 and the 123-hr time integral of S_gr. Similarly, it was found that the correlation coefficient between F_e1.2 and S_geo was 0.21, whereas the correlation coefficient was 0.25 between F_e1.2 and the 138-hr time integral of S_geo. For all times during the years 1995-2004, Borovsky and Denton (2014) explored correlation coefficients between S_gr and S_geo and a relativistic-electron flux F that was calculated differently from F_e1.2. They found correlations between F and S_gr of 0.36 (with a time lag of 56 hr) and between F and the 126-hr time integral of S_gr of 0.55. Likewise, they found correlations between F and S_geo of 0.28 (with a time lag of 71 hr) and between F and the 156-hr time integral of S_gr of 0.32. Our results demonstrate that even though the events appear, at least visually, to show strong correlation between ULF waves and relativistic electron fluxes, quantitatively the dependence is comparable to other values found in the literature but nonetheless modest when compared with the correlation between ULF waves and seed electrons.
Comparing seed and relativistic electrons, the statistical dependence on ULF wave power of the 130 keV flux is significantly larger than for relativistic fluxes and ranges between 0.54 and 0.68 for the maximum Pearson correlation and between 0.44 and 0.67 for the maximum mutual information. We also note that the time lag for the maximum values is comparable whether one uses the mutual information or the Pearson correlation. The 130 keV fluxes have a maximum dependence at time lags of less than a day, whereas the relativistic electrons see a maximum at considerably longer time lags, between 42 and 67 hours. Moreover, the ground ULF wave power gives a larger dependence than geostationary-measured ULF wave power for the 1994 event. For the 1993 event the statistical dependence is the same whether one uses ground or geostationary indices.
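The time lags quoted above come from scanning a dependence measure over lags between the ULF index and the flux index. A minimal sketch of such a scan is given below for the Pearson correlation, assuming two equally long, regularly sampled series (one sample per hour in this study) with no data gaps; the function name `lagged_pearson` and the synthetic test signal are hypothetical. The same loop can be used with a mutual information estimator in place of `np.corrcoef`.

```python
import numpy as np

def lagged_pearson(driver, response, max_lag):
    """Pearson correlation between driver(t) and response(t + lag), lag = 0..max_lag samples."""
    driver = np.asarray(driver, dtype=float)
    response = np.asarray(response, dtype=float)
    n = len(driver)
    lags = np.arange(max_lag + 1)
    corr = np.empty(len(lags))
    for i, lag in enumerate(lags):
        # Pair each driver sample with the response observed `lag` samples later
        corr[i] = np.corrcoef(driver[: n - lag], response[lag:])[0, 1]
    return lags, corr

# Synthetic check: the response is the driver delayed by 48 samples plus noise
rng = np.random.default_rng(3)
driver = rng.standard_normal(2_000)
response = 0.5 * rng.standard_normal(2_000)
response[48:] += driver[:-48]

lags, corr = lagged_pearson(driver, response, max_lag=100)
best_lag = int(lags[np.argmax(corr)])
print(f"lag of maximum correlation: {best_lag} samples")
```

With hourly data the lag index is directly a lag in hours, so the position of the maximum can be compared with the lags reported in Table 1.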

The ground ULF index spans local daylight hours between 0500 and 1500, whereas the GOES ULF index covers the full 24 hour period. This local time difference between ground and geostationary sampling of wave power makes the latter more susceptible to being influenced by substorm activity and the former by viscous processes and pressure pulses on the dayside magnetosphere during moderate geomagnetic activity (Borovsky and Funsten, 2003; Osmane et al., 2015a). However, as pointed out by Simms et al. (2014), the most notable difference between ground and GOES data is that the ground magnetometers are better positioned to catch ULF wave activity that would result in radial diffusion transport (Lejosne and Kollmann, 2020).
To address the second question, we compare the values of the information adjusted correlation with the Pearson correlation.
We note that the adjusted correlation is significantly larger than the Pearson correlation for all instances. In other words, though constrained to two case studies, our results demonstrate the presence of nonlinear statistical dependencies between energetic electron fluxes and ULF wave power. By using information theory we make no assumptions about the functional form of the nonlinear dependence between the variables, but we can nonetheless state that nonlinearities have to be accounted for.
Our results are consistent with the study of Simms et al. (2018) in which they built regression models that assumed a quadratic dependence in the ULF wave power with a one day lag. Their results indicate that the response of relativistic electron fluxes can be a combination of linear and nonlinear dependence and that incorporating a quadratic term might provide better predictions.

Based on the values for the information adjusted correlation, the Pearson correlation might be missing between 20% and 30% of the statistical dependencies between ULF wave power and relativistic electron fluxes.
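To illustrate how such a comparison can be made, the sketch below assumes that the information adjusted correlation is obtained by inverting the Gaussian relation between mutual information and correlation, I = -1/2 log2(1 - rho^2), with I in bits. The function `missing_fraction` is a hypothetical metric introduced only for this example, and it is one plausible reading of a "missing" share rather than the definition used in this study.

```python
import math


def gaussian_mi_bits(rho):
    """Mutual information (bits) of a Gaussian bivariate with correlation rho."""
    return -0.5 * math.log2(1.0 - rho ** 2)


def information_adjusted_correlation(mi_bits):
    """Correlation a Gaussian bivariate would need to carry mi_bits of information."""
    return math.sqrt(1.0 - 2.0 ** (-2.0 * mi_bits))


def missing_fraction(pearson_r, mi_bits):
    """Hypothetical metric: share of the adjusted correlation not captured by |r|.

    Requires mi_bits > 0, otherwise the adjusted correlation is zero.
    """
    rho_i = information_adjusted_correlation(mi_bits)
    return (rho_i - abs(pearson_r)) / rho_i


# For a purely Gaussian (linear) dependence nothing is missed:
print(missing_fraction(0.6, gaussian_mi_bits(0.6)))
# Illustrative values only: more information than the Pearson correlation implies.
print(missing_fraction(0.4, 0.2))
```

For a Gaussian bivariate the adjusted and Pearson correlations coincide, so any positive difference is a signature of dependence that a linear measure does not capture.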

Conclusions
The Earth's inner magnetosphere is a nonlinearly driven plasma environment in which electrons can be collectively energised to relativistic energies by ULF fluctuations (Lejosne and Kollmann, 2020). The emergence of nonlinear processes translates into non-Gaussian fluctuations in the electromagnetic fields and particle distribution functions. Thus, in order to quantify the processes at play and model the Earth's radiation belts accurately, one needs to determine whether nonlinear statistical dependencies between drivers, such as the solar wind speed and the ULF wave power, and quantities in which energy and momentum are deposited, such as electron fluxes, have to be accounted for.

In this study, we described the use of mutual information to characterise statistical dependencies of relativistic electron fluxes on ULF wave power. The benefit of mutual information, in comparison to the Pearson correlation, lies in the capacity to distinguish nonlinear dependencies from linear ones. In order to test our methodology, we revisited the case study of Rostoker et al. (1998), in which two events were shown, from a visual perspective, to indicate strong correlation between the rise of relativistic electron fluxes and ULF wave power. Our application of mutual information to the events presented by Rostoker et al. (1998) indicates that relativistic electron fluxes are linearly and nonlinearly dependent on ULF wave power. However, the values that we found for both the Pearson correlation and the mutual information between relativistic electron fluxes and ULF wave power are modest when compared to previous statistical results (Simms et al., 2014) and consistently smaller than those found between seed electrons and ULF wave power. This result is counter-intuitive, since seed electrons with long azimuthal periods cannot experience drift-orbit resonance with ULF wave fluctuations and should therefore not be correlated with radial diffusion drivers more strongly than relativistic electron fluxes. However, our results do not indicate a necessary causal physical relationship between seed electrons and ULF wave power; rather, they point to the need to avoid over-interpreting correlational measures, whether linear or nonlinear. The modest dependence of relativistic electron fluxes on ULF wave power could also originate from a shared dependence on solar wind drivers such as the solar wind speed. Our results are therefore indicative of the need to incorporate data analysis tools that can distinguish between interdependencies of various solar wind drivers.
In the framework of information theory, conditional mutual information is specifically built for that purpose and has been successfully used to resolve a long-standing question about the relative role of solar wind speed and density in driving relativistic electron fluxes (Wing et al., 2016). In future studies, we will also apply a methodology comparable to that presented in Simms et al. (2014) to seek dependencies of relativistic electron fluxes on solar wind drivers for given storm phases, and build non-parametric estimators for the probability density of random variables that do not require binning (Kraskov et al., 2004).
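As a minimal sketch of how conditional mutual information separates a shared driver from a direct dependence, the example below uses a plug-in estimate on discrete labels. It assumes the series have already been discretised (for instance by binning), and the toy variables are hypothetical and do not represent the data analysed here.

```python
import math
import random
from collections import Counter


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X;Y|Z) in bits for equal-length sequences of discrete labels."""
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz, c_yz, c_z = Counter(zip(x, z)), Counter(zip(y, z)), Counter(z)
    # Sum of p(x,y,z) * log2( p(x,y,z) p(z) / (p(x,z) p(y,z)) ) over occupied cells.
    return sum(
        (c / n) * math.log2(c * c_z[k] / (c_xz[(i, k)] * c_yz[(j, k)]))
        for (i, j, k), c in c_xyz.items()
    )


# Toy example: X and Y never interact, but both follow a common driver Z.
random.seed(1)
n = 5000
z = [random.randrange(4) for _ in range(n)]
x = [zk + random.randrange(2) for zk in z]
y = [zk + random.randrange(2) for zk in z]

mi_xy = conditional_mutual_information(x, y, [0] * n)  # constant Z reduces to I(X;Y)
cmi_xy_given_z = conditional_mutual_information(x, y, z)
print(mi_xy, cmi_xy_given_z)
```

In this toy case the unconditional mutual information between X and Y is sizeable, whereas conditioning on Z removes almost all of it, which is the behaviour expected when two quantities share a driver without depending on each other directly.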

Mutual information for continuous variables
For a random variable $X$, if the cumulative distribution function $F(x)$ is continuous, then $X$ is said to be continuous as well.
Let us denote the probability density function by $f(x) = dF(x)/dx$. The differential entropy of a continuous random variable $X$ is defined as:

$$h(X) = -\int_S f(x) \log f(x)\, dx,$$

where $S$ is the support set where $f(x) > 0$. Differential entropy $h(X)$, as in the discrete case with the Shannon entropy $H$, is also a measure of the uncertainty for a random variable $X$. However, unlike in the discrete case, the differential entropy can be negative. Consider for instance a random variable distributed uniformly from 0 to $L$, so that its density is $f(x) = 1/L$. Then its differential entropy is:

$$h(X) = -\int_0^L \frac{1}{L} \log\frac{1}{L}\, dx = \log L.$$

Thus, for $L < 1$, $\log L < 0$ and the differential entropy is negative. The mutual information $I(X; Y)$ can be extended to continuous variables as:

$$I(X; Y) = \iint f(x, y) \log\frac{f(x, y)}{f(x) f(y)}\, dx\, dy = h(X) + h(Y) - h(X, Y).$$
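The sign of the differential entropy for the uniform case can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a simple midpoint rule, logarithms in base 2, and hypothetical helper names (`differential_entropy_bits`, `uniform_pdf`, `gaussian_pdf`). The Gaussian case is included for comparison with the standard closed form 1/2 log2(2 pi e sigma^2).

```python
import math


def differential_entropy_bits(pdf, a, b, n=20000):
    """Midpoint-rule estimate of h(X) = -integral of f log2 f over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        fx = pdf(a + (k + 0.5) * dx)
        if fx > 0.0:  # restrict the integral to the support set S
            total -= fx * math.log2(fx) * dx
    return total


def uniform_pdf(length):
    """Density of a uniform variable on [0, length]."""
    return lambda x: 1.0 / length if 0.0 <= x <= length else 0.0


def gaussian_pdf(sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return lambda x: norm * math.exp(-x ** 2 / (2.0 * sigma ** 2))


# Uniform on [0, 0.5]: compare with log2(L), which is negative for L < 1.
h_uniform = differential_entropy_bits(uniform_pdf(0.5), 0.0, 0.5)
print(h_uniform, math.log2(0.5))

# Unit-variance Gaussian: compare with 0.5 * log2(2 * pi * e).
h_gauss = differential_entropy_bits(gaussian_pdf(1.0), -10.0, 10.0)
print(h_gauss, 0.5 * math.log2(2.0 * math.pi * math.e))
```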

Derivation of mutual information for Gaussian bivariates
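We consider a bivariate $\mathbf{X} = (X, Y)^T$ with mean vector and covariance matrix given by

$$\boldsymbol{\mu} = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_x^2 & \rho\,\sigma_x\sigma_y \\ \rho\,\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix},$$

where $\sigma_x^2 = E[X^2] - \mu_x^2$, $\sigma_y^2 = E[Y^2] - \mu_y^2$, and the correlation coefficient $\rho$ is defined as:

$$\rho = \frac{E[XY] - \mu_x\mu_y}{\sigma_x\sigma_y}.$$

The probability density function of the $X$-$Y$ bivariate is:

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\}.$$

For the sake of simplicity we focus on the case where $\mu_x = \mu_y = 0$ and $\sigma_x = \sigma_y = \sigma$, in which case the joint bivariate distribution takes the form:

$$f(x, y) = \frac{(1-\rho^2)^{-1/2}}{2\pi\sigma^2} \exp\left[-\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)\sigma^2}\right],$$

and the marginals $f(x_i) = (2\pi\sigma^2)^{-1/2} \exp(-x_i^2/2\sigma^2)$ for $x_i = (x, y)$. Using equation (7) we can compute the mutual information between $X$ and $Y$. For $h(x_i)$ we find:

$$h(x_i) = -\int f(x_i) \log f(x_i)\, dx_i = \frac{1}{2}\log(2\pi\sigma^2) + \frac{\log e}{2\sigma^2}\int x_i^2 f(x_i)\, dx_i = \frac{1}{2}\log(2\pi e\sigma^2),$$

in which the logarithm is in base 2. And now for the joint differential entropy of a Gaussian bivariate:

$$\begin{aligned}
h(x, y) &= -\iint f(x, y) \log f(x, y)\, dx\, dy \\
&= -\log\left[\frac{(1-\rho^2)^{-1/2}}{2\pi\sigma^2}\right] \iint f(x, y)\, dx\, dy + \log e \iint \frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)\sigma^2}\, f(x, y)\, dx\, dy \\
&= \frac{1}{2}\log(1-\rho^2) + \log(2\pi e\sigma^2).
\end{aligned}$$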
Therefore, the mutual information of a Gaussian bivariate is a nonlinear function of the correlation $\rho$:

$$I(x, y) = h(x) + h(y) - h(x, y) = -\frac{1}{2}\log(1-\rho^2).$$

Figure 11. Same as in Figure 9 but with a 24 hour moving average.
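The closed form for the Gaussian bivariate, I = -1/2 log2(1 - rho^2), can be checked against a direct numerical integration of the continuous definition of mutual information. The sketch below is a sanity check under stated assumptions (zero means, equal variances, a midpoint rule on a square grid truncated at six standard deviations). The function names and grid parameters are hypothetical.

```python
import math


def gaussian_mi_numeric(rho, sigma=1.0, half_width=6.0, n=200):
    """Midpoint-rule estimate of I(X;Y) in bits for a zero-mean Gaussian bivariate."""
    dx = 2.0 * half_width * sigma / n
    grid = [-half_width * sigma + (k + 0.5) * dx for k in range(n)]
    norm_joint = 1.0 / (2.0 * math.pi * sigma ** 2 * math.sqrt(1.0 - rho ** 2))
    norm_marg = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    marg = [norm_marg * math.exp(-g ** 2 / (2.0 * sigma ** 2)) for g in grid]
    total = 0.0
    for i, x in enumerate(grid):
        for j, y in enumerate(grid):
            quad = (x * x + y * y - 2.0 * rho * x * y) / (
                2.0 * (1.0 - rho ** 2) * sigma ** 2
            )
            joint = norm_joint * math.exp(-quad)
            if joint > 0.0:  # skip cells where the density underflows
                total += joint * math.log2(joint / (marg[i] * marg[j])) * dx * dx
    return total


def gaussian_mi_closed_form(rho):
    """Closed form I = -1/2 log2(1 - rho^2), in bits."""
    return -0.5 * math.log2(1.0 - rho ** 2)


for rho in (0.0, 0.3, 0.6, 0.9):
    print(rho, gaussian_mi_numeric(rho), gaussian_mi_closed_form(rho))
```

The grid resolution is adequate for moderate correlations; as rho approaches 1 the joint density narrows and a finer grid would be needed.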