the Creative Commons Attribution 4.0 License.

# Transfer entropy and cumulant-based cost as measures of nonlinear causal relationships in space plasmas: applications to *D*_{st}

### Jay R. Johnson

### Simon Wing

### Enrico Camporeale

It is well known that the magnetospheric response to the solar wind is
nonlinear. Information theoretical tools such as mutual information, transfer
entropy, and cumulant-based analysis are able to characterize the
nonlinearities in the system. Using cumulant-based cost, we show that
nonlinear significance of *D*_{st} peaks at 3–12 h lags that can be
attributed to *VB*_{s}, which also exhibits similar behavior.
However, the nonlinear significance that peaks at lags 25, 50, and 90 h can
be attributed to internal dynamics, which may be related to the relaxation of
the ring current. These peaks are absent in the linear and nonlinear
self-significance of *VB*_{s}. Our analysis with mutual
information and transfer entropy shows that both methods can establish that
there are strong correlations and transfer of information from
*V*_{sw} to *D*_{st} at a timescale that is consistent with
that obtained from the cumulant-based analysis. However, mutual information
also shows that there is a strong correlation in the backward direction, from
*D*_{st} to *V*_{sw}, which is counterintuitive. In contrast,
transfer entropy shows that there is no or little transfer of information
from *D*_{st} to *V*_{sw}, as expected because it is the solar
wind that drives the magnetosphere, not the other way around. Our case study
demonstrates that these information theoretical tools are quite useful for
space physics studies because these tools can uncover nonlinear dynamics that
cannot be seen with the traditional analyses and models that assume linear
relationships.

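Johnson, J. R., Wing, S., and Camporeale, E.: Transfer entropy and cumulant-based cost as measures of nonlinear causal relationships in space plasmas: applications to *D*_{st}, Ann. Geophys., 36, 945–952, https://doi.org/10.5194/angeo-36-945-2018, 2018.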

One of the most practically important concepts in dynamical systems is the notion of causality. It is particularly useful to organize observational datasets according to causal relationships in order to identify variables that drive the dynamics. Understanding causal dependencies can also help to simplify descriptions of highly complex physical processes because it constrains the coupling functions between the dynamical variables. Analysis of those coupling functions can lead to simplification of the underlying physical processes that are most important for driving the system. It is particularly useful from a practical standpoint to understand causal dependencies in systems involving natural hazards because monitoring of causal variables is closely linked with warning.

A common method to establish causal dependencies in a data stream of two
variables, e.g., [*a*(*t*)] and [*b*(*t*)], is to apply linear correlation
studies such as Strangeway et al. (2005), which showed the relationship
between the downward Poynting flux and ion outflows. Causal relationships are
typically identified by considering a time-shifted correlation function

$$\lambda_{ab}(\tau)=\frac{\langle a(t)\,b(t+\tau)\rangle-\langle a\rangle\langle b\rangle}{\sqrt{\langle a^{2}\rangle-\langle a\rangle^{2}}\,\sqrt{\langle b^{2}\rangle-\langle b\rangle^{2}}},\tag{1}$$

where 〈…〉 is an ensemble average obtained by drawing samples at a set of measurement times, $\{t_{0},t_{1},\dots,t_{N}\}$. For example, Borovsky et al. (1998) used such a method to identify relationships between solar wind variables and plasma sheet variables. The causal dependency (the plasma sheet responds to changes in the solar wind) can be identified from the time shift of the peak of the cross-correlation, which indicates a response time. From this type of analysis it can be found that the plasma sheet generally responds from the tail to the inner magnetosphere, consistent with the notion of earthward convection. Such analysis has been particularly useful to help understand plasma sheet transport.
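A time-shifted correlation of this kind can be sketched in a few lines. The example below is a minimal illustration on synthetic data, not the procedure of the cited studies; the function name `lagged_correlation` and the 5-sample delay are assumptions chosen for the demonstration.

```python
import numpy as np

def lagged_correlation(a, b, lag):
    """Pearson correlation between a(t) and b(t + lag) for an integer lag."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if lag >= 0:
        x, y = a[: len(a) - lag], b[lag:]
    else:
        x, y = a[-lag:], b[: len(b) + lag]
    return np.corrcoef(x, y)[0, 1]

# Synthetic example (assumed data): b responds to a with a 5-sample delay plus noise.
rng = np.random.default_rng(0)
a = rng.standard_normal(5000)
b = np.roll(a, 5) + 0.5 * rng.standard_normal(5000)

lags = np.arange(-20, 21)
corr = np.array([lagged_correlation(a, b, k) for k in lags])
best_lag = lags[np.argmax(corr)]  # lag at which the correlation peaks
```

The position of the peak gives the response time, and the correlation at lag τ with the arguments swapped equals the correlation at lag −τ, which is why the forward and backward directions carry no independent information.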

However, the procedure of detecting causal relationships based on linear
cross-correlation suffers from a number of limitations. First it should be
noted that the statistical accuracy of the correlation function is limited by
the resolution and length of the data stream. Second, the linear time series
analysis ignores nonlinear correlations, which may be important for energy
transfer in the magnetospheric system. For example, substorms are believed to
involve storage and release of energy in the magnetotail, which is a highly
nonlinear response. Similarly, magnetosphere–ionosphere coupling may also be
highly nonlinear, involving the nonlinear development of accelerating
potentials along auroral field lines and nonlinear current–voltage
relationships. Third, the cross-correlation may not be a particularly clear
measure when there are multiple peaks or if there is little or no asymmetry
in the forward (i.e., *λ*_{ab}(*τ*)) and backward directions (i.e.,
${\mathit{\lambda}}_{ba}\left(\mathit{\tau}\right)={\mathit{\lambda}}_{ab}(-\mathit{\tau})$). Finally, the cross-correlation
does not provide any way to clearly distinguish between two variables that
are passively correlated because of a common driver rather than causally
related.

In the remainder of this paper, we will discuss other methods to identify causal relationships based on entropy-based discriminating statistics such as mutual information and transfer entropy. We will also discuss the cumulant-based method. We will illustrate the shortcomings and strengths of the various methods for studying causality with examples from nonlinear dynamics and space physics.

It is well known that the magnetosphere responds to variation in the solar wind parameters (Baker et al., 1983; Clauer et al., 1981; Crooker and Gringauz, 1993; Johnson and Wing, 2015; Papitashvili et al., 2000; Wing and Johnson, 2015; Wing et al., 2016), and it has been established that the magnetosphere has a significant linear response to the solar wind. However, it is also expected that the magnetosphere has a nonlinear response (Balikhin et al., 2011; Klimas et al., 1998; Tsurutani et al., 1990; Valdivia et al., 2013; Vassiliadis et al., 1990). The nonlinear response may be driven by internal dynamics rather than being driven externally (Johnson and Wing, 2005; Wing et al., 2005). For example, the internal dynamics associated with loading and unloading of magnetic energy associated with storms and substorms is nonlinear (Johnson and Wing, 2014). Indeed, the data analysis of Bargatze et al. (1985) indicated that the dynamical response of the magnetosphere to solar wind input could not be entirely understood using linear prediction filters.

Suppose that we consider a set of variables **a** and **b**, which could be vectors of variables measured in time, and we would like to measure their dependency. Instead of considering the covariance matrix or correlation function, a more general measure of dependency between an input and output is obtained by considering whether

$$P(\boldsymbol{a},\boldsymbol{b})=P(\boldsymbol{a})\,P(\boldsymbol{b}),\tag{2}$$

where *P*(**a**, **b**) is the joint probability of input **a** and output **b**, while *P*(**a**) and *P*(**b**) are the probabilities of **a** and **b**, respectively. If the relationship holds, then the variables **a** and **b** are independent. For all other cases, there is some measure of dependency. In the case where the system output is completely known given the input, $P(\boldsymbol{a},\boldsymbol{b})=P(\boldsymbol{a})$. The advantage of considering Eq. (2) is that it is possible to detect the presence of higher order nonlinear dependencies between the input and output even in the absence of linear dependencies (Gershenfeld, 1998).
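The point that dependence can exist without linear correlation can be illustrated with a toy case. The sketch below assumes a synthetic input `a` and an output `b = a**2`, and compares the binned joint probability with the product of the marginals, i.e., it checks how far P(a, b) is from P(a)P(b); the bin count of 8 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(20000)
b = a**2  # fully determined by a, yet linearly uncorrelated with it

linear_corr = np.corrcoef(a, b)[0, 1]

# Compare the joint histogram with the product of its marginals.
counts, _, _ = np.histogram2d(a, b, bins=8)
p_ab = counts / counts.sum()
p_a = p_ab.sum(axis=1, keepdims=True)
p_b = p_ab.sum(axis=0, keepdims=True)
max_deviation = np.abs(p_ab - p_a * p_b).max()
```

Here `linear_corr` is close to zero while `max_deviation` is clearly nonzero, so the factorization test detects a dependency that the correlation coefficient misses.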

## 2.1 Mutual information and cumulant-based cost

Mutual information and cumulant-based cost are two useful measures that
quantify Eq. (2). Mutual information has the advantage that in
the limit of Gaussian joint probability distributions, it may be simply
related to the correlation coefficient *C*_{ab}(*τ*) defined in
Eq. (1) (Li, 1990). Cumulants have the advantage of good
statistics for limited datasets and noisy systems (Deco and Schürmann, 2000).
Moreover, for high-dimensional systems it is more efficient to compute
moments of the data rather than try to construct the probability density
function.
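For a bivariate Gaussian distribution, the relation between mutual information and the correlation coefficient takes a standard closed form. As a sketch (the symbol $\mathcal{M}$ for the mutual information is notation assumed here for illustration):

$$\mathcal{M}(\tau)=-\frac{1}{2}\log\left(1-C_{ab}^{2}(\tau)\right),$$

so that the mutual information vanishes for uncorrelated Gaussian variables and grows without bound as $|C_{ab}|\to 1$.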

Correlation studies also only detect linear correlations, so if the feedback
involves nonlinear processes (highly likely in this case) then their
usefulness may be seriously limited. Alternatively, entropy-based measures
such as mutual information (Materassi et al., 2011; Prichard and Theiler, 1995) and cumulants
(Johnson and Wing, 2005) are useful for detecting linear as well as nonlinear
correlations. The mutual information is constructed from the probability
distribution function of the variables and may be computed using a
quantization procedure where data are binned such that the samples [*a*(*t*)]
are assigned discrete values $\widehat{a}\in \mathit{\{}{a}_{\mathrm{1}},{a}_{\mathrm{2}},\mathrm{\dots},{a}_{n}\mathit{\}}$ of an
alphabet ℵ_{1} and [*b*(*t*)] is assigned discrete values $\widehat{b}\in \mathit{\{}{b}_{\mathrm{1}},{b}_{\mathrm{2}},\mathrm{\dots},{b}_{m}\mathit{\}}$ of an alphabet ℵ_{2}. The ad hoc
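time-shifted mutual information

$$\mathcal{M}(\tau)=\sum_{\widehat{a}\in\aleph_{1},\,\widehat{b}\in\aleph_{2}}p\left(\widehat{a}(t+\tau),\widehat{b}(t)\right)\log\frac{p\left(\widehat{a}(t+\tau),\widehat{b}(t)\right)}{p\left(\widehat{a}\right)\,p\left(\widehat{b}\right)}$$

has been used as an indicator of causality, but suffers from the same problems as the time-shifted cross-correlation when it has multiple peaks and long-range correlations.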
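The quantization procedure can be sketched with a two-dimensional histogram. The example below is a minimal illustration, assuming equal-width bins (16 per variable) and synthetic data in which `a` depends quadratically on `b` three samples earlier; the function names are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Mutual information (in bits) of two samples from a 2-D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0  # empty cells contribute nothing to the sum
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))

def lagged_mutual_information(a, b, lag, bins=16):
    """Mutual information between a(t + lag) and b(t) for lag >= 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return mutual_information(a[lag:], b[: len(b) - lag], bins=bins)

# Synthetic example (assumed data): a(t) = b(t - 3)^2 + noise.
rng = np.random.default_rng(3)
b = rng.standard_normal(20000)
a = np.roll(b, 3) ** 2 + 0.1 * rng.standard_normal(20000)
mi = [lagged_mutual_information(a, b, k) for k in range(11)]
peak_lag = int(np.argmax(mi))
```

Because the dependence is purely quadratic, a linear cross-correlation of `a` and `b` would be close to zero at every lag, whereas the binned mutual information peaks at the true delay.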

Similarly, examination of time-shifted cumulants could be used as an indicator of causality in a nonlinear system. In this case, we can define a discriminating statistic

where

are the cumulants

of the joint probability distribution for variables ${z}_{\mathrm{1}},\mathrm{\dots},{z}_{j}$.
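For reference, the low-order joint cumulants have a standard closed form in terms of moments. As a sketch for zero-mean variables, writing $C_{i\dots j}=\langle z_{i}\cdots z_{j}\rangle$ for the joint moments (this moment notation is an assumption made here for illustration):

$$K_{i}=C_{i}=0,\qquad K_{ij}=C_{ij},\qquad K_{ijk}=C_{ijk},\qquad K_{ijkl}=C_{ijkl}-C_{ij}C_{kl}-C_{ik}C_{jl}-C_{il}C_{jk}.$$

For a jointly Gaussian distribution all cumulants beyond second order vanish, so nonzero higher order cumulants indicate dependence that linear statistics cannot capture.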

With only two variables, *a* and *b*, defined above, we can
consider the cost function

The presence of nonlinear dependence has been identified by comparing the
cumulant cost for a time series with the cumulant-based cost of surrogate
time series, which are constructed to have the same linear correlations as the
original data (as in Johnson and Wing, 2005). Significance measures the difference in the
discriminating statistic from the mean of the discriminating statistic of the
surrogates in terms of the spread of the surrogates, *σ*.
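The significance described here can be written as a one-line computation. This is a minimal sketch, assuming that the discriminating statistic has already been evaluated for the data and for each surrogate; the function name and the use of the sample standard deviation are assumptions.

```python
import numpy as np

def significance(stat_data, stat_surrogates):
    """Distance of the data statistic from the surrogate mean, in units of sigma."""
    s = np.asarray(stat_surrogates, dtype=float)
    return (stat_data - s.mean()) / s.std(ddof=1)

# Toy numbers: surrogates scatter around 2 with unit spread, the data give 5.
example = significance(5.0, [1.0, 2.0, 3.0])
```

A value of several σ means that the statistic of the data is unlikely to arise from a process with only the linear correlations retained in the surrogates.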

In Sect. 3, we will show an application of cumulant-based analysis to the disturbance storm time index (*D*_{st}). In
principle, the cross-correlation, mutual information, and cumulant-based cost
should be independent of the selection of measurement points if the system is
stationary; therefore, time stationarity can be examined by comparing these
discriminating statistics for groups of measurements drawn from different
windows of time as in Johnson and Wing (2005) and Wing et al. (2016).

## 2.2 Transfer entropy

Another method for determining causality is the one-sided transfer entropy (De Michelis et al., 2011; Materassi et al., 2014; Schreiber, 2000; Wing et al., 2016, 2018), which is based upon the conditional mutual information

$$\mathrm{CMI}(x,y\,|\,z)=\sum_{x,y,z}p(x,y,z)\log\frac{p(x,y\,|\,z)}{p(x\,|\,z)\,p(y\,|\,z)}.$$

The conditional mutual information measures the dependence of two variables,
*x* and *y*, given a conditioner variable, *z*. If either *x* or *y* is
dependent on *z*, the mutual information between *x* and *y* is reduced, and
this reduction of information provides a method to eliminate coincidental
dependence, or conversely to identify causal dependence.
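The way a conditioner removes coincidental dependence can be illustrated with a common-driver example. The sketch below assumes equal-width binning (8 bins per variable) and uses the entropy identity CMI(x, y | z) = H(x, z) + H(y, z) − H(x, y, z) − H(z); the function names and the synthetic data are hypothetical.

```python
import numpy as np

def joint_entropy(columns, bins=8):
    """Shannon entropy (bits) of the joint histogram of the given 1-D arrays."""
    data = np.column_stack(columns)
    counts, _ = np.histogramdd(data, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def conditional_mutual_information(x, y, z, bins=8):
    """CMI(x, y | z) = H(x,z) + H(y,z) - H(x,y,z) - H(z), in bits."""
    return (joint_entropy([x, z], bins) + joint_entropy([y, z], bins)
            - joint_entropy([x, y, z], bins) - joint_entropy([z], bins))

# Synthetic example (assumed data): x and y share the driver z but do not interact.
rng = np.random.default_rng(6)
z = rng.standard_normal(20000)
x = z + 0.5 * rng.standard_normal(20000)
y = z + 0.5 * rng.standard_normal(20000)

mi_xy = joint_entropy([x]) + joint_entropy([y]) - joint_entropy([x, y])
cmi_xy_given_z = conditional_mutual_information(x, y, z)
```

The unconditioned mutual information between `x` and `y` is substantial, while conditioning on `z` removes most of it. The small remainder comes from the finite bin width and the finite sample, not from a direct coupling.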

Transfer entropy considers the conditional mutual information between two variables using the past history of one of the variables as the conditioner:

$$\mathcal{T}_{b\to a}(\tau)=\sum p\left(\widehat{a}(t+\tau),\widehat{a}^{(k)}(t),\widehat{b}(t)\right)\log\frac{p\left(\widehat{a}(t+\tau)\,|\,\widehat{a}^{(k)}(t),\widehat{b}(t)\right)}{p\left(\widehat{a}(t+\tau)\,|\,\widehat{a}^{(k)}(t)\right)},$$

where $\widehat{a}^{(k)}(t)=[\widehat{a}(t),\widehat{a}(t-\Delta),\dots,\widehat{a}(t-(k-1)\Delta)]$. The standard definition of transfer entropy takes
*k*=1 (no lag), but keeping a higher embedding dimension could in principle
provide a more precise measure (for example, if *a* has periodicity, a
dimension of 2 may provide better prediction of future values of *a* from its
past time series and therefore lower the transfer entropy). Transfer entropy
as a discriminating statistic has the following advantages. First, in the
absence of information flow from *b* to *a* (i.e., when *a*(*t*+*τ*) has no
additional dependence on *b*(*t*) beyond what is known from the past history
*a*^{(k)}(*t*)), $p(\widehat{a}(t+\tau)\,|\,\widehat{a}^{(k)}(t),\widehat{b}(t))=p(\widehat{a}(t+\tau)\,|\,\widehat{a}^{(k)}(t))$ and the transfer entropy
vanishes. Second, the transfer entropy is highly directional so that
${\mathcal{T}}_{a\to b}\ne {\mathcal{T}}_{b\to a}$. This advantage can
be clearly seen for dynamical systems in which variables are forward differenced:
the transfer entropy is clearly one-sided, while mutual information and
correlation functions can even be symmetric (Schreiber, 2000). Third, this
measure accounts for static internal correlations, which can be used to
determine whether two variables are driven by a common driver or whether the
variable *b* is causally driving the variable *a*.
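The directionality of transfer entropy can be checked on a toy system with one-way coupling. The sketch below is a minimal binned estimator with embedding dimension *k* = 1, written as the conditional mutual information between the future of the target and the present of the source given the present of the target. The function names, the 8-bin quantization, and the synthetic autoregressive model are assumptions, not the implementation used for the results in this paper.

```python
import numpy as np

def _entropy(columns, bins):
    """Shannon entropy (bits) of the joint histogram of the given 1-D arrays."""
    counts, _ = np.histogramdd(np.column_stack(columns), bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def transfer_entropy(source, target, lag, bins=8):
    """Estimate T_{source -> target}(lag) in bits, for lag >= 1 and k = 1."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    future = target[lag:]                  # target(t + lag)
    past = target[: len(target) - lag]     # target(t), the conditioner
    driver = source[: len(source) - lag]   # source(t)
    return (_entropy([future, past], bins) + _entropy([driver, past], bins)
            - _entropy([future, driver, past], bins) - _entropy([past], bins))

# Synthetic example (assumed model): b drives a with a 2-sample delay, never the reverse.
rng = np.random.default_rng(2)
n = 20000
b = rng.standard_normal(n)
a = np.zeros(n)
for t in range(2, n):
    a[t] = 0.5 * a[t - 1] + b[t - 2] + 0.1 * rng.standard_normal()

te_forward = transfer_entropy(b, a, lag=2)   # b -> a
te_backward = transfer_entropy(a, b, lag=2)  # a -> b
```

In this example `te_forward` is large while `te_backward` stays near the small positive bias expected from finite sampling, even though `a` and `b` are strongly correlated at the coupling lag.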

Both mutual information and transfer entropy require binning of data. As
mentioned in Wing et al. (2016), the number of bins (*n*_{b}) needs to
be chosen properly and there are some guidelines that can be followed. In
general, we would like to maximize the amount of information. Having too few
bins would lump too many points into the same bin, leading to loss of
information. Conversely, having too many bins would leave many bins with zero or
only a few points, which would also lead to loss of information.
Sturges (1926) proposed that for a normal distribution, optimal
${n}_{\mathrm{b}}={\mathrm{log}}_{\mathrm{2}}\left(n\right)+\mathrm{1}$ and bin width
*w* = range∕*n*_{b}, where *n* is the number of points in
the dataset and range is the maximum value minus the minimum value of the points. In
practice, there is usually a range of *n*_{b} that would work.
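Sturges' rule is simple to evaluate. The sketch below assumes that the logarithm is rounded up to an integer, a convention the text does not specify; the function name is hypothetical.

```python
import math
import numpy as np

def sturges_bins(x):
    """Number of bins and bin width from Sturges' rule, n_b = log2(n) + 1."""
    x = np.asarray(x, dtype=float)
    n_bins = int(math.ceil(math.log2(len(x)))) + 1  # rounding up is an assumption
    width = (x.max() - x.min()) / n_bins
    return n_bins, width

# 1024 points spanning a range of 11 give 11 bins of unit width.
n_bins, width = sturges_bins(np.linspace(0.0, 11.0, 1024))
```

As a rough guide, about 245 000 hourly samples (an upper bound for 28 years of hourly data, ignoring gaps) would give 19 bins under this rounding convention.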

## 3 *D*_{st} analysis

*D*_{st} (disturbance storm time index) is an hourly index that gives
a measure of the strength of the symmetric ring current that, in turn,
provides a measure of the dynamics of geomagnetic storms
(Dessler and Parker, 1959). Because of its global nature, *D*_{st} is
often used as one of the several indices that represent the state of the
magnetosphere. For example, Balasis et al. (2011) used the cumulative square
amplitude of the *D*_{st} time series as a proxy for energy dissipation
rate in the magnetosphere and found that it fits a power law well with
log-periodic oscillations, which was interpreted as evidence for discrete-scale invariance in the *D*_{st} dynamics.

When plasma sheet ions are injected into the Earth's inner magnetosphere, they
drift westward around the Earth, forming the ring current. Studies have shown
that the substorm occurrence rate increases with solar wind velocity (high
speed streams) (Kissinger et al., 2011; Newell et al., 2016). An increase in
the solar wind electric field, *VB*_{z}, can increase the dawn–dusk
electric field in the magnetotail, which in turn determines the number of
plasma sheet particles that move to the inner magnetosphere
(Friedel et al., 2001). Studies have shown that the electric field,
*VB*_{s} (*V*_{sw} × southward IMF *B*_{z})
or *VB*_{z}, has a strong effect on the ring current dynamics
(Burton et al., 1975; McPherron and O'Brien, 2001; O'Brien and McPherron, 2000; Weygand and McPherron, 2006).

For the present study, we examine the relationships between solar wind
velocity (*V*_{sw}) and *VB*_{s} with *D*_{st}.
We use *D*_{st} records in the period 1974–2001 obtained from Kyoto
University World Data Center for Geomagnetism
(http://swdcwww.kugi.kyoto-u.ac.jp/index.html, last access:
18 January 2018). The corresponding solar wind
data are obtained from IMP-8, ACE, WIND, ISEE1, and ISEE3 observations. The
ACE SWEPAM and MAG data and the WIND MAG data are obtained from CDAWeb
(http://cdaweb.gsfc.nasa.gov/, last access: 18 January 2018). The WIND 3DP data are obtained from the 3DP team
directly. The ISEE1 and ISEE3 data are obtained from UCLA (these datasets are
also available at NASA NSSDC; http://nssdc.gsfc.nasa.gov/space/, last
access: 18 January 2018). The IMP-8 data come
directly from the IMP teams. The solar wind is propagated with the minimum
variance technique (Weimer et al., 2003) to GSM (*X*, *Y*, *Z*) = (17, 0,
0) *R*_{E} to produce 1 min files, from which hourly averaged solar
wind parameters are constructed.
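The final averaging step can be sketched as follows. This is a minimal illustration, assuming that the 1 min series starts on an hour boundary and that data gaps are marked as NaN; neither convention is stated in the text, and the function name is hypothetical.

```python
import numpy as np

def hourly_average(minute_values):
    """Average 1 min samples into hourly values, ignoring NaN data gaps."""
    x = np.asarray(minute_values, dtype=float)
    n_hours = len(x) // 60                       # drop an incomplete final hour
    blocks = x[: n_hours * 60].reshape(n_hours, 60)
    return np.nanmean(blocks, axis=1)

hourly = hourly_average(np.arange(120.0))  # two hours of synthetic data
```

An hour that contains only gaps yields NaN (with a NumPy warning), which lets later steps skip it explicitly.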

## 3.1 Cumulant-based analysis

Section 2.1 presents the method of cumulant-based cost. Here,
we show an application of cumulant-based cost to detect nonlinear dynamics in
*D*_{st}. We consider the forward coupling between a solar wind
variable such as *VB*_{s} and *D*_{st}, which
characterizes the ring current response to the solar wind driver. We
therefore consider the nonlinear cross-correlations of the vector

$$(z_{1},z_{2})=\left(\mathit{VB}_{\mathrm{s}}(t),\,D_{\mathrm{st}}(t+\tau)\right).$$

The generalization of cost is based on realizations of {*z*_{1}, *z*_{2}}. In
this case, each variable is Gaussianized with unit variance to eliminate
static nonlinearities (i.e., higher order self-correlations in
*VB*_{s} and *D*_{st} are eliminated so that the cost
measures only cross-dependence between *VB*_{s} and
*D*_{st}). This procedure is explained in the next paragraph.

The distributions of *D*_{st} and *VB*_{s} are
generally non-Gaussian. As such, the raw distributions (e.g., distribution of
values of *D*_{st}) may have nonzero higher order cumulants (e.g.,
they can have a skew and kurtosis). This property makes it more difficult to
interpret whether the higher order cumulants in the time evolution arise from
the overall shape of the distribution of data points or from the
time-ordering of the data. To eliminate the inherent nonzero cumulants in the
overall distribution of data, we construct a rank-ordered map from the
original dataset to a proxy dataset of the same length drawn from a Gaussian
distribution (Deco and Schürmann, 2000; Kennel and Isabelle, 1992; Schreiber and Schmitz, 1996). The
distribution of the proxy dataset ensures that all cumulants of the
distribution beyond second order should in principle vanish. However, the
time-ordering of the data can still lead to nonzero cumulants because the
joint probability distribution of *D*_{st}(*t*+*τ*) and
*D*_{st}(*t*) may be non-Gaussian even if the distribution of
*D*_{st} is Gaussian. Moreover, it is simple to construct surrogate
data from the Gaussianized data that share the same autocorrelation by using
the same power spectrum but randomly shifting the phases of the Fourier
coefficients. The surrogate data therefore have the same autocorrelation as
the original data. Any deviation from the linear statistic is apparent from
comparison with the surrogate data, and we interpret these deviations as
evidence of nonlinear dependence because we have falsified the hypothesis
that the data can be adequately described by linear statistics. This method
has been successfully employed in Johnson and Wing (2005), in which the *K*_{p} record was
analyzed with mutual information and cumulants.
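The two steps described above, rank-ordered Gaussianization and phase-randomized surrogates, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the test series is synthetic, and the details (for example, how the Gaussian proxy is drawn) may differ from the implementation used for the results in this paper.

```python
import numpy as np

def gaussianize(x, rng):
    """Rank-order map of x onto a sorted Gaussian sample of the same length."""
    x = np.asarray(x, dtype=float)
    proxy = np.sort(rng.standard_normal(len(x)))
    ranks = np.argsort(np.argsort(x))  # rank of each sample within x
    return proxy[ranks]

def phase_randomized_surrogate(x, rng):
    """Surrogate with the same power spectrum as x but randomly shifted phases."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spectrum))
    phases[0] = 0.0        # leave the mean untouched
    if n % 2 == 0:
        phases[-1] = 0.0   # the Nyquist coefficient must stay real
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)

# Synthetic, strongly skewed series (assumed data).
rng = np.random.default_rng(4)
x = rng.exponential(size=5000)
g = gaussianize(x, rng)
s = phase_randomized_surrogate(g, rng)
```

The Gaussianized series `g` keeps the time ordering of `x` but has a Gaussian amplitude distribution, and the surrogate `s` shares the power spectrum of `g` (and hence its autocorrelation) while any phase information that encodes nonlinear structure is destroyed.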

In Fig. 1 we plot the significance obtained from the year 1999
as a function of time delay, *τ*. Significance extracted from
$\{\mathit{VB}_{\mathrm{s}}(t),D_{\mathrm{st}}(t+\tau)\}$ and
$\{\mathit{VB}_{\mathrm{s}}(t),\mathit{VB}_{\mathrm{s}}(t+\tau)\}$ is
plotted in panels (a) and (b), respectively. It should be noted that there is
a strong linear response at around 3 h time delay. As shown in
Fig. 1a, there is a clear nonlinear response peaking
around 3–10, 25, 50, and 90 h and lasting for approximately 1 week. In contrast,
in Fig. 1b, the nonlinearity only has one broad peak around
3–12 h in the self-significance for *VB*_{s}, suggesting
that the nonlinear and linear peaks at *τ*=3–12 h in
Fig. 1a may be associated with
*VB*_{s}. We will revisit the solar wind causal relationship
with *D*_{st} using transfer entropy in
Sect. 3.2.

The absence of the nonlinear peaks at *τ* = 25, 50, and 90 h in the
self-significance for *VB*_{s} (Fig. 1b)
suggests that these nonlinearities in
$\{\mathit{VB}_{\mathrm{s}}(t),D_{\mathrm{st}}(t+\tau)\}$ are related to internal
magnetospheric dynamics. As the *D*_{st} index is thought to reflect
storm activity, it is reasonable that nonlinear significance would decay on
the order of 1 week as storms commonly last around that time. The strong
nonlinear responses at *τ* = 25, 50, and 90 h are likely related to
multiple modes of relaxation of the ring current following the commencement
of storms. It should also be noted that other nonlinearities detected by even
higher order cumulants may also be present; however, the calculation
demonstrates the nonlinear nature of the underlying dynamics.

A common scenario for storm–ring current interaction is the following. A
storm compresses the magnetosphere, intensifies the magnetic field in the
magnetosphere, and injects energetic particles into the ring current region.
The ring current intensifies during the main phase of the storm, which can
last ∼ 6 h (Weygand and McPherron, 2006). Once the injection stops, the ring
current begins to decay and the storm enters the recovery phase. Conservation
of the magnetic moment implies that anisotropies develop in the ring current and
plasma sheet. Anisotropy drives the ring current plasma unstable to ion
cyclotron waves. The ion cyclotron waves scatter energetic ions into the loss
cone so that they are lost from the ring current. Nonlinear interaction
between waves and particles keeps the plasma near marginal stability with a
steady loss of energetic particles due to wave–particle scattering. Other
loss mechanisms include charge exchange, Coulomb scattering, and convection
of ions to the front of the magnetopause. The ring current decay can have two
stages (Kozyra et al., 2002). In the first stage, the ring current decays
rapidly and the loss mechanisms can be attributed to convective outflow,
pitch-angle scattering in the ring current, and O^{+} charge exchange
(Hamilton et al., 1988; Weygand and McPherron, 2006). The second stage may typically
begin about 1 day from the commencement of the storm (see, for example,
Fig. 7 of Kozyra et al., 2002). In the second stage, the decay rate is
slower and is attributed mainly to H^{+} charge exchange
(Hamilton et al., 1988) and can take several days to deplete the ring current
to the baseline level (Smith et al., 1976). We can speculate that the
multiple nonlinear response lag times that are detected with the
cumulant-based approach are likely the relaxation of the ring current due to
the complex interplay of multiple loss processes.

## 3.2 Transfer entropy

As mentioned in Sect. 2.2, transfer entropy gives a measure of
how much information is transferred from one variable to another. We have
applied transfer entropy and mutual information to the relationship between
the *V*_{sw} and *D*_{st} for the period 1974–2001. The result
is shown in Fig. 2. Note that the mutual information
measure suggests strong correlations between prior values of *D*_{st}
and *V*_{sw}. This finding suggests that *D*_{st} could be a
driver of *V*_{sw}, which is counterintuitive. On the other hand, the
transfer entropy clearly shows that this information transfer in the backward
direction (*D*_{st}→*V*_{sw}) does not rise above the
noise level (the horizontal blue lines indicate the mean and standard deviation
of 100 surrogate datasets for which the data were randomly reordered). This
result is expected because it is the solar wind that drives the
magnetosphere, not the other way around. The transfer of information from
*V*_{sw} to *D*_{st} peaks at *τ*=8–11 h. The cumulant-based analysis in Sect. 3.1 shows that the
response of *D*_{st} to *VB*_{s} has a similar timescale. This timescale
is consistent with the 4 to 15 h transport time for the solar wind to reach
the midnight and noon regions of the geosynchronous orbit, respectively, from
the dayside magnetopause (Borovsky et al., 1998). The analysis presented here
illustrates the power of the transfer entropy for assessing causality.
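A noise level of the kind described above can be estimated by evaluating a statistic on randomly reordered data. The sketch below is a generic illustration: it shuffles the source series (which series was reordered is an assumption here), accepts any statistic as a function, and uses the absolute correlation as a stand-in statistic to keep the example short. All names are hypothetical.

```python
import numpy as np

def shuffle_noise_level(statistic, source, target, n_surrogates=100, seed=0):
    """Mean and standard deviation of a statistic over shuffled-source surrogates."""
    rng = np.random.default_rng(seed)
    values = np.array([statistic(rng.permutation(source), target)
                       for _ in range(n_surrogates)])
    return values.mean(), values.std(ddof=1)

def abs_correlation(s, t):
    """Stand-in statistic for the demonstration: absolute Pearson correlation."""
    return abs(np.corrcoef(s, t)[0, 1])

# Synthetic example (assumed data): the target follows the source plus noise.
rng = np.random.default_rng(5)
src = rng.standard_normal(2000)
tgt = src + 0.5 * rng.standard_normal(2000)

noise_mean, noise_std = shuffle_noise_level(abs_correlation, src, tgt)
observed = abs_correlation(src, tgt)
```

Random reordering destroys any temporal relationship between the two series, so a value of the statistic that does not rise above `noise_mean` by several `noise_std` cannot be distinguished from chance.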

We recently used mutual information, transfer entropy, and conditional mutual
information to discover the solar wind drivers of the outer radiation belt
electrons (Wing et al., 2016). Because *V*_{sw} anticorrelates with
solar wind density (*n*_{sw}), it is hard to isolate the effects of
*V*_{sw} on radiation belt electrons, given *n*_{sw} and vice
versa. However, using conditional mutual information, we were able to
determine the information transfer from *n*_{sw} or any other solar
wind parameters to radiation belt electrons, given *V*_{sw} (or any
other solar wind parameters). We also showed that the triangle distribution
in the radiation belt electron vs. solar wind velocity plot
(Reeves et al., 2011) can be understood better when we consider that
*V*_{sw} and *n*_{sw} transfer information to radiation belt
electrons with lags of 2 and 0 days (< 24 h), respectively. Also recently,
we used transfer entropy to better understand the causal parameters in the
solar cycle dynamo and their response lag times (Wing et al., 2018).

As a follow-up to Wing et al. (2016, 2018), the present study
demonstrates further how information theoretical tools can be useful for
space physics and space weather studies. Cumulant-based analysis can be used
to distinguish internal vs. external driving of the system. Both mutual
information and transfer entropy give a measure of shared information between
two variables (or vectors). However, unlike mutual information, transfer
entropy is highly directional. To illustrate, we apply mutual information,
transfer entropy, and cumulant-based analysis to investigate the dynamics of
the *D*_{st} index.

Our analysis with mutual information and transfer entropy indicates that
there are strong linear and nonlinear correlations and transfer of
information, respectively, in the forward direction between *V*_{sw}
and *D*_{st} (*V*_{sw} → *D*_{st}). However,
mutual information indicates that there is also a strong correlation in the
backward direction (*D*_{st} → *V*_{sw}), which is
puzzling and counterintuitive. In contrast, the transfer entropy indicates
that there is no information transfer in the backward direction
(*D*_{st}→*V*_{sw}), as expected because it is the
solar wind that drives the magnetosphere, not the other way around. The
transfer of information from *V*_{sw} to *D*_{st} peaks at
*τ*=8–11 h.

Using the cumulant-based significance, we have established that the
underlying dynamics of *D*_{st} is in general nonlinear, exhibiting a
quasiperiodicity which is detectable only if nonlinear correlations are taken
into account. The strong nonlinear responses of *D*_{st} to
*VB*_{s} at *τ*=25, 50, and 90 h are likely related to
multiple modes of relaxation of the ring current from multiple loss
mechanisms following the commencement of storms. It is, of course, possible
that these nonlinearities are caused by solar wind drivers other than
*VB*_{s}. However, the timing of these nonlinearities would
put them well in the recovery phase of a storm, and previous studies suggested
that the ring current decays in the recovery phase are strongly influenced by
*VB*_{s} (Burton et al., 1975; McPherron and O'Brien, 2001; O'Brien and McPherron, 2000).
The nonlinearities at *τ*=3–12 h are not caused by internal dynamics
but rather by the solar wind driver, which is similar to the timescale for
the solar wind transport time from the dayside magnetopause to the inner
magnetosphere. This timescale is consistent with the timescale for the
information transfer from the solar wind to *D*_{st} obtained from
transfer entropy analysis.

Although linear models are useful, our results indicate that these models have to be used with caution because the solar wind–magnetosphere system is inherently nonlinear. Hence, nonlinearities generally need to be taken into account in order to describe the system accurately. Local linear models (which include slow evolution of parameters) may be able to handle some nonlinearities, but it is expected that these local linear models would have difficulties if the dynamics suddenly and rapidly change.

All the derived data products in this paper are available upon request by email (simon.wing@jhuapl.edu).

The authors declare that they have no conflict of interest.

Simon Wing acknowledges support from JHU/APL Janney Fellowship, NSF grant
AGS-1058456, and NASA grants (NNX13AE12G, NNX15AJ01G, NNX16AR10G, and
NNX16AQ87G). Jay R. Johnson acknowledges support from NASA grants
(NNH11AR07I, NNX14AM27G, NNH14AY20I, NNX16AC39G), NSF grants (ATM0902730,
AGS-1203299, AGS-1405225), and DOE contract DE-AC02-09CH11466.
Enrico Camporeale is partially funded by the NWO Vidi grant no. 639.072.716.
We thank James M. Weygand for the solar wind data processing. The raw solar
wind data from ACE, Wind, ISEE1, and ISEE3 were obtained from NASA CDAW and
NSSDC.

The topical editor, Georgios
Balasis, thanks one anonymous referee for help in evaluating this paper.

Baker, D. N., Zwickl, R. D., Bame, S. J., Hones, E. W., Tsurutani, B. T., Smith, E. J., and Akasofu, S.-I.: An ISEE 3 high time resolution study of interplanetary parameter correlations with magnetospheric activity, J. Geophys. Res., 88, 6230, https://doi.org/10.1029/ja088ia08p06230, 1983. a

Balasis, G., Papadimitriou, C., Daglis, I. A., Anastasiadis, A., Athanasopoulou, L., and Eftaxias, K.: Signatures of discrete scale invariance in Dst time series, Geophys. Res. Lett., 38, L13103, https://doi.org/10.1029/2011GL048019, 2011. a

Balikhin, M. A., Boynton, R. J., Walker, S. N., Borovsky, J. E., Billings, S. A., and Wei, H. L.: Using the NARMAX approach to model the evolution of energetic electrons fluxes at geostationary orbit, Geophys. Res. Lett., 38, L18105, https://doi.org/10.1029/2011GL048980, 2011. a

Bargatze, L. F., Baker, D. N., Hones, E. W., and McPherron, R. L.: Magnetospheric impulse response for many levels of geomagnetic activity, J. Geophys. Res., 90, 6387–6394, 1985. a

Borovsky, J. E., Thomsen, M. F., and Elphic, R. C.: The driving of the plasma sheet by the solar wind, J. Geophys. Res., 103, 17617–17640, https://doi.org/10.1029/97JA02986, 1998. a, b

Burton, R. K., McPherron, R. L., and Russell, C. T.: An Emperical Relationship Between Interplanetary Conditions and Dst, J. Geophys. Res., 80, 4204–4214, 1975. a, b

Clauer, C. R., McPherron, R. L., Searls, C., and Kivelson, M. G.: Solar wind control of auroral zone geomagnetic activity, Geophys. Res. Lett., 8, 915–918, https://doi.org/10.1029/gl008i008p00915, 1981. a

Crooker, N. U. and Gringauz, K. I.: On the low correlation between long-term averages of solar wind speed and geomagnetic activity after 1976, J. Geophys. Res., 98, 59–62, https://doi.org/10.1029/92ja01978, 1993. a

Deco, G. and Schürmann, B.: Information Dynamics, Springer-Verlag, New York, 2000. a, b

De Michelis, P., Consolini, G., Materassi, M., and Tozzi, R.: An information theory approach to the storm-substorm relationship, J. Geophys. Res.-Space, 116, A08225, https://doi.org/10.1029/2011JA016535, 2011. a

Dessler, A. J. and Parker, E. N.: Hydromagnetic theory of geomagnetic storms, J. Geophys. Res., 64, 2239–2252, https://doi.org/10.1029/JZ064i012p02239, 1959. a

Friedel, R. H. W., Korth, H., Henderson, M. G., Thomsen, M. F., and Scudder, J. D.: Plasma sheet access to the inner magnetosphere, J. Geophys. Res.-Space, 106, 5845–5858, https://doi.org/10.1029/2000ja003011, 2001. a

Gershenfeld, N.: The Nature of Mathematical Modeling, Cambridge University Press, Cambridge, 1998. a

Hamilton, D., Gloeckler, G., Ipavich, F., Stüdemann, W., Wilken, B., and Kremser, G.: Ring current development during the great geomagnetic storm of February 1986, J. Geophys. Res.-Space, 93, 14343–14355, 1988. a, b

Johnson, J. R. and Wing, S.: A solar cycle dependence of nonlinearity in magnetospheric activity, J. Geophys. Res., 110, A04211, https://doi.org/10.1029/2004ja010638, 2005. a, b, c, d, e

Johnson, J. R. and Wing, S.: External versus internal triggering of substorms: An information-theoretical approach, Geophys. Res. Lett., 41, 5748–5754, https://doi.org/10.1002/2014gl060928, 2014. a

Johnson, J. R. and Wing, S.: The dependence of the strength and thickness of field-aligned currents on solar wind and ionospheric parameters, J. Geophys. Res.-Space, 120, 3987–4008, https://doi.org/10.1002/2014ja020312, 2015. a

Kennel, M. B. and Isabelle, S.: Method to Distinguish Possible Chaos from Colored Noise and to Determine Embedding Parameters, Phys. Rev. A, 46, 3111–3118, 1992. a

Kissinger, J., McPherron, R. L., Hsu, T.-S., and Angelopoulos, V.: Steady magnetospheric convection and stream interfaces: Relationship over a solar cycle, J. Geophys. Res.-Space, 116, A00I19, https://doi.org/10.1029/2010ja015763, 2011. a

Klimas, A. J., Vassiliadis, D., and Baker, D. N.: Dst index prediction using data-derived analogues of the magnetospheric dynamics, J. Geophys. Res., 103, 20435–20448, 1998. a

Kozyra, J., Liemohn, M., Clauer, C., Ridley, A., Thomsen, M., Borovsky, J., Roeder, J., Jordanova, V., and Gonzalez, W.: Multistep Dst development and ring current composition changes during the 4–6 June 1991 magnetic storm, J. Geophys. Res.-Space, 107, SMP 33-1-SMP 33-22, https://doi.org/10.1029/2001JA000023, 2002. a, b

Li, W.: Mutual information functions versus correlation functions, J. Stat. Phys., 60, 823, https://doi.org/10.1007/BF01025996, 1990. a

Materassi, M., Ciraolo, L., Consolini, G., and Smith, N.: Predictive Space Weather: An information theory approach, Adv. Space Res., 47, 877–885, https://doi.org/10.1016/j.asr.2010.10.026, 2011. a

Materassi, M., Consolini, G., Smith, N., and De Marco, R.: Information theory analysis of cascading process in a synthetic model of fluid turbulence, Entropy, 16, 1272–1286, 2014. a

McPherron, R. L. and O'Brien, P.: Predicting Geomagnetic Activity: The Dst Index, in: Space Weather, edited by: Song, P., Singer, H. J., and Siscoe, G. L., https://doi.org/10.1029/GM125p0339, 2001. a, b

Newell, P., Liou, K., Gjerloev, J., Sotirelis, T., Wing, S., and Mitchell, E.: Substorm probabilities are best predicted from solar wind speed, J. Atmos. Sol.-Terr. Phy., 146, 28–37, https://doi.org/10.1016/j.jastp.2016.04.019, 2016. a

O'Brien, T. P. and McPherron, R. L.: An empirical phase space analysis of ring current dynamics: Solar wind control of injection and decay, J. Geophys. Res., 105, 7707–7720, 2000. a, b

Papitashvili, V. O., Papitashvili, N. E., and King, J. H.: Solar cycle effects in planetary geomagnetic activity: Analysis of 36-year long OMNI dataset, Geophys. Res. Lett., 27, 2797–2800, https://doi.org/10.1029/2000gl000064, 2000. a

Prichard, D. and Theiler, J.: Generalized redundancies for time series analysis, Phys. D, 84, 476–493, https://doi.org/10.1016/0167-2789(95)00041-2, 1995. a

Reeves, G. D., Morley, S. K., Friedel, R. H. W., Henderson, M. G., Cayton, T. E., Cunningham, G., Blake, J. B., Christensen, R. A., and Thomsen, D.: On the relationship between relativistic electron flux and solar wind velocity: Paulikas and Blake revisited, J. Geophys. Res.-Space, 116, A02213, https://doi.org/10.1029/2010ja015735, 2011. a

Schreiber, T.: Measuring Information Transfer, Phys. Rev. Lett., 85, 461–464, https://doi.org/10.1103/PhysRevLett.85.461, 2000. a, b

Schreiber, T. and Schmitz, A.: Improved Surrogate Data for Nonlinearity Tests, Phys. Rev. Lett., 77, 635–639, 1996. a

Smith, P. H., Hoffman, R. A., and Fritz, T. A.: Ring current proton decay by charge exchange, J. Geophys. Res., 81, 2701–2708, https://doi.org/10.1029/JA081i016p02701, 1976. a

Strangeway, R. J., Ergun, R. E., Su, Y.-J., Carlson, C. W., and Elphic, R. C.: Factors controlling ionospheric outflows as observed at intermediate altitudes, J. Geophys. Res., 110, A03221, https://doi.org/10.1029/2004ja010829, 2005. a

Sturges, H. A.: The choice of class interval, J. Am. Stat. Assoc., 21, 65–66, https://doi.org/10.1080/01621459.1926.10502161, 1926. a

Tsurutani, B. T., Sugiura, M., Iyemori, T., Goldstein, B. E., Gonzalez, W. D., Akasofu, S. I., and Smith, E. J.: The nonlinear response of AE to the IMF Bs driver: A spectral break at 5 hours, Geophys. Res. Lett., 17, 279–282, 1990. a

Valdivia, J. A., Rogan, J., Muñoz, V., Toledo, B. A., and Stepanova, M.: The magnetosphere as a complex system, Adv. Space Res., 51, 1934–1941, https://doi.org/10.1016/j.asr.2012.04.004, 2013. a

Vassiliadis, D. V., Sharma, A. S., Eastman, T. E., and Papadopoulos, K.: Low-dimensional chaos in magnetospheric activity from AE time series, Geophys. Res. Lett., 17, 1841–1844, 1990. a

Weimer, D. R., Ober, D. M., Maynard, N. C., Collier, M. R., McComas, D. J., Ness, N. F., Smith, C. W., and Watermann, J.: Predicting interplanetary magnetic field (IMF) propagation delay times using the minimum variance technique, J. Geophys. Res., 108, 1026, https://doi.org/10.1029/2002ja009405, 2003. a

Weygand, J. M. and McPherron, R. L.: Dependence of ring current asymmetry on storm phase, J. Geophys. Res.-Space, 111, A11221, https://doi.org/10.1029/2006JA011808, 2006. a, b, c

Wing, S. and Johnson, J. R.: Theory and observations of upward field-aligned currents at the magnetopause boundary layer, Geophys. Res. Lett., 42, 9149–9155, https://doi.org/10.1002/2015gl065464, 2015. a

Wing, S., Johnson, J. R., Jen, J., Meng, C.-I., Sibeck, D. G., Bechtold, K., Freeman, J., Costello, K., Balikhin, M., and Takahashi, K.: Kp forecast models, J. Geophys. Res., 110, A04203, https://doi.org/10.1029/2004ja010500, 2005. a

Wing, S., Johnson, J. R., Camporeale, E., and Reeves, G. D.: Information theoretical approach to discovering solar wind drivers of the outer radiation belt, J. Geophys. Res.-Space, 121, 9378–9399, https://doi.org/10.1002/2016ja022711, 2016. a, b, c, d, e, f

Wing, S., Johnson, J. R., and Vourlidas, A.: Information Theoretic Approach to Discovering Causalities in the Solar Cycle, Astrophys. J., 854, 2, https://doi.org/10.3847/1538-4357/aaa8e7, 2018. a, b, c
