1. Introduction
In the current era of big data, there is an exponential increase in observed data from complex signals, and efficient tools to manage this modern deluge of data are intensively looked for in the time series analysis community. Particularly, the brain is probably the most paradigmatic example of a complex system with an enormously large number of neurons (
) and interconnections (
) between them [
1]. Emergent collective states are achieved through the synchronous neuronal activity. The popularization of different brain recording techniques, such as electroencephalography, magnetoencephalography, and functional magnetic resonance imaging, has enabled researchers to have free access to a practically unlimited amount of experimental data sets [
2]. It is also true that our understanding of the functioning of the brain is quite far from being complete. Actually, despite the relevant progress achieved in recent decades, many aspects of the research in the field are still in its infancy, and it is widely recognized that to obtain a better handle on the brain dynamics is one of the most pressing, urgent and challenging issues of our time [
3].
Taking into account that a more thorough comprehension of simple brain conditions is vital to better understand more complex ones, any step forward on this matter is highly appreciated. Indeed, it has previously been suggested that the brain at rest can be interpreted as a fundamental default mode and that the brain activity during a specific task may be described in terms of generic properties of the background brain activity [
4]. Moreover, it has been demonstrated that the activity of the brain at rest is affected by neurological and psychiatric disorders like autism, epilepsy, Parkinson’s disease, schizophrenia, and obsessive-compulsive disorder [
5,
6,
7,
8]. In the same line, resting-state EEG measures have also been shown to be useful for the detection of cognitive decline in mild cognitive impairment and Alzheimer’s disease [
9] and for decoding and predicting cognitive performance [
10]. The influence of the eye’s resting condition (closed/open) on the early detection of Parkinson’s disease has been carefully tested in order to determine the most suitable biomarkers for real-time applications [
11]. Likewise, better classification rates are reached in the eyes open condition in a resting state EEG for the diagnosis of Alzheimer’s disease when using permutation entropy as a neuromarker [
12]. From a more practical perspective, a better knowledge of the differences between simple resting conditions, such as keeping the eyes closed (EC) and the eyes open (EO), could also be potentially useful for brain–computer interfaces. In particular, this may be helpful in enhancing switching mechanisms for assistive technologies (environmental control systems) designed to assist people with severe disabilities in daily living tasks [
13].
Even when the attenuation of the alpha waves over the human posterior cortex in the EO condition is a well-known recognized fact for almost a century [
14], different approaches have more recently been proposed to obtain an improved discrimination and classification of these baseline brain states [
2,
7,
15,
16,
17,
18,
19]. It has been found, for example, that different entropy measures (permutation entropy, approximate entropy, multiscale entropy, spatial permutation entropy) achieve higher values in the EO condition compared with the EC one [
2,
16,
18,
20,
21]. This can be explained by the increasing disorder and desynchronization of brain activity due to visual information receiving and processing [
16]. For a similar reason, there is a differential modulation of the time-reversal symmetry by these two experimental conditions, and the time irreversibility also increases in the EO resting state with respect to the EC resting one [
7].
It is also worth highlighting here the well-recognized potentiality of the ordinal pattern-based methodologies for characterizing EEG data. Without being exhaustive, the first application for detecting the onset of epileptic seizures in intracranial EEG data [
22] and the foundational contributions by Li et al. [
23,
24] and Keller et al. [
25,
26] in this regard deserve to be especially mentioned. The reason for its outstanding performance likely lies in the robustness of ordinal patterns to the data artefacts and low-frequency perturbations that usually contaminate EEG records [
27].
In this work, a recently introduced ordinal metric tool, the permutation Jensen–Shannon distance [
28], is used to try to unravel the intricacies and interplay of linear and nonlinear temporal correlations that are inextricably present in any kind of brain activity. The results obtained prove that the implementation of the aforementioned ordinal measure on the original resting EEG records and different surrogate realizations generated from them provide relevant information about the temporal scales and spatial localizations where these linear and nonlinear components play a significant and non-negligible role. Consequently, and contrasting with previous studies on the same EEG data set, an enhanced characterization and discrimination of these simple baseline brain states is achieved.
The rest of the work is structured as follows. In
Section 2, the main tool implemented along this work, namely the permutation Jensen–Shannon distance, is briefly introduced. After that, a numerical analysis is included in
Section 3 to illustrate how linear and nonlinear components are unveiled through the proposed analysis. Then, in
Section 4, the distinction between EC and EO resting brain states is carefully analyzed. Finally, the main findings obtained from this study and some open problems are detailed in the last
Section 5.
2. Permutation Jensen–Shannon Distance
The permutation Jensen–Shannon distance (PJSD) is an ordinal metric introduced to quantify the degree of discernability between two arbitrary time series [
28]. It has been shown that, by appropriately choosing the time series taken as a reference, a wide range of complex phenomena can be handled in a robust way. Thanks to this versatility, the PJSD represents a useful addition to the complex signals analysis methods. Moreover, this concept has very recently been generalized to evaluate the heterogeneity of multiple time series from an ordinal approach [
29].
The PJSD is basically based on two constitutive ingredients: the Jensen–Shannon divergence [
30] and the ordinal symbolization proposed by Bandt and Pompe (BP) [
31]. The Jensen–Shannon divergence is a dissimilarity measure between two arbitrary probability distributions
and
given by
where
S is the Shannon entropy function, i.e.,
, and as usual, the convention
is assumed in accordance with its mathematical limit. It is a quantity bounded between 0 and
. The minimum value is reached when the two distributions under comparison are identical while the maximum value is obtained whenever their supports are disjoints (that is,
for
). The Jensen–Shannon divergence stands out for satisfying several desirable mathematical properties: non-negativity (
), identity of indiscernibles (
if and only if
), and symmetry (
) [
30]. In addition, it is also well defined even when
vanishes without vanishing
or vice versa. More importantly, the square root of the Jensen–Shannon divergence,
, is a metric since it also satisfies the triangle inequality:
with
R a third arbitrary probability distribution [
32]. Actually, more recently, it has been shown that
for
is also a metric [
33]. Extensions of this divergence to weigh differently the compared distributions and/or to quantify the overall difference among any finite number of probability distributions are also available. Interpretations and statistical properties of the Jensen–Shannon divergence have been exhaustively analyzed by Grosse et al. [
34].
The estimation of the Jensen–Shannon distance,
, between two time series requires first to know the corresponding probability distributions
P and
Q associated with the two time series under analysis. It is, therefore, necessary to map continuous time series into discrete sequences. For such a purpose, the BP coarse-grained representation [
31], introduced more than 20 years ago, is implemented. This symbolization methodology has been successfully applied in heterogeneous fields. The main reasons behind this success are its wide applicability and the fact that the BP discretization method is naturally able to capture the presence of underlying temporal correlations in the dynamics of the process that generates the time series. Interested readers are referred to the reviews [
35,
36,
37,
38,
39,
40] for further details. The BP recipe for symbolizing a time series can be briefly summarized as follows. Given a real-valued time series
, vectors of equally spaced
D values of the form
with
are mapped to one of the
possible ordinal permutations of the same size that describe the order relation between these elements. For example, the vector
is mapped to the permutation pattern
, replacing each element in the original vector by its respective ranking in the subset. Assigning a symbol
to each ordinal pattern, the original time series is mapped to the coarse-graining sequence
, with
the set of permutations of length
D. Just for illustrative purpose,
. There are therefore two parameters which need to be fixed before the method is implemented: the number of elements in the permutation patterns
D (called order or embedding dimension,
with
) and the time separation between the elements in the subsequence
(called lag or embedding delay,
). Consecutive data are considered if
, while
spaced data samples are analyzed if
. An associated ordinal probability distribution,
can then be straightforwardly computed with
the probability of each ordinal pattern estimated by its relative frequency of occurrence in the symbolized sequence. Regarding the selection of the involved parameters, a rule of thumb requires the condition
, with
L the number of data in the original time series, for a robust estimation of
. It is also clear that larger values of
D offer an improved characterization of the underlying temporal structures. On the other hand, a value of the lag
is often used in discrete systems and also when the chosen sampling frequency is the optimal one to characterize the underlying dynamics of continuous systems [
38]. However, this arbitrary choice can lead to erroneous conclusions especially for systems with scale-dependent dynamics [
41]. A multiscale analysis, by analyzing how descriptors of the ordinal distribution change with
, gives a more complete picture in these instances [
42]. Finally, a word of caution when the number of equalities in the time series is non-negligible. Since the BP recipe is based on sorted data, it is needed to clarify how they will be handled. This is particularly relevant when dealing with low-resolution time series. Regarding this issue, it has been shown that the estimation of the ordinal probability distribution could be significantly biased in such cases, and that the addition of a small random perturbation mitigates this unwanted effect [
43].
Different statistics can be computed from the resulting permutation probability distribution given in Equation (
3). The permutation entropy (PE) [
31], which is just defined as the Shannon entropy of this ordinal distribution,
is undoubtedly the most representative and widely used descriptor. It quantifies the variety of permutation patterns in the ordinal sequence obtained from a time series. The maximum value,
, is obtained for a totally random stochastic process (white noise), while the minimum value,
, is reached for a completely regular (monotonically increasing or decreasing) time series. The normalized permutation entropy
, such that
, is often implemented in practical analysis.
The PJSD is defined as the Jensen–Shannon distance between the ordinal pattern distributions associated with two arbitrary time series,
. More precisely, normalized PJSD values with respect to its upper bound, i.e.,
, are estimated. It can be used to compare the ordinal mapping between different time series. Numerical and real data applications included in Ref. [
28] confirm its utility for characterizing, classifying and discriminating different dynamics from the analysis of time series generated by them. Moreover, its robustness to noise and invariance under scaling of the data make this ordinal distance especially appropriate for the analysis of experimental signals. When calculating the PJSD, the order
D chosen to implement the BP symbolization recipe should be the same for both time series in order to have the same number of possible permutation patterns in the ordinal probability distributions to be compared. However, different lags (
and
) can be potentially selected. This opens up the possibility to contrast the ordinal mappings of only one or two time series at two different temporal scales.
3. Numerical Analysis
Trying to illustrate how the PJSD can be implemented to unveil linear and nonlinear underlying dynamics from a time series, a numerically controlled analysis on a toy model is developed below. More precisely, realizations of the logistic map in the fully chaotic regime, a paradigmatic example of a nonlinear complex system, are mixed with simulations from a linear first-order autoregressive AR(1) process according to the following model:
with
,
(
are pseudorandom values drawn from the standard normal distribution), and
m is the fraction of nonlinear dynamics included in the mixed time series (
). When
and
, the pure AR(1) and logistic models are recovered, respectively, while as
m goes from 0 to 1, the mixed time series include less linear stochastic and more nonlinear deterministic structures. The parameter
of the AR(1) model should satisfy the condition
to have a stationary process, and larger correlations and anti-correlations are realized as this parameter moves away from zero in the positive and negative direction, respectively. The parameter
is fixed to be equal to 0.9 in the present analysis. A similar, but not exactly the same, mixed model has previously been proposed to evaluate the performance of network tools in statistical tests to detect weak nonlinearities [
44].
Next, surrogate data sets are generated to extract information about the linear and/or nonlinear nature of the original time series. Particularly, shuffled and Fourier transform based surrogates are considered in the performed analysis. On the one hand, in the first generating algorithm, also known as scrambled or random permutation, the surrogate time series is conducted by shuffling the time order of the original time series and, consequently, any underlying temporal structure (linear and/or nonlinear) is destroyed. These randomized resampled sequences are fully random data that have exactly the same amplitude distribution as the original time series, and the associated null hypothesis is that the data under analysis are just uncorrelated noise with an arbitrary amplitude distribution. On the other hand, with the Fourier transform (FT)-based algorithm, a data sequence with the same autocorrelation function as the original time series is created by randomizing the phases of the discrete Fourier transform of the original time series and then computing the inverse transform [
45]. In this way, the linear behavior contained in the original data are preserved, while the nonlinear temporal structure is fully removed. The null hypothesis is hence that the data that come from a linear Gaussian process. Some generalizations of the FT surrogate generation method, such as the amplitude adjusted Fourier transform (AAFT) and iterative amplitude adjusted Fourier transform (IAAFT), seek to reproduce both the autocorrelation function and the amplitude distribution of the original data [
46]. Interested readers are referred to Ref. [
47] for a thorough review on surrogate data testing.
By estimating the PJSD between the primary time series and its shuffled surrogate counterpart, the influence that any kind of temporal correlation (linear and nonlinear) has on the system that generates the original data can be directly measured. Similarly, the PJSD between the original time series and its FT surrogate quantifies the relevance of the role played by nonlinearity on the underlying system dynamics. Finally, doing the same, i.e., estimating the ordinal distance, but between the two surrogate realizations (shuffled and FT), allows us to quantify the impact that linearity, solely, has on the dynamics. Thus, these three surrogate-based ordinal distances offer a practical and easy way to identify and quantify the linear and/or nonlinear nature of a system from the analysis of a time series generated by it.
Figure 1 shows the results obtained when applying the previously described analysis to the simple mixed model defined by Equation (
5). Mean and standard deviation (as an error bar) of the PJSD estimations with
and
from an ensemble of 100 independent realizations with length
data for the mixed model with
are depicted as a function of the fraction
m. The choice of the lag parameter is plenty justified by the discrete nature of the toy numerical model. It is important to note that in the simulation process, both involved signals in the mixed model, i.e.,
and
, have been standardized (mean equal to 0 and standard deviation equal to 1) previously to be added, in order to make the comparison between both dynamics more equitable. The estimated ordinal distances behave consistently when they are plotted as a function of the fraction of nonlinearity
m (
m goes from 0 to 1 with step 0.05). That is, the PJSD between the original time series and its shuffled surrogate,
(blue curve), that quantifies both, linear and nonlinear, structures, decreases with
m up to reach a minimum value around
m∼0.25–0.3. At this particular fraction value, it is possible to conjecture that there is a maximum interplay between the linear and nonlinear natures occurring in the mixed model. After it, the PJSD increases monotonically due to the growing impact of the deterministic nonlinear dynamics, while the other two proposed ordinal distances that characterize nonlinear and linear components, i.e.,
(red curve) and
(green curve), increases and decreases monotonically with
m, respectively, as was expected. Moreover, these two curves overlap consistently with the more general one that contemplates the effect of both components:
for
m larger than 0.35, whereas
when
m is smaller or equal than 0.2. According to these findings, the linear stochastic and nonlinear deterministic components govern the dynamics of the mixed model in the ranges
and
of
m, respectively. A transition region, where the effect of both components seems to be weakened, happens in the intermediate range. The PJSD baseline reference resulting from the analysis of a pair of shuffled surrogate realizations from each model simulation,
(black curve), has also been included to take finite-size effects into consideration. It is clearly evidenced that there exists a systematic shift or bias that depends on the order
D.
Just for the sake of completeness, and taking into account that the PE is the most representative and established descriptor within the ordinal encoding recipe,
Figure 2 shows the behavior of this entropic measure for the mixed model as a function of the parameter
m. Once again, mean and standard deviation (as error bars) for PE estimations with
and
from the same ensemble of 100 independent realizations for each value of
m are depicted. By comparing
Figure 1 and
Figure 2, it can be concluded that the behavior observed for PE is opposed to the one obtained for
, achieving a maximum where the latter reaches a minimum, and increasing and decreasing in the same range of
m in which the ordinal distance between the original time series and its shuffled surrogate decreases and increases, respectively. Essentially, PE decreases when linear and/or nonlinear temporal correlations are included in the dynamics. Consequently, PE and
offer the same qualitative information when they are used to examine the proposed mixed model. Therefore, PE quantifies the presence of any kind of temporal correlation in a global sense, and when implementing it as a statistic, it is not directly feasible to discriminate between the linear and nonlinear components occurring in the system dynamics.
Previous analysis has been repeated for other values of the correlation parameter and, also, for the two aforementioned improved FT surrogate methods (AAFT and IAAFT). Results obtained are qualitatively and quantitatively similar in the former and latter cases, respectively. It is worth noting that, when considering other values, the ordinal distances that take linear correlations into account ( and ) change accordingly. That is, they increase for low values of m as moves from 0 to 1.
4. Resting Brain States Characterization
EEG data from 109 healthy subjects in two different states: resting with their eyes closed and resting with their eyes open have been analyzed. These data are freely available from PhysioNet [
48], and they include a set of 64-channel EEGs sampled at 160 samples per second from subjects who performed a series of different motor/imagery tasks [
49], including the two baseline runs of resting state conditions, both recorded during a period of one minute. The EEGs were recorded from 64 electrodes according to the international 10-10 system. Please see
https://physionet.org/content/eegmmidb/1.0.0/ (accessed on 31 March 2024) for further details. It is worth remembering here that EEG is a noninvasive and portable technique able to capture brain electric activity with high temporal resolution.
It is well known that the brain is a multiscale system in which typical rhythmic patterns are located at specific regions. Actually, a detailed atlas of the natural frequencies of the human brain at rest have recently been developed [
50]. Bearing in mind this inherent spatio-temporal architecture of the resting human brain, a PJSD analysis per channel varying the lag
, i.e., at different time scales, has been performed. The main goal is to simultaneously look for brain regions and temporal scales that maximize the discrimination between the two resting states (EC and EO). Moreover, trying to disentangle the influence that linear and nonlinear temporal correlations have on the discriminative power, an analysis analogous to the one detailed in the previous section has been carried out.
Figure 3 shows the results obtained when the
is implemented to discriminate between the EC and EO resting states. Differences in average between the EO and EC conditions,
−
, as a function of the channel and lag
(
) are depicted in
Figure 3a while
p-values (in logarithmic base 10 scale), calculated by using the Wilcoxon rank sum test, are shown in
Figure 3b. This non-parametric approach can be used to identify significant differences between two independent samples. The Matlab code
ranksum was implemented for such a purpose (Matlab Version 9.5, R2018b). For further details, interested readers are referred to
https://www.mathworks.com/help/stats/ranksum.html (accessed on 31 March 2024). Topographic visualizations of these
p-values in logarithmic base 10 scale for two specific lags,
and
, are displayed in
Figure 3c and
Figure 3d, respectively. Several interesting conclusions can be drawn from these plots. First, the optimal discrimination between the EO and EC states are obtained for the occipital electrodes when the EEG records are mapped with a lag
, i.e., at a sampling time very close to the original one (
). This finding is clearly not surprising in view of the fact that the primary visual cortex is located in the occipital lobe [
2,
16]. Additionally,
is lower in the EO than in the EC condition for low values of the lag
. Putting this in another way, temporal correlations (linear and/or nonlinear) are stronger in the latter resting condition for these temporal scales. However, the situation changes drastically when larger values of lag (smaller sampling times) are considered since frontal electrodes achieve the lowest
p-values for these temporal scales with larger values of the quantifier in the EO resting state. This is a much more novel and appealing result, which could be potentially correlated with the critical role played by the frontal lobe in visual perception [
16,
51]. Further tests are obviously required to validate this hypothesis. On the other hand, it is also important to highlight that the optimal discrimination powers in the frontal channels are not as high as the one obtained for the occipital ones.
In order to check if the well-known stronger alpha activity in the eye-closed condition plays a main role in the previously obtained result, a simple numerical analysis has been developed. A sine wave with a frequency of 10 Hz was sampled at a sampling rate of 160 Hz and an appropriate fraction of random numbers from a standard normal distribution was then added to simulate the noise stochastic component. It is worth remembering here that alpha waves have frequencies from 8 to 12 Hz, approximately. After performing the same analysis by using
as a discriminative statistic between the simulated noisy oscillations and fully random dynamics, it has been found that the optimal lag scales that minimize the
p-value are
,
and
. They coincide with the odd multiples of the half-period of the sinusoidal waveform. Actually, the improved recognition of periodicities from periodic dynamics at these particular values of the lag
has been carefully confirmed by Bandt by using ordinal autocorrelation functions [
52]. Thus, those are the lag scales in which the alpha rhythm is better resolved, and they consequently optimize the discrimination between the resting states (please see
Figure 3b). There are, however, other lags for which the distinction is enhanced, and the explanation behind this better discrimination is not as direct and immediate. Further research that is beyond the scope of this study is needed to fully understand them.
With the intention to explore the impact that solely nonlinear temporal correlations have on the distinction between the two resting states, an analogous analysis using the
has been developed.
Figure 4 shows the results obtained in this case. Frontal channels have more intense nonlinear correlations in the EO than in the EC resting condition (
Figure 4a), and an enhanced discrimination is found in this brain region practically for all
values under consideration (
Figure 4b). The lowest
p-values, reached for for
(
Figure 4d), are several orders of magnitude lower than those estimated in the previous analysis for
. This result is of high relevance just because a much more robust, spatially located discrimination of the resting states is observed. Even when the reasons behind the presence of stronger nonlinear signatures in the frontal channels for the EO condition are not clear at all and that an interpretation from a neurophysiological perspective is highly necessary, leveraging on this heuristic finding, further insights can be achieved.
Analogously,
has been implemented as a discriminative statistic to unveil the role played by linear temporal correlations (solely) in the EO vs. EC conditions. The main findings are displayed in
Figure 5. It is found that channels in the frontal region of the brain with lag
have the highest discrimination powers. Under these circumstances, linear correlations in the EO resting state are stronger than those in the EC one. It is also worth noting that this behavior changes in the opposite way, i.e., the EO condition has weaker linear correlations than in the EC condition, for occipital and some particular frontal channels at small and large
values, respectively. Indeed, significant differences between both resting conditions are also achieved in such cases, but the associated
p-values are not the smallest ones.
Finally, and for comparison purposes,
Figure 6 illustrates the results obtained when PE is estimated from the resting EEG raw data. On the one hand, it is found that the behavior observed when using the PE as discriminative statistic is strongly correlated with that found for the
(
Figure 3), attaining very similar levels of discrimination in both cases. Once again, as it happens in the numerical analysis developed in
Section 3, both measures, PE and
, give essentially the same qualitative information. On the other hand, by comparing with
Figure 4 and
Figure 5, it is clear that
and
are able to provide useful additional information about the linear and/or nonlinear temporal structures occurring in the brain resting dynamics. Moreover, temporal scales and regions of the brain where these components undertake a relevant role are more robustly identified.
Throughout all of this section, results obtained for the original EEG records with order are illustrated. However, it is worth remarking here that very similar findings are obtained for other orders (, and ), and also when a tiny random perturbation is added to the raw EEGs for breaking equalities. Moreover, it has also been checked that obtained p-values are qualitatively analogous when other statistical tests, such as the two-sample t-test or the Welch’s t-test, are applied.
It is clear that the comparison with other nonlinear techniques would be very useful to evaluate the advantages of the proposed methodology in a more thorough way. Furthermore, in recent years, more powerful ordinal tools have been introduced, such as the generalized statistical complexity measure based on ordinal transition frequencies instead of ordinal pattern frequencies [
53] and some time irreversibility metrics based on permutation patterns [
54], that can help to reach an enhanced performance in the discrimination task. However, the results included in this section confirm that the
and
offer improved discrimination powers than the PE, which is the most widely used ordinal measure, by several orders of magnitude. It is then quite reasonable to conjecture that the multiscale surrogate approach by implementing the PJSD could overcome some very recently observed limitations of the PE for distinguishing eyes-open and eyes-closed brain states when the same dataset is analyzed [
55].