Abstract
Physicists are starting to work in areas where noisy signal analysis is required. In these fields, such as Economics, Neuroscience, and Physics, the notion of causality should be interpreted as a statistical measure. We introduce to the lay reader the Granger causality between two time series and illustrate ways of calculating it: a signal X “Granger-causes” a signal Y if the observation of the past of X increases the predictability of the future of Y when compared to the same prediction done with the past of Y alone. In other words, for Granger causality between two quantities it suffices that information extracted from the past of one of them improves the forecast of the future of the other, even in the absence of any physical mechanism of interaction. We present derivations of the Granger causality measure in the time and frequency domains and give numerical examples using a non-parametric estimation method in the frequency domain. Parametric methods are addressed in the Appendix. We discuss the limitations and applications of this method and other alternatives to measure causality.
Keywords:
Granger causality; autoregressive process; conditional Granger causality; non-parametric estimation
1. Introduction
The notion of causality has been the concern of thinkers at least since the ancient Greeks [1]. More recently, Clive Granger [2], in his 1969 paper entitled "Investigating Causal Relations by Econometric Models and Cross-spectral Methods", elaborated a mathematical framework to describe a form of causality – henceforth called Granger causality (GC), also referred to as Wiener-Granger causality – in order to distinguish it from other definitions of causality. Given two stochastic variables, X and Y, there is a causal relationship (in the sense of Granger) between them if past observations of Y help to predict the current state of X, or vice-versa. If so, we say that Y Granger-causes X. Granger was inspired by the definition of causality from Norbert Wiener [3], in which Y causes X if knowing the past of Y increases the efficacy of the prediction of the current state of X when compared to the prediction of X by the past values of X alone. Other notions of causality have also been defined; one worth mentioning is Pearl's causality [4], which has been revised by Pearl and colleagues in a series of published works [5, 6].
In the multidisciplinary science era, more and more physicists are involved in research in other areas, such as Economics and Neuroscience. These areas usually deal with large data sets, and data analysis tools such as GC come in handy to extract meaningful knowledge from them.
Causality inference via GC has been widely applied in different areas of science, such as: prediction of financial time series [7-9], earth systems [10], atmospheric systems [11], solar indices [12], turbulence [12, 13], inference of information flow in the brains of different animals [14-18], and inference of functional networks of the brain using fMRI [19, 20], MEG [21], and EEG [22]. It appears as an alternative to measures like linear correlations [23], mutual information [24, 25], partial directed coherence [26], ordinary coherence [27], directed transfer function [28], spectral coherence [29], and transfer entropy [30, 31], being usually easier to calculate since it does not rely on the estimation of probability density functions of one or more variables.
The definition of GC involves the prediction of future values of stochastic time series (see Fig. 1). The measurement of the GC between variables may be done in both the time and the frequency domains [26, 32-34].
When time series X Granger-causes time series Y, the patterns in X are approximately repeated in Y after some time lag (two examples are indicated with arrows). Thus, past values of X can be used for the prediction of future values of Y.
In the present work, we will focus on the frequency domain representation of the GC [26, 32] and, for pedagogical purposes, will discuss illustrative examples from previous works by other authors [26, 34]. Our main goal is to provide a basic notion of the GC measure to a reader not yet introduced to this subject.
This work is organized as follows: in Section 2, we present the concept of an autoregressive process – the model of linear regression on which GC is based (it is also possible to formulate GC for nonlinear systems; however, such a formulation results in a more complex analysis which is beyond the scope of this work [35, 36]). Sections 3 and 4 develop the mathematical concepts and definitions of the GC in both the time and frequency domains. In Section 5.1, we introduce the nonparametric method to estimate GC through Fourier and wavelet transforms [34]. In Section 6 we introduce examples of the conditional GC (cGC) to determine known links between the elements of a simple network. We then close the paper by discussing applications, implications and limitations of the method.
2. Autoregressive process
Autoregressive processes form the basis for the parametric estimation of the GC, so in this section we introduce the reader to the basic concepts of such processes [37]. A process X is autoregressive of order n (i.e., an AR(n) process) if its state at time t is a function of its n past states:

X(t) = ∑_{j=1}^{n} a_j X(t−j) + ε(t),   (1)
where t is the integer time step, and the real coefficients a_j indicate the weighted contribution of the state j steps in the past to the current state X(t). The term ε(t) is a noise source with variance Σ that models any external additive contribution to the determination of X(t). If Σ is large, then the process is weakly dependent on its past states and may be regarded as just noise. Fig. 2 shows examples of two AR processes of different orders (a and b).
Autoregressive processes. (a) Time series of an AR process with a given set of coefficients. (b) Time series of an AR process with a different set of coefficients.
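To build intuition, such a process is straightforward to simulate. Below is a minimal sketch in Python (the order and coefficients are illustrative choices, not necessarily those of Fig. 2):

```python
import numpy as np

def simulate_ar(coeffs, n_steps, noise_std=1.0, seed=0):
    """Simulate X(t) = sum_j a_j X(t-j) + eps(t) for the given coefficients a_j."""
    rng = np.random.default_rng(seed)
    order = len(coeffs)
    a = np.asarray(coeffs, dtype=float)
    x = np.zeros(n_steps + order)
    eps = rng.normal(0.0, noise_std, size=n_steps + order)
    for t in range(order, n_steps + order):
        # x[t-order:t][::-1] lists the past states from lag 1 to lag 'order'
        x[t] = np.dot(a, x[t - order:t][::-1]) + eps[t]
    return x[order:]  # discard the zero initial conditions

# Example: a stable AR(2) process with an oscillatory component
x = simulate_ar([0.55, -0.8], n_steps=5000)
```

Changing the coefficients changes the memory of the process; coefficients whose characteristic roots lie outside the unit circle make the simulation diverge.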
Fitting the autoregressive coefficients a_j and the noise variance Σ to a recorded signal is usually done by solving a Yule-Walker set of equations [15, 38]. For a brief review on this topic see Section A of the Appendix.
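A minimal sketch of such a Yule-Walker fit, using sample autocovariances and plain NumPy (the simulated AR(2), with coefficients 0.5 and −0.3, is our own illustrative choice):

```python
import numpy as np

def yule_walker(x, order):
    """Estimate AR coefficients a_j and the noise variance from a 1D signal."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # biased sample autocovariances r_0 ... r_order
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # symmetric Toeplitz matrix of autocovariances
    R = r[np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])]
    a = np.linalg.solve(R, r[1:])        # Yule-Walker equations: R a = r
    sigma2 = r[0] - np.dot(a, r[1:])     # prediction-error (noise) variance
    return a, sigma2

# Recover the coefficients of a simulated AR(2) process
rng = np.random.default_rng(1)
x = np.zeros(20000)
for t in range(2, len(x)):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()
a_hat, s2_hat = yule_walker(x, order=2)
```

With a long enough series the estimates approach the true values (0.5, −0.3) and unit noise variance.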
3. Granger causality in time domain
In this section we develop the mathematical concepts and definitions of GC in the time domain. Consider two stochastic signals, X1(t) and X2(t). We assume that these signals may be modeled by autoregressive stochastic processes of order n, independent of each other, such that their states at time t can be estimated from their n past values:

X1(t) = ∑_{j=1}^{n} a_{1,j} X1(t−j) + ε1(t),   (2)

X2(t) = ∑_{j=1}^{n} a_{2,j} X2(t−j) + ε2(t),   (3)
where the variances of ε1 and ε2 are, respectively, Σ1 and Σ2, and the coefficients a_{1,j} and a_{2,j} are adjusted in order to minimize Σ1 and Σ2.
However, we may also assume that the signals X1 and X2 are each modeled by a combination of one another, yielding

X1(t) = ∑_{j=1}^{n} a_{11,j} X1(t−j) + ∑_{j=1}^{n} a_{12,j} X2(t−j) + η1(t),   (4)

X2(t) = ∑_{j=1}^{n} a_{21,j} X1(t−j) + ∑_{j=1}^{n} a_{22,j} X2(t−j) + η2(t),   (5)
where the covariance matrix Σ of the noise terms η1 and η2 is given by

Σ = [ Σ11  Σ12 ]
    [ Σ21  Σ22 ].   (6)
Here, Σ11 and Σ22 are the variances of η1 and η2, respectively, and Σ12 = Σ21 is the covariance of η1 and η2. Again, the coefficients a_{11,j}, a_{12,j}, a_{21,j} and a_{22,j} are adjusted to minimize the variances Σ11 and Σ22.
If Σ11 < Σ1, then the addition of the past of X2 to equation (4) generated a better fit to X1, and thus enhanced its predictability. In this sense, we may say there is a causal relation from X2 to X1, or simply that X2 Granger-causes X1. The same applies to the other signal: if Σ22 < Σ2, then X1 Granger-causes X2, because adding the past of X1 to the dynamics of X2 enhanced its predictability.
We may summarize this concept in the definition of the total causality index, given by

F_{X1,X2} = ln( Σ1 Σ2 / det Σ ).   (7)
If F_{X1,X2} > 0, there is some Granger-causal relation between X1 and X2, because either Σ11 < Σ1 or Σ22 < Σ2, or else there is instantaneous correlation between η1 and η2 due to Σ12 ≠ 0. If neither Granger causality nor correlations are present, then F_{X1,X2} = 0.
To know specifically whether there is Granger causality from 1 to 2 or from 2 to 1, we may use the specific indices:

F_{2→1} = ln( Σ1 / Σ11 ),   (8)

F_{1→2} = ln( Σ2 / Σ22 ),   (9)

F_{1·2} = ln( Σ11 Σ22 / det Σ ),   (10)
such that

F_{X1,X2} = F_{2→1} + F_{1→2} + F_{1·2},   (11)
where F_{2→1} defines the causality from X2 to X1, F_{1→2} is the causality from X1 to X2, and F_{1·2} is called instantaneous causality, due to correlations between η1 and η2. Just as in the total causality case, these specific indices are greater than zero if there is Granger causality, and zero otherwise.
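These definitions translate directly into an estimation recipe: fit the restricted models of equations (2)-(3) and the full models of equations (4)-(5) by ordinary least squares, and compare the residual variances. A self-contained sketch (the toy pair below, with X2 driving X1 at lag 1, is our own choice):

```python
import numpy as np

def ar_design(signals, order):
    """Stack lagged copies of the given 1D signals as regression features."""
    n = len(signals[0])
    cols = [s[order - k: n - k] for s in signals for k in range(1, order + 1)]
    return np.column_stack(cols)

def residual_var(target, predictors, order):
    """Prediction-error variance of an OLS fit of target(t) on lagged predictors."""
    y = target[order:]
    X = ar_design(predictors, order)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return np.mean(r ** 2)

# Toy pair: x2 is AR(1); x1 receives input from the past of x2
rng = np.random.default_rng(2)
n = 50000
x1, x2 = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x2[t] = 0.5 * x2[t - 1] + rng.normal()
    x1[t] = 0.5 * x1[t - 1] + 0.5 * x2[t - 1] + rng.normal()

order = 2
S1  = residual_var(x1, [x1], order)        # restricted model for x1, eq. (2)
S11 = residual_var(x1, [x1, x2], order)    # full model for x1, eq. (4)
S2  = residual_var(x2, [x2], order)
S22 = residual_var(x2, [x2, x1], order)
F_21 = np.log(S1 / S11)   # GC from 2 to 1: clearly positive
F_12 = np.log(S2 / S22)   # GC from 1 to 2: near zero
```

By construction, F_21 is large while F_12 fluctuates near zero at the order of the inverse sample size.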
4. Granger causality in frequency domain
In order to derive the GC in the frequency domain, we first define the lag operator L, such that

L^k X(t) = X(t−k),   (12)
that is, L^k delays X(t) by k time steps, yielding X(t−k). We may then rewrite equations (4) and (5) as:

X1(t) = ∑_{j=1}^{n} a_{11,j} L^j X1(t) + ∑_{j=1}^{n} a_{12,j} L^j X2(t) + η1(t),   (13)

X2(t) = ∑_{j=1}^{n} a_{21,j} L^j X1(t) + ∑_{j=1}^{n} a_{22,j} L^j X2(t) + η2(t),   (14)
and rearrange their terms to collect X1 and X2:

(1 − ∑_{j=1}^{n} a_{11,j} L^j) X1(t) − (∑_{j=1}^{n} a_{12,j} L^j) X2(t) = η1(t),   (15)

−(∑_{j=1}^{n} a_{21,j} L^j) X1(t) + (1 − ∑_{j=1}^{n} a_{22,j} L^j) X2(t) = η2(t).   (16)
We define the coefficients a11(L) = 1 − ∑_j a_{11,j} L^j, a12(L) = −∑_j a_{12,j} L^j, a21(L) = −∑_j a_{21,j} L^j, and a22(L) = 1 − ∑_j a_{22,j} L^j, and rewrite equations (15) and (16) in matrix form:

[ a11(L)  a12(L) ] [ X1(t) ]   [ η1(t) ]
[ a21(L)  a22(L) ] [ X2(t) ] = [ η2(t) ],   (17)
where X(t) = [X1(t), X2(t)]^T is the vector of signals and η(t) = [η1(t), η2(t)]^T is the vector of noise terms.
We apply the Fourier transform to equation (17) in order to switch to the frequency domain,

A(f) X(f) = η(f),   (18)
where f is the frequency and A(f) is the coefficient matrix whose elements are given by

a_{lm}(f) = δ_{lm} − ∑_{j=1}^{n} a_{lm,j} e^{−i 2π f j},   (19)

where δ_{lm} is the Kronecker delta (l, m = 1, 2).
The expressions above are obtained by representing the lag operator in the spectral domain as L = e^{−i 2π f}. This derives from the z-transform, where the representation of the variable z on the unit circle (|z| = 1) is z = e^{i 2π f} [40, 41]. (The lag operator L is similar to the z-transform; however, z is treated as a variable and is often used in signal processing, while L is an operator [39].)
To obtain the power spectra of X1 and X2, we first isolate X(f) in equation (18):

X(f) = A(f)^{−1} η(f) = H(f) η(f),
where H(f) = A(f)^{−1} is called the transfer matrix, resulting in the following spectra:

S(f) = ⟨ X(f) X†(f) ⟩ = H(f) Σ H†(f),   (20)
where ⟨·⟩ is the ensemble average, † denotes the transposed conjugate of the matrix, and S(f) is the spectral matrix defined as:

S(f) = [ S11(f)  S12(f) ]
       [ S21(f)  S22(f) ].   (21)
In equation (21), S11(f) and S22(f) are called the autospectra, and the elements S12(f) and S21(f) are called the cross-spectra.
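The chain A(f) → H(f) → S(f) of equations (18)-(21) is short to code for a known model. The sketch below builds the spectral matrix of a bivariate AR(2) (the coefficients, sampling rate and lag-1 coupling from 2 to 1 are illustrative assumptions of our own):

```python
import numpy as np

def spectral_matrix(A_coeffs, Sigma, freqs, fs):
    """A_coeffs[j] is the 2x2 lag-(j+1) coefficient matrix; returns S(f) = H Sigma H^dagger."""
    S = np.empty((len(freqs), 2, 2), dtype=complex)
    for n, f in enumerate(freqs):
        A = np.eye(2, dtype=complex)
        for j, Aj in enumerate(A_coeffs, start=1):
            A -= Aj * np.exp(-1j * 2 * np.pi * f * j / fs)   # eq. (19)
        H = np.linalg.inv(A)                 # transfer matrix
        S[n] = H @ Sigma @ H.conj().T        # spectral matrix, eqs. (20)-(21)
    return S

# Illustrative model: two AR(2) processes plus a 2 -> 1 coupling of 0.25 at lag 1
A1 = np.array([[0.55, 0.25], [0.0, 0.55]])   # lag-1 coefficients
A2 = np.array([[-0.8, 0.0], [0.0, -0.8]])    # lag-2 coefficients
Sigma = np.eye(2)
freqs = np.linspace(0, 100, 201)             # up to Nyquist for fs = 200 Hz
S = spectral_matrix([A1, A2], Sigma, freqs, fs=200.0)
```

The resulting S(f) is Hermitian at every frequency, with real positive autospectra on the diagonal.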
We can expand the product in equation (20) to obtain S11(f) and S22(f) (see Section B of the Appendix for details) as:

S11(f) = H̃11(f) Σ11 H̃11*(f) + H12(f) Σ̃22 H12*(f),   (22)

S22(f) = H̃22(f) Σ22 H̃22*(f) + H21(f) Σ̃11 H21*(f),   (23)
where the tilde symbols are used to differentiate the terms below from the variables H11, H22, Σ11, and Σ22, as follows:

H̃11(f) = H11(f) + (Σ12/Σ11) H12(f),

H̃22(f) = H22(f) + (Σ12/Σ22) H21(f),

Σ̃11 = Σ11 − Σ12²/Σ22,

Σ̃22 = Σ22 − Σ12²/Σ11.
Once we have the spectra S11(f) and S22(f) as the sum of an intrinsic and a causal term, we may define indices to quantify GC in the frequency domain just as we did in the time domain (Section 3). For instance, to calculate the causal index I_{2→1}(f), we divide the spectrum S11(f) by its intrinsic term in order to eliminate its influence. Thus, the causality index is defined as:

I_{2→1}(f) = ln( S11(f) / [H̃11(f) Σ11 H̃11*(f)] ),   (24)
and analogously, for I_{1→2}(f),

I_{1→2}(f) = ln( S22(f) / [H̃22(f) Σ22 H̃22*(f)] ).   (25)
The instantaneous causality index is defined as:

I_{1·2}(f) = ln( [H̃11(f) Σ11 H̃11*(f)] [H̃22(f) Σ22 H̃22*(f)] / det S(f) ).   (26)
In equations (24) to (26), we have one index for each value of the frequency f. Conversely, in the time domain there was a single index for the GC between the two signals X1 and X2. Just as discussed in Section 3, the indices I_{2→1}(f), I_{1→2}(f) and I_{1·2}(f) are greater than zero if there is the corresponding relation between the time series. They are zero otherwise.
Just like in the time domain, the total GC in the frequency domain is the sum of its individual components:

I_{X1,X2}(f) = I_{2→1}(f) + I_{1→2}(f) + I_{1·2}(f).   (27)
The total GC is related to the so-called coherence C(f) between the signals (see Section C of the Appendix):

I_{X1,X2}(f) = −ln( 1 − C(f) ),  with  C(f) = |S12(f)|² / [S11(f) S22(f)].   (28)
Moreover, we recover the GC in the time domain by integrating the spectral index over the whole frequency interval [15, 34]:

F_{2→1} = (1/fs) ∫ I_{2→1}(f) df,   (29)

where fs is the sampling frequency (analogous expressions hold for the other indices).
5. Estimating Granger causality from data
In the last two sections we have mathematically defined the GC in both time and frequency domains. Here, we discuss how to calculate GC in practice. In Section 5.1, we address a non-parametric estimation method that involves computing the Fourier and wavelet transforms of the signals [42-44]. In Section D of the Appendix, we address the parametric estimation of GC, which involves fitting the signals to autoregressive models (Section 2).
5.1. Calculating GC through Fourier and Wavelet Transforms
Here, we will give a numerical example of calculating and interpreting GC using a nonparametric estimation approach based on Fourier and wavelet transforms [34]. Our example consists of calculating the spectral matrix through the Fourier transform of the signals. For two stationary signals X1(t) and X2(t) (stationarity refers to time-shift invariance of the statistics of the underlying process [45]; here it suffices that the means are constant and that the covariance between any two samples depends only on their time lag, regardless of their absolute position in time), the element (j, k) of the spectral matrix in equation (21) is

S_{jk}(f) = (1/T) ⟨ X_j(f) X_k*(f) ⟩,   (30)
where T is the total duration of the signal, X_j(f) is the discrete Fourier transform of X_j(t) (calculated by a fast Fourier transform, FFT, algorithm) and X_k*(f) is the complex conjugate of X_k(f).
The variable f contains the values of the frequency at which the FFT was calculated. If Δt is the sampling time interval of the original signals, then the sampling frequency is fs = 1/Δt and the frequencies run up to the Nyquist frequency fs/2, so that for N data points the frequency interval contains N/2 + 1 points. Then, for m signals (m = 2 in our example), we have a total of N/2 + 1 spectral matrices of dimensions m × m. Recall that the diagonal elements of S(f) are called the autospectra, whereas the other elements are called the cross-spectra.
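A sketch of this trial-averaged estimate of equation (30) (the white-noise trials generated below are illustrative placeholders for recorded data):

```python
import numpy as np

def spectral_matrix_fft(trials, fs):
    """trials: array (n_trials, m, n_samples). Returns freqs and S(f), shape (n_freq, m, m)."""
    n_trials, m, n = trials.shape
    T = n / fs                                   # total duration of each trial
    X = np.fft.rfft(trials, axis=-1)             # (n_trials, m, n_freq)
    # ensemble average of X_j(f) X_k*(f) over trials, eq. (30)
    S = np.einsum('rjf,rkf->fjk', X, X.conj()) / (n_trials * T)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, S

# Example: 100 trials of two independent white-noise channels
rng = np.random.default_rng(3)
trials = rng.normal(size=(100, 2, 1000))
freqs, S = spectral_matrix_fft(trials, fs=200.0)
```

For independent channels the off-diagonal (cross-spectral) entries average toward zero as the number of trials grows, while the autospectra stay flat and positive.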
The transfer matrix H(f) and the covariance matrix Σ are given by the decomposition of S(f) into the product of equation (20). The Wilson algorithm [34, 46, 47] (see also Section E of the Appendix) may be used for the decomposition of spectral matrices.
After determining these two matrices, we may calculate the GC indices through the direct application of equations (24) to (26).
For example, consider the autoregressive system studied in Ref. [34], which is given by:

X1(t) = 0.55 X1(t−1) − 0.8 X1(t−2) + C X2(t−1) + ε1(t),
X2(t) = 0.55 X2(t−1) − 0.8 X2(t−2) + ε2(t).   (31)
Here, X1 and X2 are AR(2) processes. The variable t is the time step index, such that the actual time is t Δt. Besides, we know by construction that X2 influences X1 through the coupling constant C (although the opposite does not happen). The terms ε1 and ε2 are defined to have unit variance and zero covariance (they are independent random processes). To obtain a smooth power spectrum, we simulated 5000 trials of the system in equation (31) and computed the average spectra across trials. We set the sampling frequency to fs = 200 Hz and the total time to T = 25 s, resulting in 5000 data points per trial.
When C = 0, both processes are independent, and oscillate mainly at 40 Hz (Fig. 3a). For C > 0, process 1 receives input from process 2, generating a causal coupling that is captured by the GC indices in equations (24) and (25): a peak at 40 Hz in I_{2→1}(f) indicates that process 2 (which oscillates at 40 Hz) is entering process 1 at this very frequency (Fig. 3c). The flat spectrum of I_{1→2}(f) indicates that, on the other hand, process 2 does not receive input from 1. The absolute value of C changes the intensity of the GC peak. The instantaneous causality index of equation (26) vanishes, because Σ12 = 0 for this system. The total GC in the system is obtained from the spectral coherence, equation (28). However, only the specific GC indices reveal the directionality of the inputs between 1 and 2.
GC of a pair of autoregressive processes. GC for the system given in equation (31): by construction, process 2 causes 1 by providing it input through the coupling constant C. Parameters: total time T = 25 s and sampling frequency fs = 200 Hz, resulting in 5000 time steps. a. Spectral matrix components calculated via equation (30). b. Coherence between signals 1 and 2, equation (28). c. GC from 2 to 1 and from 1 to 2, equations (24) and (25): a peak at 40 Hz in the I_{2→1}(f) index indicates that 2 Granger-causes 1, whereas the flat zero I_{1→2}(f) shows, as expected, that 1 does not influence 2. The peak is at 40 Hz because process 2 has its main power at this frequency (see panel a).
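The 40 Hz peak can also be checked without any estimation. Assuming equation (31) has AR coefficients 0.55 and −0.8 with a lag-1 coupling C from 2 to 1 (our reading of Ref. [34]) and Σ equal to the identity, H(f) follows directly from the coefficients and equations (24)-(25) reduce to closed expressions:

```python
import numpy as np

def gc_spectrum(C, fs=200.0, n_freq=1000):
    """I_{2->1}(f) and I_{1->2}(f) for the assumed model of eq. (31), Sigma = identity."""
    freqs = np.linspace(0.0, fs / 2, n_freq)
    I21 = np.empty(n_freq)
    I12 = np.empty(n_freq)
    for n, f in enumerate(freqs):
        z1 = np.exp(-1j * 2 * np.pi * f / fs)
        z2 = np.exp(-1j * 4 * np.pi * f / fs)
        d = 1 - 0.55 * z1 + 0.8 * z2       # common AR(2) polynomial of both processes
        A = np.array([[d, -C * z1], [0.0, d]])
        H = np.linalg.inv(A)               # transfer matrix
        S11 = abs(H[0, 0])**2 + abs(H[0, 1])**2
        S22 = abs(H[1, 0])**2 + abs(H[1, 1])**2
        # with diagonal Sigma the intrinsic terms are just |H11|^2 and |H22|^2
        I21[n] = np.log(S11 / abs(H[0, 0])**2)
        I12[n] = np.log(S22 / abs(H[1, 1])**2)
    return freqs, I21, I12

freqs, I21, I12 = gc_spectrum(C=0.25)
f_peak = freqs[np.argmax(I21)]   # peak close to 40 Hz, as in Fig. 3c
```

I_{1→2}(f) is identically zero here, since H21(f) = 0 for this model.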
This simple example illustrates the meaning of causality in the GC framework: a Granger-causal link is present if a process runs under the temporal influence of the past of another signal. We could have assumed C as a time-varying function C(t), or even different parameters for the autoregressive part of each process alone; or processes 1 and 2 could have been of different orders, implying complex individual power spectra. These scenarios are more usual in real-world applications [43, 48]. Then, instead of observing a clear peak for the GC indices, we could observe a more complex pattern with peaks that vary in time.
Instead of using the Fourier transform (which yields a single static spectrum for the whole signal), we may use the wavelet transform [43, 49, 50] to yield time-varying spectra [51, 52]. Then, the auto- and cross-spectra from equation (30) may be written as

S_{jk}(t, f) = ⟨ W_j(t, f) W_k*(t, f) ⟩,   (32)
where W_j(t, f) is the wavelet transform of X_j(t) and W_k*(t, f) is the complex conjugate of W_k(t, f). To compute the wavelet transform, we use a Morlet kernel [43], with the number of oscillation cycles within a wavelet set to a typical value for this parameter [53]. Similarly to what we did for the power spectrum, we measure the wavelet transforms of 5000 trials of the system in equation (31) in order to average the results. It is important to stress that a wavelet transform is applicable in this case because ensemble averages are being taken. Otherwise, estimates would be too unreliable for any meaningful inference.
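A hand-rolled Morlet spectral estimate illustrates how equation (32) resolves power in time and frequency (the kernel normalization, the parameter w0 = 6 and the 40 Hz test tone are our own illustrative choices):

```python
import numpy as np

def morlet_power(x, freqs, fs, w0=6.0):
    """Time-frequency power |W(t,f)|^2 via convolution with complex Morlet wavelets."""
    n = len(x)
    P = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        s = w0 / (2 * np.pi * f)                  # Gaussian width for center frequency f
        tau = np.arange(-4 * s, 4 * s, 1.0 / fs)  # wavelet support (+- 4 widths)
        kernel = np.exp(1j * w0 * tau / s) * np.exp(-tau**2 / (2 * s**2))
        kernel /= np.sqrt(np.sum(np.abs(kernel)**2))   # unit-energy normalization
        W = np.convolve(x, kernel, mode='same')        # wavelet coefficients W(t, f)
        P[i] = np.abs(W)**2
    return P

fs = 200.0
t = np.arange(0, 5, 1 / fs)
x = np.sin(2 * np.pi * 40 * t)                    # pure 40 Hz tone as test signal
freqs = np.arange(10.0, 90.0, 2.0)
P = morlet_power(x, freqs, fs)
f_best = freqs[np.argmax(P.mean(axis=1))]         # should sit near 40 Hz
```

Cross-spectra follow by multiplying W_j(t, f) with the conjugate of W_k(t, f) and averaging over trials, as in equation (32).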
In practice, there is one matrix S(t, f) for each pair (t, f); or, more intuitively, we have one spectral matrix S(f) for each time step t. The decomposition of S(f) in equation (20) is done through Wilson's algorithm. Then, we may calculate the GC indices via equations (24) to (26) for each of the matrices with fixed t. This calculation results in I_{2→1}(f), I_{1→2}(f) and I_{1·2}(f) for each time step t. Finally, we concatenate these spectra across the temporal dimension, yielding I_{2→1}(t, f), I_{1→2}(t, f) and I_{1·2}(t, f).
For example, consider the same set of processes in equation (31), but with time-varying coupling C = C(t) = C0 Θ(t0 − t), where Θ(x) = 1 for x ≥ 0 (zero otherwise) is the Heaviside step function. The parameter t0 is the time step at which the coupling from 2 to 1 is turned off. This scenario is equivalent to having a set of concatenated constant-C processes, such that the processes with t > t0 have C = 0. Then, we expect the analysis in Fig. 3 to be valid for all time steps t ≤ t0, and no coupling should be detected whatsoever for t > t0.
That is exactly what is shown in Fig. 4: a sharp transition in I_{2→1}(t, f) happens exactly at t0, when C is turned off. The index I_{1→2}(t, f) remains zero for the whole simulation. Again, this illustrates the meaning of GC in our system: whenever there is a directional coupling from one variable to another, there is nonzero GC in that link – in the example, from signal X2 to X1.
Time-varying GC in the frequency domain. GC of the system defined in equation (31), but with time-varying C = C(t). The spectral matrix is calculated via a wavelet transform, equation (32), and decomposed for each time step t, yielding a temporal decomposition of the frequencies of the signals. a. Coupling constant C(t) as a function of time. b. GC index from 2 to 1, I_{2→1}(t, f). c. GC index from 1 to 2, I_{1→2}(t, f).
6. Conditional Granger Causality
The concepts developed so far may be applied to a case with m variables. In this case, in order to try and infer the directionality of the interaction between two signals (whether i Granger-causes j or vice-versa) in a system with m signals, we may use the so-called conditional Granger causality (cGC) [15, 33, 54, 55]. The idea is to infer the GC between signals i and j given the knowledge of all the other signals of the system. This is done by comparing the variances obtained considering only i and j to the variances obtained considering all the other signals in the system. The model from equations (4) and (5) ends up having a total of m variables.
We may write the cGC in the time domain as

F_{i→j|k} = ln( Σ'_{jj} / Σ_{jj} ),   (33)

where Σ'_{jj} is the prediction-error variance of X_j in a model that includes the past of X_j and X_k but excludes X_i, and Σ_{jj} is the corresponding variance when X_i is also included;
or in the frequency domain as

I_{i→j|k}(f) = ln( Σ'_{jj} / [Q_{jj}(f) Σ_{jj} Q_{jj}*(f)] ),   (34)

where the matrix Q(f) is constructed below.
But one may ask: “isn't it simpler to just calculate the standard GC between every pair of signals in the system, always reducing the problem to a two-variable case?”
To answer that question, consider the case depicted in Fig. 5a: node 1 (X1) sends input to node 2 (X2) with a delay δ2 and sends input to node 3 (X3) with a delay δ3. Measuring the pairwise GC between X2 and X3 suggests the existence of a coupling between them even though it does not physically exist (as in Fig. 5b). This occurs because signals X2 and X3 are correlated due to their common input from X1, and the simple pairwise GC between X2 and X3 fails to represent the correct relationship between the three nodes of Fig. 5a. The cGC solves this issue by considering the contribution of a third signal (X1 in this example) onto the analyzed pair (X2 and X3), as described below.
A system that GC fails to describe. a. Node 1 (X1) sends input to node 2 (X2) with delay δ2 and to node 3 (X3) with delay δ3. b. A simple GC calculation wrongly infers a link from X2 to X3 if δ2 < δ3, or from X3 to X2 if δ3 < δ2. These links are not physically present in the system and appear only due to the cross-correlation between X2 and X3 caused by the common input from X1.
To describe the system in Fig. 5, equation (17) may be written as

[ a11(L)  a12(L)  a13(L) ] [ X1(t) ]   [ η1(t) ]
[ a21(L)  a22(L)  a23(L) ] [ X2(t) ] = [ η2(t) ],   (35)
[ a31(L)  a32(L)  a33(L) ] [ X3(t) ]   [ η3(t) ]
where each η_j is a noise term with variance Σ_jj. The corresponding spectral matrix is

S(f) = [ S11(f)  S12(f)  S13(f) ]
       [ S21(f)  S22(f)  S23(f) ]
       [ S31(f)  S32(f)  S33(f) ],   (36)
and the noise covariance matrix is

Σ = [ Σ11  Σ12  Σ13 ]
    [ Σ21  Σ22  Σ23 ]
    [ Σ31  Σ32  Σ33 ].   (37)
We want to calculate the cGC from X2 to X3 given X1, i.e. F_{2→3|1} in the time domain and I_{2→3|1}(f) in the frequency domain. The first step is to build a partial system from equation (35) ignoring the coefficients related to the probe signal X2, resulting in the partial spectral matrix S'(f):

S'(f) = [ S'11(f)  S'13(f) ]
        [ S'31(f)  S'33(f) ].   (38)
From this partial system, we can calculate the transfer and covariance matrices using the nonparametric methods already discussed above. Suppose that for the complete system, equation (36), we obtain the transfer matrix H(f) and the covariance matrix Σ (equation (37)), whereas for the partial system we obtain the transfer matrix G(f) and the covariance matrix Σ':

S(f) = H(f) Σ H†(f),   (39)

S'(f) = G(f) Σ' G†(f).   (40)
The matrices H(f) and Σ are 3 × 3. The matrices G(f) and Σ' always have one dimension less than the original ones, because they are built from the leftover rows and columns of the original system without the coefficients of the probe signal.
In the time domain, F_{2→3|1} is defined as

F_{2→3|1} = ln( Σ'_{33} / Σ_{33} ),   (41)
or, in general,

F_{i→j|k} = ln( Σ'_{jj} / Σ_{jj} ),   (42)
which is used to calculate the cGC from i to j given k in the time domain. Note that if the link between i and j is totally mediated by k, then Σ'_{jj} = Σ_{jj}, yielding F_{i→j|k} = 0. However, the standard GC between i and j would result in a link between these variables. For our example in Fig. 5, we obtain F_{2→3|1} ≈ 0, meaning that the influence of X2 on X3 is conditioned on signal X1, and hence is almost null.
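The time-domain cGC of equation (42) can be checked numerically with the same least-squares recipe used for the pairwise index: compare the prediction-error variance of X_j with and without the past of X_i, keeping the conditioning signal in both models. The chain below (1 → 2 → 3, our own toy example) shows a pairwise GC from 1 to 3 that the conditional index correctly suppresses, since the influence is entirely mediated by signal 2:

```python
import numpy as np

def resid_var(target, predictors, order):
    """OLS prediction-error variance of target(t) given lagged predictors."""
    n = len(target)
    y = target[order:]
    X = np.column_stack([s[order - k: n - k]
                         for s in predictors for k in range(1, order + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return np.mean(r ** 2)

# Chain 1 -> 2 -> 3: the influence of x1 on x3 is entirely mediated by x2
rng = np.random.default_rng(4)
n = 50000
x1, x2, x3 = np.zeros(n), np.zeros(n), np.zeros(n)
for t in range(1, n):
    x1[t] = 0.5 * x1[t - 1] + rng.normal()
    x2[t] = 0.5 * x2[t - 1] + 0.5 * x1[t - 1] + rng.normal()
    x3[t] = 0.5 * x3[t - 1] + 0.5 * x2[t - 1] + rng.normal()

p = 3  # model order large enough to cover the 2-step indirect path
F_pair = np.log(resid_var(x3, [x3], p) / resid_var(x3, [x3, x1], p))          # spurious
F_cond = np.log(resid_var(x3, [x3, x2], p) / resid_var(x3, [x3, x2, x1], p))  # eq. (42)
```

F_pair is clearly positive (the indirect influence looks like a direct link), while F_cond stays at the noise floor.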
In the frequency domain, we first must define the transfer matrix Q(f). However, the dimensions of matrix G(f) do not match the dimensions of matrix H(f). To fix that, we add rows and columns of an identity matrix at the rows and columns that were removed from the total system in equation (35) when we built the partial system (i.e. we add the identity rows and columns at the positions corresponding to the probe signal X2 that was removed for generating S'(f)), such that:

G'(f) = [ G11(f)  0  G13(f) ]
        [   0     1    0    ]
        [ G31(f)  0  G33(f) ].   (43)
We can now safely calculate Q(f) = G'(f)^{−1} H(f), from where we obtain I_{2→3|1}(f):

I_{2→3|1}(f) = ln( Σ'_{33} / [Q_{33}(f) Σ33 Q_{33}*(f)] ),   (44)
or, in general, I_{i→j|k}(f) = ln( Σ'_{jj} / [Q_{jj}(f) Σ_{jj} Q_{jj}*(f)] ), as anticipated in equation (34).
To illustrate the procedures for determining cGC, consider the system defined in Fig. 6, composed of 5 interacting elements [26] (Baccalá and Sameshima [26] analysed this system using partial directed coherence and the directed transfer function):

X1(t) = 0.95√2 X1(t−1) − 0.9025 X1(t−2) + ε1(t),
X2(t) = 0.5 X1(t−2) + ε2(t),
X3(t) = −0.4 X1(t−3) + ε3(t),
X4(t) = −0.5 X1(t−2) + 0.25√2 X4(t−1) + 0.25√2 X5(t−1) + ε4(t),
X5(t) = −0.25√2 X4(t−1) + 0.25√2 X5(t−1) + ε5(t).   (45)
A system with many interacting components. Illustration of a system with 5 interacting signals, having physical relations between them. We want to check whether GC or cGC is capable of capturing these interactions.
Here, X1 sends its signal to X2, X3 and X4, and X4 sends input to X5 and vice-versa, with coupling intensities given by the corresponding coefficients in equation (45). Note that X1 sends signals to X2 and X4 with 2 time steps of delay, and to X3 with 3 time steps of delay. X4 and X5 exchange signals with only 1 time step of delay.
Calculating the cGC index through equations (41) and (44), we recover the expected structure of the network (Figs. 7a and 8, respectively). The gray shades in Fig. 7 and the amplitudes of the peaks in Fig. 8 are proportional to the coupling constants between each pair of elements. For comparison, Fig. 7b shows the simple pairwise GC, which detects connections that are not physically present in the system. Again, this occurs because the hierarchy of the network generates correlations between many pairs of signals that are not directly connected, as discussed in the example of Fig. 5.
cGC and GC matrices in the time domain for a system of many interacting components. The system is given by equation (45) and depicted in Fig. 6. a. Conditional GC, equation (41), captures exactly the physical interactions of the system. b. Simple pairwise GC, equation (29), captures the interactions, but also captures underlying correlations coming from the hierarchy of the network. Real systems often do not yield a cGC matrix as clear as the one in panel a, due to second-order effects. The covariance matrices Σ and Σ' were calculated using the nonparametric method.
It is important to note that the cGC connectivity does not always reflect the underlying physical (or structural) connectivity between elements [35]. The example system in Fig. 6 is an illustrative simple case in which we obtained a neat result. However, real-world applications, such as inferring neuronal connectivity from brain signals, result in a cGC matrix that is noisier due to multiple incoming signals and multiple delays. Thus, cGC is most generally referred to as giving "functional" connectivity, instead of structural connectivity.
7. Conclusion
Granger causality is becoming increasingly popular as a method to determine the dependence between signals in many areas of science. We presented its mathematical formulation and showed examples of its application to general systems of interacting signals. This article also gives a contemporary scientific application of the Fourier transform – a subject that is studied in theoretical physics courses, but usually lacks practical applications in the classroom. We also used wavelet transforms, which may motivate students to learn more about the decomposition of signals in the time and frequency domains, and its limitations through the uncertainty principle.
We showed numerical examples, and explained them in an algorithmic way. We included the inference of steady and time-varying coupling, and the inference of connectivity in hierarchical networks via the conditional GC. A limitation of the GC is that it is ultimately based on linear regression models of stochastic processes (the AR models introduced in Section 2). Other measures, such as the transfer entropy, are more suitable to describe nonlinear interactions, and do not need to be fitted to an underlying model. Even though the nonparametric estimation of GC does not rely on fitting, it is still a measure of linear interaction. It is also possible to show that both GC and transfer entropy yield the same results for Gaussian variables [56][56] L. Barnett, A.B. Barrett and A.K. Seth, Phys. Rev. Lett. 103, 238701 (2009)..
In spite of the existing debate about what exactly the GC captures, especially in the neuroscience community [57, 58], GC has become a well-established measurement of the flux of information in the nervous system [59]. We hope to have provided the necessary tools to those who wish to learn the basic principles and applications underlying GC.
Acknowledgments
This article was produced as part of the S. Paulo Research Foundation (FAPESP) Research, Innovation and Dissemination Center for Neuromathematics (CEPID NeuroMat, Grant No. 2013/07699-0). The authors also thank FAPESP support through Grants No. 2013/25667-8 (R.F.O.P.), 2015/50122-0 (A.C.R.), 2016/03855-5 (N.L.K.), 2017/07688-9 (R.O.S.), 2018/20277-0 (A.C.R.) and 2018/09150-9 (M.G.-S.). V.L. is supported by a CAPES PhD scholarship. A.C.R. thanks financial support from the National Council of Scientific and Technological Development (CNPq), Grant No. 306251/2014-0. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
Supplementary material
The following online material is available for this article:
Appendix A, Appendix B, Appendix C, Appendix D, Appendix E.
All codes were developed in Python and are available at: https://github.com/ViniciusLima94/pyGC.
References
- [1] A. Falcon, in: The Stanford Encyclopedia of Philosophy, edited by E.N. Zalta (Stanford University, Palo Alto, 2019).
- [2] C.W.J. Granger, Econometrica 37, 424 (1969).
- [3] N. Wiener, in: Modern Mathematics for Engineers edited by E.F. Beckenbach (McGraw-Hill, New York, 1956), v. 1.
- [4] J. Pearl, Biometrika 82, 669 (1995).
- [5] J.Y. Halpern and J. Pearl, Br. J. Philos. Sci. 56, 843 (2005).
- [6] J. Halpern, in: 24th International Joint Conference on Artificial Intelligence (Buenos Aires, 2015).
- [7] T. Výrost, Š. Lyócsa and E. Baumöhl, Physica A 427, 262 (2015).
- [8] B. Candelon and S. Tokpavi, J. Bus. Econ. Stat. 34, 240 (2016).
- [9] H. Ding, H. Kim and S.Y. Park, Energ. Econ. 59, 58 (2016).
- [10] J. Runge, S. Bathiany, E. Bollt, G. Camps-Valls, D. Coumou, E. Deyle, C. Glymour, M. Kretschmer, M.D. Mahecha, J. Muñoz-Marí et al., Nat. Commun. 10, 1 (2019).
- [11] D.A. Smirnov and I.I. Mokhov, Phys. Rev. E 80, 016208 (2009).
- [12] P.O. Amblard and O.J.J. Michel, Entropy 15, 113 (2013).
- [13] G. Tissot, A. Lozano–Durán, L. Cordier, J. Jiménez and B.R. Noack, J. Phys. Conf. Ser. 506, 012006 (2014).
- [14] A. Brovelli, M. Ding, A. Ledberg, Y. Chen, R. Nakamura and S.L. Bressler, P. Natl. A. Sci. 101, 9849 (2004).
- [15] M. Ding, Y. Chen and S.L. Bressler, in: Handbook of Time Series Analysis: Recent Theoretical Developments and Applications, edited by B. Schelter, M. Winterhalder and J. Timmer (Wiley-VCH, Weinheim, 2006).
- [16] F.S. Matias, L.L. Gollo, P.V. Carelli, S.L. Bressler, M. Copelli and C.R. Mirasso, Neuroimage 99, 411 (2014).
- [17] S. Hu, H. Wang, J. Zhang, W. Kong, Y. Cao and R. Kozma, IEEE T. Neur. Net. Lear. 27, 1429 (2015).
- [18] A.K. Seth, A.B. Barrett and L. Barnett, J. Neurosci. 35, 3293 (2015).
- [19] M. Havlicek, J. Jan, M. Brazdil and V.D. Calhoun, Neuroimage 53, 65 (2010).
- [20] W. Liao, D. Mantini, Z. Zhang, Z. Pan, J. Ding, Q. Gong, Y. Yang and H. Chen, Biol. Cybern. 102, 57 (2010).
- [21] L. Pollonini, U. Patidar, N. Situ, R. Rezaie, A.C. Papanicolaou and G. Zouridakis, in: 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology (Buenos Aires, 2010).
- [22] A.B. Barrett, M. Murphy, M.A. Bruno, Q. Noirhomme, M. Boly, S. Laureys and A.K. Seth, PLoS ONE 7, e29072 (2012).
- [23] M.E. Lynall, D.S. Bassett, R. Kerwin, P.J. McKenna, M. Kitzbichler, U. Muller and E. Bullmore, J. Neurosci. 30, 9477 (2010).
- [24] S.H. Jin, P. Lin, S. Auh and M. Hallett, Movement Disord. 26, 1274 (2011).
- [25] V. Lima, R.F.O. Pena, C.A.C. Ceballos, R.O. Shimoura and A.C. Roque, Rev. Bras. Ensino. Fis. 41, e20180197 (2019).
- [26] L.A. Baccalá and K. Sameshima, Biol. Cybern. 84, 463 (2001).
- [27] P. Fries, Trends Cogn. Sci. 9, 474 (2005).
- [28] M.J. Kamiński and K.J. Blinowska, Biol. Cybern. 65, 203 (1991).
- [29] D. La Rocca, P. Campisi, B. Vegso, P. Cserti, G. Kozmann, F. Babiloni and F. De Vico Fallani, IEEE T. Bio-Med. Eng. 61, 2406 (2014).
- [30] T. Schreiber, Phys. Rev. Lett. 85, 461 (2000).
- [31] R. Vicente, M. Wibral, M. Lindner and G. Pipa, J. Comput. Neurosci. 30, 45 (2011).
- [32] J. Geweke, J. Am. Stat. Assoc. 77, 304 (1982).
- [33] J.F. Geweke, J. Am. Stat. Assoc. 79, 907 (1984).
- [34] M. Dhamala, G. Rangarajan and M. Ding, Phys. Rev. Lett. 100, 018701 (2008).
- [35] A. Seth, Scholarpedia. 2, 1667 (2007).
- [36] D. Marinazzo, W. Liao, H. Chen and S. Stramaglia, Neuroimage 58, 330 (2011).
- [37] G.E.P. Box, G.M. Jenkins, G.C. Reinsel and G.M. Ljung, Time Series Analysis: Forecasting and Control (Wiley, Hoboken, 2015), 5th ed.
- [38] G. Eshel, Internet resource 2, 68 (2003).
- [39] W. Van Drongelen, Signal Processing for Neuroscientists (Academic Press, London, 2018), 2nd ed.
- [40] R. Takalo, H. Hytti and H. Ihalainen, J. Clin. Monit. Comput. 19, 401 (2005).
- [41] R. Meddins, Introduction to Digital Signal Processing (Newnes, Oxford, 2000), 1st ed.
- [42] I. Daubechies, Ten Lectures on Wavelets (Society for Industrial and Applied Mathematics, Philadelphia, 1992).
- [43] C. Torrence and G.P. Compo, Bull. Amer. Meteor. Soc. 79, 61 (1998).
- [44] P. Stoica and R.L. Moses, Spectral Analysis of Signals (Pearson Prentice Hall, Upper Saddle River, 2005).
- [45] W.A. Woyczyński, A First Course in Statistics for Signal Analysis (Birkhäuser, Boston, 2019), 3rd ed.
- [46] G.T. Wilson, SIAM J. Appl. Math. 23, 420 (1972).
- [47] G.T. Wilson, J. Multivariate Anal. 8, 222 (1978).
- [48] P. Holme and J. Saramäki, Phys. Rep. 519, 97 (2012).
- [49] A.D. Poularikas, Transforms and Applications Handbook (CRC Press, Florida, 2010).
- [50] P.S. Addison, The Illustrated Wavelet Transform Handbook: Introductory Theory and Applications in Science, Engineering, Medicine and Finance (CRC Press, Florida, 2017).
- [51] M.J.A. Bolzan, Rev. Bras. Ensino. Fis. 28, 563 (2006).
- [52] M.O. Domingues, O. Mendes, M.K. Kaibara, V.E. Menconi and E. Bernardes, Rev. Bras. Ensino. Fis. 58, 228 (2016).
- [53] M. Farge, Annual review of fluid mechanics 24, 395 (1992).
- [54] Y. Chen, S.L. Bressler and M. Ding, J. Neurosci. Meth. 150, 228 (2006).
- [55] S. Malekpour and W.A. Sethares, Biol. Cybern. 109, 627 (2015).
- [56] L. Barnett, A.B. Barrett and A.K. Seth, Phys. Rev. Lett. 103, 238701 (2009).
- [57] P.A. Stokes and P.L. Purdon, P. Natl. A. Sci. 114, E7063 (2017).
- [58] L. Barnett, A.B. Barrett and A.K. Seth, Neuroimage 178, 744 (2018).
- [59] S.L. Bressler and A.K. Seth, Neuroimage 58, 323 (2011).
- 1. It is also referred to as Wiener-Granger causality.
- 2. Other notions of causality have been defined; one worth mentioning is Pearl's causality [4]. Over the years, Pearl's causality has been revised by him and colleagues in a series of published works [5, 6].
- 3. The lag operator L is similar to the z-transform. However, z is treated as a variable and is often used in signal processing, while L is an operator [39].
- 4. Stationarity, by definition, refers to time-shift invariance of the statistics of the underlying process, which implies that all its statistical moments are constant over time [45]. There are several types of stationarity. Here, the stationarity conditions required for defining power spectral densities are that the mean is constant and that the covariance between any two variables separated by a given time lag is constant regardless of their absolute position in time.
- 5. Whether i Granger-causes j or vice-versa.
- 6. Baccalá and Sameshima [26] analysed this system using partial directed coherence and the directed transfer function.
Publication Dates
- Publication in this collection: 14 Sept 2020
- Date of issue: 2020
History
- Received: 03 Jan 2020
- Reviewed: 25 June 2020
- Accepted: 27 July 2020