`muti` - An R package for computing mutual information

**Mark D. Scheuerell**

*Fish Ecology Division, Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Seattle, WA USA, mark.scheuerell@noaa.gov*

`muti` is an `R` package that computes the mutual information \((\mathrm{MI})\) between two discrete random variables. `muti` was developed with time series analysis in mind, but there is nothing tying the methods to a time index *per se*.

Mutual information estimates the amount of information about one variable contained in another; it can be thought of as a nonparametric measure of the covariance between the two variables. \(\mathrm{MI}\) is a function of entropy, which is the expected amount of information contained in a variable. The entropy of \(X\), \(\mathrm{H}(X)\), given its probability mass function, \(p(X)\), is

\[ \begin{aligned} \mathrm{H}(X) &= \mathrm{E}[-\log(p(X))]\\ &= -\sum_{i=1}^{L} p(x_i) \log_b p(x_i), \end{aligned} \]

where \(L\) is the number of distinct values that \(X\) can take and \(b\) is the base of the logarithm. `muti` uses base-2 logarithms for calculating the entropies, so \(\mathrm{MI}\) measures information content in units of “bits”. In cases where \(p(x_i) = 0\), the corresponding term \(p(x_i) \log_b p(x_i)\) is taken to be 0.
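The entropy definition above can be sketched in a few lines of R. This is a minimal illustration that estimates \(p(x_i)\) from observed relative frequencies; `entropy_bits` is a hypothetical helper name and is not part of `muti`.

```r
# Shannon entropy (in bits) of a discrete vector, using observed relative
# frequencies as the probability mass function.
# NOTE: `entropy_bits` is a hypothetical helper, not a muti function.
entropy_bits <- function(x) {
  counts <- table(x)            # how often each distinct value occurs
  p <- counts / sum(counts)     # empirical probabilities
  p <- p[p > 0]                 # terms with p = 0 contribute nothing
  -sum(p * log2(p))             # base-2 logarithm gives units of bits
}

entropy_bits(c("a", "b", "a", "b"))  # two equally likely symbols: 1 bit
entropy_bits(rep("a", 4))            # a constant series: 0 bits
```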

The joint entropy of \(X\) and \(Y\) is

\[ \mathrm{H}(X,Y) = -\sum_{i=1}^{L} \sum_{j=1}^{L} p(x_i,y_j) \log_b p(x_i,y_j), \]

where \(p(x_i,y_j)\) is the probability that \(X = x_i\) and \(Y = y_j\). The mutual information between \(X\) and \(Y\) is then

\[ \mathrm{MI}(X;Y) = \mathrm{H}(X) + \mathrm{H}(Y) - \mathrm{H}(X,Y). \]

One can normalize \(\mathrm{MI}\) to the interval [0,1] as

\[ \mathrm{MI}^*(X;Y) = \frac{\mathrm{MI}(X;Y)}{\sqrt{\mathrm{H}(X)\mathrm{H}(Y)}}. \]
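As a rough illustration of these formulas, the sketch below computes \(\mathrm{MI}\) and its normalized form from a contingency table of two discrete vectors. The names `entropy_from_counts` and `mutual_info` are hypothetical and chosen for this example; this is not how `muti` is implemented internally.

```r
# Entropy (bits) from a vector or table of counts.
# NOTE: both helpers below are hypothetical, not muti functions.
entropy_from_counts <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}

# MI(X;Y) = H(X) + H(Y) - H(X,Y), optionally divided by sqrt(H(X) H(Y)).
mutual_info <- function(x, y, normalize = FALSE) {
  joint <- table(x, y)                           # joint counts of (x_i, y_j)
  h_x  <- entropy_from_counts(rowSums(joint))    # marginal entropy of X
  h_y  <- entropy_from_counts(colSums(joint))    # marginal entropy of Y
  h_xy <- entropy_from_counts(joint)             # joint entropy
  mi <- h_x + h_y - h_xy
  # The normalized form is undefined (NaN) if either entropy is 0.
  if (normalize) mi <- mi / sqrt(h_x * h_y)
  mi
}

x <- c(0, 0, 1, 1)
mutual_info(x, x)                    # a series shares all 1 bit with itself
mutual_info(x, c(0, 1, 0, 1))        # independent pattern: 0 bits
mutual_info(x, x, normalize = TRUE)  # normalized MI of identical series: 1
```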


**Input**. At a minimum `muti` requires two vectors of class `numeric` or `integer`. See `?muti` for all of the other function arguments.

**Output**. The output of `muti` is a data frame with the \(\mathrm{MI}\) `MI_xy` and respective significance threshold value `MI_tv` at different lags. Note that a negative (positive) lag means *X* leads (trails) *Y*. For example, if `x` and `y` both have length `TT`, then the \(\mathrm{MI}\) in `x` and `y` at a lag of -1 would be based on `x[1:(TT-1)]` and `y[2:TT]`.
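The lag convention can be made concrete with a small sketch. The helper `lag_pairs` below is hypothetical (not a `muti` function); the negative-lag branch follows the example in the text, and the positive-lag branch is an assumption made by symmetry.

```r
# Pair two equal-length vectors at a given lag.
# NOTE: `lag_pairs` is a hypothetical helper, not part of muti.
lag_pairs <- function(x, y, lag) {
  TT <- length(x)
  stopifnot(length(y) == TT, abs(lag) < TT)
  if (lag <= 0) {
    # negative lag: x leads y, so early x is paired with later y
    list(x = x[1:(TT + lag)], y = y[(1 - lag):TT])
  } else {
    # positive lag (assumed symmetric): x trails y
    list(x = x[(1 + lag):TT], y = y[1:(TT - lag)])
  }
}

lag_pairs(1:5, 11:15, lag = -1)
# x is 1 2 3 4 and y is 12 13 14 15, i.e. x[1:(TT-1)] and y[2:TT]
```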

Additionally, `muti` produces a 3-panel plot of

- the original data (top);
- their symbolic or discretized form (middle);
- \(\mathrm{MI}\) values (solid line) and their associated threshold values (dashed line) at different lags (bottom).

The significance thresholds are based on a bootstrap of the original data. That process is relatively slow, so please be patient if asking for more than the default `mc=100` samples.
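To give a feel for why the thresholds take time to compute, here is one way a resampling-based threshold could be sketched. This is an assumption-laden illustration, not the procedure `muti` actually uses: the helpers `mi_bits` and `mi_threshold`, the independent resampling of each series, and the `alpha` quantile are all choices made for this example.

```r
# Plug-in MI (bits) between two discrete vectors.
# NOTE: `mi_bits` and `mi_threshold` are hypothetical helpers, not muti functions.
mi_bits <- function(x, y) {
  h <- function(counts) {
    p <- counts[counts > 0] / sum(counts)
    -sum(p * log2(p))
  }
  joint <- table(x, y)
  h(rowSums(joint)) + h(colSums(joint)) - h(joint)
}

# Assumed scheme: resample each series independently with replacement
# (which breaks any dependence between them), recompute MI `mc` times,
# and take the upper (1 - alpha) quantile as the threshold.
mi_threshold <- function(x, y, mc = 100, alpha = 0.05) {
  null_mi <- replicate(
    mc,
    mi_bits(sample(x, replace = TRUE), sample(y, replace = TRUE))
  )
  unname(quantile(null_mi, probs = 1 - alpha))
}

set.seed(1)
x <- sample(1:3, 200, replace = TRUE)
mi_bits(x, x) > mi_threshold(x, x)  # a series is informative about itself
```

Each extra resample requires another full MI calculation, so the run time grows roughly linearly with `mc`.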

`muti` computes \(\mathrm{MI}\) based on 1 of 2 possible discretizations of the data in a vector `x`:

- **Symbolic**. (Default) For `1 < i < length(x)`, `x[i]` is translated into 1 of 5 symbolic representations based on its value relative to `x[i-1]` and `x[i+1]`: “peak”, “decreasing”, “same”, “trough”, or “increasing”. For example, the symbolic translation of the vector `c(1.1,2.1,3.3,1.2,3.1)` would be `c("increasing","peak","trough")`. For additional details, see Cazelles (2004).
- **Binned**. Each datum is placed into 1 of `n` equally spaced bins as in a histogram. If the number of bins is not specified, then it is calculated according to Rice’s Rule where `n = ceiling(2*length(x)^(1/3))`.
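Both discretizations can be sketched in base R. The helpers `symbolize_sketch` and `bin_sketch` are hypothetical and are not the functions `muti` uses internally; in particular, how ties with only one neighbor are labeled here ("same") is an assumption, and `cut()` is used as a stand-in for histogram-style binning.

```r
# Symbolic translation of the interior points of a numeric vector.
# NOTE: hypothetical helper; anything that is not strictly a peak, trough,
# increase, or decrease is labeled "same" (an assumption about ties).
symbolize_sketch <- function(x) {
  n <- length(x)
  stopifnot(n >= 3)
  left  <- x[1:(n - 2)]   # x[i-1]
  mid   <- x[2:(n - 1)]   # x[i]
  right <- x[3:n]         # x[i+1]
  sym <- rep("same", n - 2)
  sym[mid > left & mid > right] <- "peak"
  sym[mid < left & mid < right] <- "trough"
  sym[left < mid & mid < right] <- "increasing"
  sym[left > mid & mid > right] <- "decreasing"
  sym
}

# Equal-width binning with Rice's Rule as the default number of bins.
# NOTE: hypothetical helper; returns integer bin indices from 1 to n.
bin_sketch <- function(x, n = ceiling(2 * length(x)^(1/3))) {
  cut(x, breaks = n, labels = FALSE)
}

symbolize_sketch(c(1.1, 2.1, 3.3, 1.2, 3.1))
# "increasing" "peak" "trough"

bin_sketch(seq(0, 1, length.out = 30))  # 30 values spread over 7 bins
```

Note that the symbolic form has two fewer elements than the input, because the first and last values have only one neighbor.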

You can install the development version using `devtools`.

```
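# Install devtools first if it is not already available
if (!require("devtools")) {
  install.packages("devtools")
  library("devtools")
}
# Install the development version of muti from GitHub
devtools::install_github("mdscheuerell/muti")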
```

Here’s an example with significant information between two numeric vectors. Notice that none of the symbolic values are the “same”.

```
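library(muti)
set.seed(123)          # make the simulated data reproducible
TT <- 30               # length of each series
x1 <- rnorm(TT)        # Gaussian white noise
y1 <- x1 + rnorm(TT)   # x1 plus independent noise
muti(x1, y1)           # MI using the default (symbolic) discretization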
```