Dynamic Factor Analysis

11 Feb 2019

Topics

Dynamic Factor Analysis (DFA)

Forms of covariance matrix
Constraints for model fitting
Interpretation of results

Simple model for 2+ time series

Random walk observed with error

\[ y_{i,t} = x_{i,t} + a_i + v_{i,t} \\ x_{i,t} = x_{i,t-1} + w_{i,t} \]

with

\(v_{i,t} \sim \text{N}(0, R)\)

\(w_{i,t} \sim \text{N}(0, Q)\)

Random walk observed with error

\[ y_{1,t} = x_{1,t} + a_1 + v_{1,t} \\ y_{2,t} = x_{2,t} + a_2 + v_{2,t} \\ \vdots \\ y_{n,t} = x_{n,t} + a_2 + v_{n,t} \\ \]

\[ x_{1,t} = x_{1,t-1} + w_{1,t} \\ x_{2,t} = x_{2,t-1} + w_{2,t} \\ \vdots \\ x_{n,t} = x_{n,t-1} + w_{n,t} \]

Random walk observed with error

In matrix form

\[ \mathbf{y}_t = \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]

with

\(\mathbf{v}_t \sim \text{MVN}(\mathbf{0}, \mathbf{R})\)

\(\mathbf{w}_t \sim \text{MVN}(\mathbf{0}, \mathbf{Q})\)

Environmental time series

We often observe covariance among environmental time series, especially for those close to one another

Are there some common patterns here?

Common patterns in time series

State-space model

Ex: population structure

\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]

State-space model

Ex: Harbor seal population structure

\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ \end{bmatrix} \times \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_t + \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t \]

\[ \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_t = \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_{t-1} + \begin{bmatrix} w_{JF} \\ w_N \\ w_S \end{bmatrix}_t \]

Finding common patterns

What if our observations were instead a mixture of 2+ states?

For example, we sampled haul-outs located between several breeding sites

Mixtures of states

\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t = \begin{bmatrix} 0.8 & 0.2 & 0 \\ 0.2 & 0.7 & 0.1 \\ 0 & 0.9 & 0.1 \\ 0 & 0.3 & 0.7 \\ 0 & 0.1 & 0.9 \\ \end{bmatrix} \times \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_t + \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t \]

\[ \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_t = \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_{t-1} + \begin{bmatrix} w_{JF} \\ w_N \\ w_S \end{bmatrix}_t \]

Finding common patterns

What if our observations were a mixture of states, but we didn't know how many or the weightings?

\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]

What are the dimensions of \(\mathbf{Z}\)?

What are the elements within \(\mathbf{Z}\)?

Dynamic Factor Analysis (DFA)

DFA is a dimension reduction technique, which models \(n\) observed time series as a function of \(m\) hidden states (patterns), where \(n \gg m\)

Dynamic Factor Analysis (DFA)

State-space form

\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]

data: \(\mathbf{y}_t\) is \(n \times 1\)

loadings: \(\mathbf{Z}\) is \(n \times m\) with \(n > m\)

states: \(\mathbf{x}_t\) is \(m \times 1\)

Dimension reduction

Principal Components Analysis (PCA)

Goal is to reduce some large number of correlated variates into a few uncorrelated factors

Principal Components Analysis (PCA)

Calculating the principal components requires us to estimate the covariance of the data

\[ \text{PC} = \text{eigenvectors}(\text{cov}(\mathbf{y})) \]

There will be \(n\) principal components (eigenvectors) for an \(n \times T\) matrix \(\mathbf{y}\)

We reduce the dimension by selecting a subset of the components that explain much of the variance (eg, the first 2)

Principal Components Analysis (PCA)

Relationship between PCA & DFA

We need to estimate the covariance in the data \(\mathbf{y}\)

\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t, ~ \text{with} ~ \mathbf{v}_t \sim \text{MVN}(\mathbf{0}, \mathbf{R}) \]

\[ \text{cov}(\mathbf{y}_t) = \mathbf{Z} \text{cov}(\mathbf{x}_t) \mathbf{Z}^\top + \mathbf{R} \] In PCA, we require \(\mathbf{R}\) to be diagonal, but not so in DFA

Principal Components Analysis (PCA)

Forms for \(\mathbf{R}\) with \(n = 4\)

\[ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma & 0 & 0 & 0 \\ 0 & \sigma & 0 & 0 \\ 0 & 0 & \sigma & 0 \\ 0 & 0 & 0 & \sigma \end{bmatrix} ~\text{or}~~ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & 0 \\ 0 & 0 & \sigma_3 & 0 \\ 0 & 0 & 0 & \sigma_4 \end{bmatrix} \]

Dynamic Factor Analysis (DFA)

Forms for \(\mathbf{R}\) with \(n = 4\)

\[ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma & \gamma & \gamma & \gamma \\ \gamma & \sigma & \gamma & \gamma \\ \gamma & \gamma & \sigma & \gamma \\ \gamma & \gamma & \gamma & \sigma \end{bmatrix} ~\text{or}~~ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & \gamma_{2,4} \\ 0 & 0 & \sigma_3 & 0 \\ 0 & \gamma_{2,4} & 0 & \sigma_4 \end{bmatrix} \]

Dynamic Factor Analysis (DFA)

\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]

What form should we use for \(\mathbf{Z}\)?

\[ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \\ z_5 \end{bmatrix} ~\text{or}~~ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_{1,1} & z_{2,1} \\ z_{1,2} & z_{2,2} \\ z_{1,3} & z_{2,3} \\ z_{1,4} & z_{2,4} \\ z_{1,5} & z_{2,5} \end{bmatrix} ~\text{or}~~ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_{1,1} & z_{2,1} & z_{3,1} \\ z_{1,2} & z_{2,2} & z_{3,2} \\ z_{1,3} & z_{2,3} & z_{3,3} \\ z_{1,4} & z_{2,4} & z_{3,4} \\ z_{1,5} & z_{2,5} & z_{3,5} \end{bmatrix} \]

Dynamic Factor Analysis (DFA)

\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]

What form should we use for \(\mathbf{Z}\)?

\[ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ \vdots \\ z_5 \end{bmatrix} ~\text{or}~~ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_{1,1} & z_{2,1} \\ z_{1,2} & z_{2,2} \\ z_{1,3} & z_{2,3} \\ \vdots & \vdots \\ z_{1,n} & z_{2,n} \end{bmatrix} ~\text{or}~~ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_{1,1} & z_{2,1} & z_{3,1} \\ z_{1,2} & z_{2,2} & z_{3,2} \\ z_{1,3} & z_{2,3} & z_{3,3} \\ \vdots & \vdots & \vdots \\ z_{1,n} & z_{2,n} & z_{3,n} \end{bmatrix} \]

We'll use model selection criteria to choose (eg, AICc)

Fitting DFA models

It turns out that there are an infinite number of combinations of \(\mathbf{Z}\) and \(\mathbf{x}\) that will equal \(\mathbf{y}\)

Therefore we need to impose some constraints on the model

Constraints on DFA models

The offset \(\mathbf{a}\)

\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]

\[ \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{bmatrix} \]

We will set the first \(m\) elements of \(\mathbf{a}\) to 0

Constraints on DFA models

The offset \(\mathbf{a}\)

For example, if \(n = 5\) and \(m = 2\)

\[ \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} \Rightarrow \mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} \]

Constraints on DFA models

The offset \(\mathbf{a}\)

For example, if \(n = 5\) and \(m = 2\)

\[ \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} \Rightarrow \mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} \Rightarrow \mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \]

Note, however, that this causes problems for the EM algorithm so we will often de-mean the data and set \(a_i = 0\) for all \(i\)

Constraints on DFA models

The loadings \(\mathbf{Z}\)

\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]

\[ \mathbf{Z} = \begin{bmatrix} z_{1,1} & z_{2,1} & \dots & z_{m,1} \\ z_{1,2} & z_{2,2} & \dots & z_{m,2} \\ z_{1,3} & z_{2,3} & \dots & z_{m,3} \\ \vdots & \vdots & \ddots & z_{m,4} \\ z_{1,n} & z_{2,n} & \dots & z_{m,n} \end{bmatrix} \]

We will set the upper right triangle of \(\mathbf{Z}\) to 0

Constraints on DFA models

The loadings \(\mathbf{Z}\)

For example, if \(n = 5\) and \(m = 3\)

\[ \mathbf{Z} = \begin{bmatrix} z_{1,1} & 0 & 0 \\ z_{1,2} & z_{2,2} & 0 \\ z_{1,3} & z_{2,3} & z_{3,3} \\ z_{1,4} & z_{2,3} & z_{3,4} \\ z_{1,5} & z_{2,5} & z_{3,5} \end{bmatrix} \]

For the first \(m - 1\) rows of \(\mathbf{Z}\), \(z_{i,j} = 0\) if \(j > i\)

Constraints on DFA models

The loadings \(\mathbf{Z}\)

An additional constraint is necessary in a Bayesian context

\[ \mathbf{Z} = \begin{bmatrix} \underline{z_{1,1}} & 0 & 0 \\ z_{1,2} & \underline{z_{2,2}} & 0 \\ z_{1,3} & z_{2,3} & \underline{z_{3,3}} \\ z_{1,4} & z_{2,3} & z_{3,4} \\ z_{1,5} & z_{2,5} & z_{3,5} \end{bmatrix} \]

Diagonal of \(\mathbf{Z}\) is positive: \(z_{i,j} > 0\) if \(i = j\)

Constraints on DFA models

The state variance \(\mathbf{Q}\)

\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]

\[ \mathbf{w}_t \sim \text{MVN}(\mathbf{0}, \mathbf{Q}) \]

We will set \(\mathbf{Q}\) equal to the Identity matrix \(\mathbf{I}\)

Constraints on DFA models

The state variance \(\mathbf{Q}\)

For example, if \(m = 4\)

\[ \mathbf{Q} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

This allows our random walks to have a lot of flexibility

Dynamic Factor Analysis (DFA)

Including \(p\) covariates

\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \underline{\mathbf{D} \mathbf{d}_t} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]

\(\mathbf{d}_t\) is a \(p \times 1\) vector of covariates at time \(t\)

\(\mathbf{D}\) is an \(n \times p\) matrix of covariate effects

Dynamic Factor Analysis (DFA)

Form for \(\mathbf{D}\)

\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \underline{\mathbf{D}} \mathbf{d}_t + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]

Careful thought must be given a priori as to the form for \(\mathbf{D}\)

Should the effect(s) vary by site, species, etc?

Dynamic Factor Analysis (DFA)

Form for \(\mathbf{D}\)

For example, given 2 covariates, \(\text{Temp}\) and \(\text{Salinity}\)

\[ \mathbf{D} = \underbrace{ \begin{bmatrix} d_{\text{Temp}} & d_{\text{Salinity}} \\ d_{\text{Temp}} & d_{\text{Salinity}} \\ \vdots & \vdots \\ d_{\text{Temp}} & d_{\text{Salinity}} \\ \end{bmatrix} }_{\text{effects same by site/species}} ~ \text{or} ~ \mathbf{D} = \underbrace{ \begin{bmatrix} d_{\text{Temp}, 1} & d_{\text{Salinity}, 1} \\ d_{\text{Temp}, 2} & d_{\text{Salinity}, 2} \\ \vdots & \vdots \\ d_{\text{Temp}, n} & d_{\text{Salinity}, n} \\ \end{bmatrix} }_{\text{effects differ by site/species}} \]

A note on model selection

Earlier we saw that we could use model selection criteria to help us choose among the different forms for \(\mathbf{Z}\)

However, caution must be given when comparing models with and without covariates, and varying numbers of states

A note on model selection

Think about the model form

\[ \mathbf{y}_t = \mathbf{Z} \underline{\mathbf{x}_t} + \mathbf{a} + \mathbf{D} \underline{\mathbf{d}_t} + \mathbf{v}_t \\ \]

\(\mathbf{x}_t\) is an undetermined random walk

\(\mathbf{d}_t\) is a predetermined covariate

Unless \(\mathbf{d}\) is highly correlated with \(\mathbf{y}\), then the inclusion of a state \(\mathbf{x}\) will be favored over \(\mathbf{d}\)

A note on model selection

Thus, work out fixed effects (covariates) while keeping the random effects (states) constant, and vice versa

For example, compare data support for models with different combinations of covariates, only one state (\(m\) = 1), and a "diagonal and equal" \(\mathbf{R}\)

Interpreting DFA results

Recall that we had to constrain the form of \(\mathbf{Z}\) to fit the model

\[ \mathbf{Z} = \begin{bmatrix} z_{1,1} & 0 & \dots & 0 \\ z_{1,2} & z_{2,2} & \ddots & 0 \\ \vdots & \vdots & \ddots & 0 \\ \vdots & \vdots & \vdots & z_{m,m} \\ \vdots & \vdots & \vdots & \vdots \\ z_{1,n} & z_{2,n} & z_{3,n} & z_{m,n} \end{bmatrix} \]

So, the 1st common factor is determined by the 1st variate, the 2nd common factor by the first two variates, etc.

Interpreting DFA results

To help with this, we can use a basis rotation to maximize the loadings on a few factors

If \(\mathbf{H}\) is an \(m \times m\) non-singular matrix, these 2 DFA models are equivalent

\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{D} \mathbf{d}_t + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]

\[ \mathbf{y}_t = \mathbf{Z} \mathbf{H}^{-1} \mathbf{x}_t + \mathbf{a} + \mathbf{D} \mathbf{d}_t + \mathbf{v}_t \\ \mathbf{H} \mathbf{x}_t = \mathbf{H} \mathbf{x}_{t-1} + \mathbf{H} \mathbf{w}_t \]

How should we choose \(\mathbf{H}\)?

Basis rotation

Varimax

A varimax rotation will maximize the variance of the loadings in \(\mathbf{Z}\) along a few of the factors

PCA of 5 wines with 8 attributes

Rotated loadings

Fitting DFA models

Estimating the parameters in a DFA model can be challenging and time intensive

Methods include

maximum likelihood (eg, Chap 10 in MARSS Manual)
Bayesian (eg, Stan or greta)