Dynamic Factor Analysis (DFA)
Forms of covariance matrix
Constraints for model fitting
Interpretation of results
11 Feb 2019
Dynamic Factor Analysis (DFA)
Forms of covariance matrix
Constraints for model fitting
Interpretation of results
\[ y_{i,t} = x_{i,t} + a_i + v_{i,t} \\ x_{i,t} = x_{i,t-1} + w_{i,t} \]
with
\(v_{i,t} \sim \text{N}(0, R)\)
\(w_{i,t} \sim \text{N}(0, Q)\)
\[ y_{1,t} = x_{1,t} + a_1 + v_{1,t} \\ y_{2,t} = x_{2,t} + a_2 + v_{2,t} \\ \vdots \\ y_{n,t} = x_{n,t} + a_2 + v_{n,t} \\ \]
\[ x_{1,t} = x_{1,t-1} + w_{1,t} \\ x_{2,t} = x_{2,t-1} + w_{2,t} \\ \vdots \\ x_{n,t} = x_{n,t-1} + w_{n,t} \]
\[ \mathbf{y}_t = \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
with
\(\mathbf{v}_t \sim \text{MVN}(\mathbf{0}, \mathbf{R})\)
\(\mathbf{w}_t \sim \text{MVN}(\mathbf{0}, \mathbf{Q})\)
We often observe covariance among environmental time series, especially for those close to one another
Are there some common patterns here?
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ \end{bmatrix} \times \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_t + \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t \]
\[ \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_t = \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_{t-1} + \begin{bmatrix} w_{JF} \\ w_N \\ w_S \end{bmatrix}_t \]
What if our observations were instead a mixture of 2+ states?
For example, we sampled haul-outs located between several breeding sites
\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t = \begin{bmatrix} 0.8 & 0.2 & 0 \\ 0.2 & 0.7 & 0.1 \\ 0 & 0.9 & 0.1 \\ 0 & 0.3 & 0.7 \\ 0 & 0.1 & 0.9 \\ \end{bmatrix} \times \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_t + \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t \]
\[ \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_t = \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_{t-1} + \begin{bmatrix} w_{JF} \\ w_N \\ w_S \end{bmatrix}_t \]
What if our observations were a mixture of states, but we didn't know how many or the weightings?
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
What are the dimensions of \(\mathbf{Z}\)?
What are the elements within \(\mathbf{Z}\)?
DFA is a dimension reduction technique, which models \(n\) observed time series as a function of \(m\) hidden states (patterns), where \(n \gg m\)
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
data: \(\mathbf{y}_t\) is \(n \times 1\)
loadings: \(\mathbf{Z}\) is \(n \times m\) with \(n > m\)
states: \(\mathbf{x}_t\) is \(m \times 1\)
Goal is to reduce some large number of correlated variates into a few uncorrelated factors
Calculating the principal components requires us to estimate the covariance of the data
\[ \text{PC} = \text{eigenvectors}(\text{cov}(\mathbf{y})) \]
There will be \(n\) principal components (eigenvectors) for an \(n \times T\) matrix \(\mathbf{y}\)
We reduce the dimension by selecting a subset of the components that explain much of the variance (eg, the first 2)
We need to estimate the covariance in the data \(\mathbf{y}\)
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t, ~ \text{with} ~ \mathbf{v}_t \sim \text{MVN}(\mathbf{0}, \mathbf{R}) \]
so
\[ \text{cov}(\mathbf{y}_t) = \mathbf{Z} \text{cov}(\mathbf{x}_t) \mathbf{Z}^\top + \mathbf{R} \] In PCA, we require \(\mathbf{R}\) to be diagonal, but not so in DFA
\[ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma & 0 & 0 & 0 \\ 0 & \sigma & 0 & 0 \\ 0 & 0 & \sigma & 0 \\ 0 & 0 & 0 & \sigma \end{bmatrix} ~\text{or}~~ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & 0 \\ 0 & 0 & \sigma_3 & 0 \\ 0 & 0 & 0 & \sigma_4 \end{bmatrix} \]
\[ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma & 0 & 0 & 0 \\ 0 & \sigma & 0 & 0 \\ 0 & 0 & \sigma & 0 \\ 0 & 0 & 0 & \sigma \end{bmatrix} ~\text{or}~~ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & 0 \\ 0 & 0 & \sigma_3 & 0 \\ 0 & 0 & 0 & \sigma_4 \end{bmatrix} \]
\[ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma & \gamma & \gamma & \gamma \\ \gamma & \sigma & \gamma & \gamma \\ \gamma & \gamma & \sigma & \gamma \\ \gamma & \gamma & \gamma & \sigma \end{bmatrix} ~\text{or}~~ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & \gamma_{2,4} \\ 0 & 0 & \sigma_3 & 0 \\ 0 & \gamma_{2,4} & 0 & \sigma_4 \end{bmatrix} \]
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
What form should we use for \(\mathbf{Z}\)?
\[ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \\ z_5 \end{bmatrix} ~\text{or}~~ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_{1,1} & z_{2,1} \\ z_{1,2} & z_{2,2} \\ z_{1,3} & z_{2,3} \\ z_{1,4} & z_{2,4} \\ z_{1,5} & z_{2,5} \end{bmatrix} ~\text{or}~~ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_{1,1} & z_{2,1} & z_{3,1} \\ z_{1,2} & z_{2,2} & z_{3,2} \\ z_{1,3} & z_{2,3} & z_{3,3} \\ z_{1,4} & z_{2,4} & z_{3,4} \\ z_{1,5} & z_{2,5} & z_{3,5} \end{bmatrix} \]
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
What form should we use for \(\mathbf{Z}\)?
\[ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ \vdots \\ z_5 \end{bmatrix} ~\text{or}~~ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_{1,1} & z_{2,1} \\ z_{1,2} & z_{2,2} \\ z_{1,3} & z_{2,3} \\ \vdots & \vdots \\ z_{1,n} & z_{2,n} \end{bmatrix} ~\text{or}~~ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_{1,1} & z_{2,1} & z_{3,1} \\ z_{1,2} & z_{2,2} & z_{3,2} \\ z_{1,3} & z_{2,3} & z_{3,3} \\ \vdots & \vdots & \vdots \\ z_{1,n} & z_{2,n} & z_{3,n} \end{bmatrix} \]
We'll use model selection criteria to choose (eg, AICc)
It turns out that there are an infinite number of combinations of \(\mathbf{Z}\) and \(\mathbf{x}\) that will equal \(\mathbf{y}\)
Therefore we need to impose some constraints on the model
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
\[ \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{bmatrix} \]
We will set the first \(m\) elements of \(\mathbf{a}\) to 0
For example, if \(n = 5\) and \(m = 2\)
\[ \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} \Rightarrow \mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} \]
For example, if \(n = 5\) and \(m = 2\)
\[ \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} \Rightarrow \mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} \Rightarrow \mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \]
Note, however, that this causes problems for the EM algorithm so we will often de-mean the data and set \(a_i = 0\) for all \(i\)
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
\[ \mathbf{Z} = \begin{bmatrix} z_{1,1} & z_{2,1} & \dots & z_{m,1} \\ z_{1,2} & z_{2,2} & \dots & z_{m,2} \\ z_{1,3} & z_{2,3} & \dots & z_{m,3} \\ \vdots & \vdots & \ddots & z_{m,4} \\ z_{1,n} & z_{2,n} & \dots & z_{m,n} \end{bmatrix} \]
We will set the upper right triangle of \(\mathbf{Z}\) to 0
For example, if \(n = 5\) and \(m = 3\)
\[ \mathbf{Z} = \begin{bmatrix} z_{1,1} & 0 & 0 \\ z_{1,2} & z_{2,2} & 0 \\ z_{1,3} & z_{2,3} & z_{3,3} \\ z_{1,4} & z_{2,3} & z_{3,4} \\ z_{1,5} & z_{2,5} & z_{3,5} \end{bmatrix} \]
For the first \(m - 1\) rows of \(\mathbf{Z}\), \(z_{i,j} = 0\) if \(j > i\)
An additional constraint is necessary in a Bayesian context
\[ \mathbf{Z} = \begin{bmatrix} \underline{z_{1,1}} & 0 & 0 \\ z_{1,2} & \underline{z_{2,2}} & 0 \\ z_{1,3} & z_{2,3} & \underline{z_{3,3}} \\ z_{1,4} & z_{2,3} & z_{3,4} \\ z_{1,5} & z_{2,5} & z_{3,5} \end{bmatrix} \]
Diagonal of \(\mathbf{Z}\) is positive: \(z_{i,j} > 0\) if \(i = j\)
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
\[ \mathbf{w}_t \sim \text{MVN}(\mathbf{0}, \mathbf{Q}) \]
We will set \(\mathbf{Q}\) equal to the Identity matrix \(\mathbf{I}\)
For example, if \(m = 4\)
\[ \mathbf{Q} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]
This allows our random walks to have a lot of flexibility
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \underline{\mathbf{D} \mathbf{d}_t} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
\(\mathbf{d}_t\) is a \(p \times 1\) vector of covariates at time \(t\)
\(\mathbf{D}\) is an \(n \times p\) matrix of covariate effects
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \underline{\mathbf{D}} \mathbf{d}_t + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
Careful thought must be given a priori as to the form for \(\mathbf{D}\)
Should the effect(s) vary by site, species, etc?
For example, given 2 covariates, \(\text{Temp}\) and \(\text{Salinity}\)
\[ \mathbf{D} = \underbrace{ \begin{bmatrix} d_{\text{Temp}} & d_{\text{Salinity}} \\ d_{\text{Temp}} & d_{\text{Salinity}} \\ \vdots & \vdots \\ d_{\text{Temp}} & d_{\text{Salinity}} \\ \end{bmatrix} }_{\text{effects same by site/species}} ~ \text{or} ~ \mathbf{D} = \underbrace{ \begin{bmatrix} d_{\text{Temp}, 1} & d_{\text{Salinity}, 1} \\ d_{\text{Temp}, 2} & d_{\text{Salinity}, 2} \\ \vdots & \vdots \\ d_{\text{Temp}, n} & d_{\text{Salinity}, n} \\ \end{bmatrix} }_{\text{effects differ by site/species}} \]
Earlier we saw that we could use model selection criteria to help us choose among the different forms for \(\mathbf{Z}\)
However, caution must be given when comparing models with and without covariates, and varying numbers of states
Think about the model form
\[ \mathbf{y}_t = \mathbf{Z} \underline{\mathbf{x}_t} + \mathbf{a} + \mathbf{D} \underline{\mathbf{d}_t} + \mathbf{v}_t \\ \]
\(\mathbf{x}_t\) is an undetermined random walk
\(\mathbf{d}_t\) is a predetermined covariate
Unless \(\mathbf{d}\) is highly correlated with \(\mathbf{y}\), then the inclusion of a state \(\mathbf{x}\) will be favored over \(\mathbf{d}\)
Thus, work out fixed effects (covariates) while keeping the random effects (states) constant, and vice versa
For example, compare data support for models with different combinations of covariates, only one state (\(m\) = 1), and a "diagonal and equal" \(\mathbf{R}\)
Recall that we had to constrain the form of \(\mathbf{Z}\) to fit the model
\[ \mathbf{Z} = \begin{bmatrix} z_{1,1} & 0 & \dots & 0 \\ z_{1,2} & z_{2,2} & \ddots & 0 \\ \vdots & \vdots & \ddots & 0 \\ \vdots & \vdots & \vdots & z_{m,m} \\ \vdots & \vdots & \vdots & \vdots \\ z_{1,n} & z_{2,n} & z_{3,n} & z_{m,n} \end{bmatrix} \]
So, the 1st common factor is determined by the 1st variate, the 2nd common factor by the first two variates, etc.
To help with this, we can use a basis rotation to maximize the loadings on a few factors
If \(\mathbf{H}\) is an \(m \times m\) non-singular matrix, these 2 DFA models are equivalent
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{D} \mathbf{d}_t + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{H}^{-1} \mathbf{x}_t + \mathbf{a} + \mathbf{D} \mathbf{d}_t + \mathbf{v}_t \\ \mathbf{H} \mathbf{x}_t = \mathbf{H} \mathbf{x}_{t-1} + \mathbf{H} \mathbf{w}_t \]
How should we choose \(\mathbf{H}\)?
A varimax rotation will maximize the variance of the loadings in \(\mathbf{Z}\) along a few of the factors
Estimating the parameters in a DFA model can be challenging and time intensive
Methods include
maximum likelihood (eg, Chap 10 in MARSS Manual)