Mark D. Scheuerell
Fish Ecology Division, Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Seattle, WA USA, mark.scheuerell@noaa.gov
Casey P. Ruff
Skagit River System Cooperative, Mt. Vernon, WA USA, cruff@skagitcoop.org
Eric R. Buhle
Ocean Associates, Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Seattle, WA USA, eric.buhle@noaa.gov
James T. Thorson
Fishery Resource Analysis and Monitoring Division, Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Seattle, WA USA, james.thorson@noaa.gov
This is version 0.17.04.26.
ASSESSOR incorporates data on spawners (escapement), harvest, and age composition into a retrospective run reconstruction and probabilistic forecast under a Bayesian framework. The general structure follows that of Fleischman et al. (2013), but ASSESSOR also allows for the inclusion of specific external drivers of productivity, both natural (e.g., climate variability) and anthropogenic (e.g., flow alteration). The model is composed of two primary pieces: a process model that governs the true population dynamics, and an observation model that relates the data in hand to the true process.
We begin with our process model that describes the true, but unknown production of offspring from their parents. In any given year t, spawning adults produce some number of surviving offspring, which follows a general Ricker model, such that
\[\log(R_t) = \log(S_t) + a_t \ – bS_t + w_t.\]
Here \(R_t\) is the total number of subsequent recruits (offspring) born in year t; \(S_t\) is the true, but unobserved, number of spawning adults; \(a_t\) is the annual density-independent productivity; \(b\) is the strength of density dependence; and \(w_t\) is a process error representing environmental stochasticity, which is autocorrelated over time according to \(w_t \sim \text{N}(\phi w_{t-1}, q_a)\).
Previous applications of time-varying productivity (e.g., Peterman et al. 2003; Dorner et al. 2008) have used a Markov form where \(a_t \sim \text{N}(a_{t-1}, \sigma_a)\), but we will model \((a_t)\) as a function of time-varying covariates. Specifically,
\[a_t = \bar{a} + \sum_{i=1}^{M} c_{i,t} \ X_{i,t+h} \]
Here \(\bar{a}\) is the underlying mean productivity, and \(c_{i,t}\) is the effect of covariate \(i\) at time \(t\), \(X_{i,t+h}\). To allow for direct comparison of effect sizes, the covariates are typically standardized to have a zero mean and unit variance.
The estimated number of fish of age \(a\) returning in year \(t\) \((N_{a,t})\) is then product of the total number of brood-year recruits in year \(t – a\) and the proportion of mature fish from that brood year that returned to spawn at age \(a\) \((p_{a,t-a})\), such that
\[N_{a,t} = R_{t-a} \ p_{a,t-a}.\]
The vector of age-specific return rates for brood year \(t\) \((\mathbf{p}_t)\) has length \(A\), which equals the number of adult age classes. That is, \(\mathbf{p}_t\) is a combination of the probability of surviving to, and maturing in years \(t + a_\min\) to \(t + a_\max\). We modeled \((\mathbf{p}_t)\) as a random effect using a hierarchical form of the Dirichlet distribution, where
\[\mathbf{p}_t \sim \text{Dirichlet}(\boldsymbol{\mu},\pi).\]
In this formulation, the mean vector \(\boldsymbol{\mu}\) is itself distributed as a Dirichlet, and therefore has a total of \(A\) elements that are all greater than zero. The precision parameter \(\pi\) affects each of the elements in \(\boldsymbol{\mu}\), such that large values of \(\pi\) results in values of \(\mathbf{p}_t\) that are very close to \(\boldsymbol{\mu}\).
Estimates of the number of spawning adults necessarily contain some sampling or observation errors due to incomplete censuses, mis-identification, etc. Therefore, we will assume that the estimates of escapement \((E_t)\) are log-normally distributed about the true number of spawners \((S_t)\)
\[\log(E_t) \sim \text{Normal}\left(\log(S_t), r_s\right).\]
We do not have the ability to estimate the observation variances for both the escapement and harvest without any additional prior information. Therefore, we will assume the harvest is recorded without error and calculate \(S_t\) as the difference between the estimated total run size \((N_t)\) and harvest \((H_t)\)
\[S_t = N_t - H_t.\]
and \(N_t\) is the sum of \(N_{a,t}\) over all age classes.
The age composition data include the number of fish in each age class \(a\) in year \(t\) \((O_{a,t})\). The age data are then modeled as a multinomial process with order \(Y_t\) and proportion vector \(\mathbf{d}_t\), such that
\[\mathbf{O}_t \sim \text{Multinomial}(Y_t, \mathbf{d}_t).\]
The order of the multinomial is simply the sum of the observed numbers of fish across all ages returning in year \(t\):
\[Y_t = \sum_{a=a_\min}^{a_\max} O_{a,t}\]
The proportion vector \(\mathbf{d}_t\) for the multinomial is based on the age-specific, model-derived estimates of adult returns in year \(t\) \((N_{a,t})\), such that
\[d_{a,t} = {N_{a,t} \over \displaystyle \sum_{a=a_\min}^{a_\max} N_{a,t}}.\]
All analyses require the R software (v3.2.3) for data retrieval, data processing, and summarizing model results, and the JAGS software (v4.2.0) for Markov chain Monte Carlo (MCMC) simulation. Please note that some of the R code below may not work with older versions of JAGS due to some changes in the ways that arrays are handled.
We also need a few packages that are not included with the base installation of R, so we begin by installing them (if necessary) and then loading them.
if(!require("R2jags")) {
install.packages("R2jags")
library("R2jags")
}
if(!require("RCurl")) {
install.packages("RCurl")
library("RCurl")
}
if(!require("gsl")) {
install.packages("gsl")
library("gsl")
}
The last thing we’ll need is the following function for trimming real numbers to a desired precision when plotting output.
Re2prec <- function(x,map="round",prec=1) {
## 'map' can be round, floor, or ceiling
## 'prec' is nearest value (eg, 0.1 means to nearest tenth); default 1 gives normal behavior
if(prec<=0) { stop("\"prec\" cannot be less than or equal to 0") }
do.call(map,list(x/prec))*prec
}
We begin by specifying the names of four necessary data files that contain the following information:
Let’s also define the following parameters, which will be referenced throughout the analysis.
n_yrs
: number of calendar years of dataA
: number of age classesM
: number of covariates## 1. file with escapement data
## [n_yrs x 2] matrix of obs counts; 1st col is calendar yr
fn_esc <- "SkagitSthdEsc.csv"
## 2. file with age comp data
## [n_yrs x (1+A)]; 1st col is calendar yr
fn_age <- "SkagitSthdAge.csv"
## min & max ages
age_min <- 3
age_max <- 8
## years, if any, of age-comp to skip; see below
age_skip <- 2
## 3. file with harvest data
## [n_yrs x 2] matrix of obs catch; 1st col is calendar yr
fn_harv <- "SkagitSthdCatch.csv"
## 4. file with covariate data
## [n_yrs x (1+MM)]; 1st col is calendar yr
fn_hrel <- "SkagitHatchRel.csv"
## time lags (years) for covariates
flow_lag <- 1
marine_lag <- 2
hrel_lag <- 2
## number of years of forecasts
n_fore <- 1
## file where to save JAGS model
fn_jags <- "SkagitSthd_RR_JAGS.txt"
## upper threshold for Gelman & Rubin's (1992) potential scale reduction factor (Rhat).
Rhat_thresh <- 1.1
## URL for example data files
## set to NULL if using a local folder/directory
ex_url <- "https://raw.githubusercontent.com/mdscheuerell/ASSESSOR/master/datafiles/"
Here we load in the first three data files and do some simple calculations and manipulations. First the spawner data:
## escapement
dat_esc <- read.csv(text = getURL(paste0(ex_url,fn_esc)))
## years of data
dat_yrs <- dat_esc$year
## number of years of data
n_yrs <- length(dat_yrs)
## get first & last years
yr_frst <- min(dat_yrs)
yr_last <- max(dat_yrs)
## log of escapement
ln_dat_esc <- c(log(dat_esc[,-1]),rep(NA,n_fore))
Next the age composition data:
## age comp data
dat_age <- read.csv(text = getURL(paste0(ex_url,fn_age)))
## drop year col & first age_min+age_skip rows
dat_age <- dat_age[-(1:(age_min+age_skip)),-1]
## num of age classes
A <- age_max-age_min+1
## add row(s) of NA's for forecast years
dat_age <- rbind(dat_age,matrix(0,n_fore,A,dimnames=list(n_yrs+seq(n_fore),colnames(dat_age))))
## total num of age obs by cal yr
dat_age[,"sum"] <- apply(dat_age,1,sum)
## row indices for any years with no obs age comp
idx_NA_yrs <- which(dat_age$sum<A,TRUE)
## replace 0's in yrs w/o any obs with NA's
dat_age[idx_NA_yrs,(1:A)] <- NA
## change total in yrs w/o any obs from 0 to A to help dmulti()
dat_age[idx_NA_yrs,"sum"] <- A
## convert class
dat_age <- as.matrix(dat_age)
And then the harvest data:
## harvest
dat_harv <- read.csv(text = getURL(paste0(ex_url,fn_harv)))
## drop year col & first age_max rows
dat_harv <- c(dat_harv[,-1],rep(0,n_fore))
This analysis uses 6 covariates as drivers of the population’s instrinic growth rate:
We begin by getting the daily flow data from the US Geological Service National Water Information System. We will use the direct link to the gage data from the Skagit River near Mount Vernon, WA (#12178100), beginning with the first year of fish data.
## flow site
flow_site <- 12178100
## get URL for flow data from USGS
flow_url <- paste0("https://waterdata.usgs.gov/nwis/dv",
"?cb_00060=on",
"&format=rdb",
"&site_no=",flow_site,
"&begin_date=",yr_frst,"-01-01",
"&end_date=",yr_last,"-12-31")
Next we will retrieve