Population Estimator

Census analysis

Census estimator for asynchronous populations

Multi-season version

Input counts of individuals during many seasons are converted into an estimate of total population size each season. This covers a scenario where there is asynchrony, so that there is never a day when all individuals are present at once. The degree of asynchrony is estimated using the mean length of time each individual remains in the study area (referred to as the tenure of individuals). If the duration of the entire season is much longer than the mean tenure, then asynchrony is high (i.e., synchrony is low), and the maximum count is much lower than the total population. At the other extreme, if the tenure equals the duration of the season, there is complete synchrony, and the maximum count is the same as the population size.
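The effect of asynchrony on the maximum count can be illustrated with a small simulation. This is a minimal sketch, not the estimator itself: the function name, the Gaussian draws, and all numeric values are assumptions chosen for illustration, not defaults of the tool.

```python
import random


def simulate_max_count(population, season_days, mean_tenure, sd_tenure,
                       mean_arrival, sd_arrival, seed=0):
    """Return the highest daily count in one simulated season.

    Arrival day and tenure are drawn independently from Gaussians
    (an illustrative assumption), rounded to whole days.
    """
    rng = random.Random(seed)
    counts = [0] * season_days
    for _ in range(population):
        arrival = round(rng.gauss(mean_arrival, sd_arrival))
        tenure = max(1, round(rng.gauss(mean_tenure, sd_tenure)))
        # Clip each individual's stay to the season boundaries.
        first = max(0, arrival)
        last = min(season_days, arrival + tenure)
        for day in range(first, last):
            counts[day] += 1
    return max(counts)


# High asynchrony: tenure (30 days) is much shorter than the season (120 days).
print(simulate_max_count(1000, 120, 30, 5, 45, 15))
# Complete synchrony: everyone arrives on day 0 and stays all season.
print(simulate_max_count(1000, 120, 120, 0, 0, 0))
```

In the first call the maximum count falls below the true population of 1000, while in the second it equals the population.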

The model requires prior estimates of mean tenure per individual, and the variance of tenure among individuals. Without them, there are too many parameters to fit. Ideally, tenure is known from observations of some marked individuals. Both mean and variance of tenure must be input as prior probability distributions in a Bayesian sense. Some background on the use of priors is helpful in understanding the method.

On the other hand, the distributions of arrival and departure dates of individuals are estimated by the model. No knowledge of either is needed in advance. Both distributions are assumed to be Gaussian. The model will also estimate the correlation between arrival and tenure, i.e., whether late-arriving individuals have shorter (or longer) tenure on the colony.
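What a correlation between arrival and tenure means can be sketched by drawing individuals from a pair of correlated Gaussians. This is an illustration only, not the model's fitting code: the function name and all numeric values are hypothetical, and tenure is treated as Gaussian purely for the example. Because departure is arrival plus tenure, it is also Gaussian under these assumptions.

```python
import math
import random


def draw_individuals(n, mean_arrival, sd_arrival, mean_tenure, sd_tenure,
                     correlation, seed=0):
    """Draw (arrival, tenure, departure) tuples for n individuals."""
    rng = random.Random(seed)
    individuals = []
    for _ in range(n):
        z_arrival = rng.gauss(0.0, 1.0)
        z_independent = rng.gauss(0.0, 1.0)
        # Standard construction of a correlated standard-normal pair.
        z_tenure = (correlation * z_arrival
                    + math.sqrt(1.0 - correlation ** 2) * z_independent)
        arrival = mean_arrival + sd_arrival * z_arrival
        tenure = mean_tenure + sd_tenure * z_tenure  # not truncated at zero
        individuals.append((arrival, tenure, arrival + tenure))
    return individuals


# A negative correlation means late arrivals tend to have shorter tenure.
for arrival, tenure, departure in draw_individuals(5, 45, 15, 30, 5, -0.5):
    print(f"arrive {arrival:6.1f}  stay {tenure:5.1f}  depart {departure:6.1f}")
```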

The power of the multi-season approach is that some seasons with poor coverage will still yield good population estimates as long as other seasons have many counts. This is based on the assumption that individual phenology is consistent (but not identical!) across seasons. Some knowledge of multi-level statistical modeling will be helpful in understanding how this works.

The model assumes all individuals present on any day are detected and counted. Incomplete detection would have to be estimated with additional information.

To execute the model, a table of counts per day in one or more seasons must be input (see text box below), along with basic input parameters needed to initiate the model. Detailed instructions follow, along with a sample data table. Details of the procedure are published in "Estimating population size in asynchronous aggregations: a Bayesian approach and test with elephant seal censuses".

Input parameters and data. Click on each for instructions below.

Start day each season
End day each season
Mean tenure of one individual
Prior standard deviation of mean tenure
CV of tenure among individuals
Prior standard deviation of CV tenure
Steps
Burn-in
Show steps

Paste from a spreadsheet (Excel, LibreOffice, OpenOffice, etc.) or tab-delimited ASCII into the text box

There must be 3 columns of integers
There should be no header row
Column 1 must be Season: an integer; in typical use this is a year, but other numbers will work
Column 2 must be Day within Season: an integer; in typical use this is the day within each year
Column 3 is the Count on each day: must be an integer
There might be only one season, but then hyper-parameters are meaningless
Days must have the same meaning each year, e.g., day 10 might mean 10 Jan every year
It is not necessary to have matching days every year; one year can have day 1, 5, 10, the next year 7, 8, 12
Only include days with counts, no blank records; some seasons may have no data at all
Some seasons can have few counts, or even a single day, but some seasons need a full series of counts
Sample data below
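The format rules above can be checked before pasting. The following is a minimal sketch of such a check, assuming tab-delimited text; the function name `parse_counts` is hypothetical and is not part of the tool. Blank lines are skipped here for convenience.

```python
def parse_counts(text):
    """Parse pasted text into a list of (season, day, count) integer tuples.

    Raises ValueError if a row does not have exactly 3 integer columns,
    which also rejects a header row.
    """
    rows = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines such as a trailing newline
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 columns, got {len(fields)}")
        try:
            season, day, count = (int(field) for field in fields)
        except ValueError:
            raise ValueError(
                f"line {line_number}: all columns must be integers") from None
        rows.append((season, day, count))
    return rows


sample = "2013\t31\t150\n2013\t39\t422\n"
rows = parse_counts(sample)
seasons = sorted({season for season, _, _ in rows})
print(f"{len(rows)} counts in {len(seasons)} season(s)")
```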

When the execution button is clicked, nothing will change on the screen as the model runs, but the browser should show an indication that it is waiting. When complete, results appear. A full 6000 steps with 10 or more seasons will take several minutes to finish, so it is very helpful to start with a test run of few steps (500 or fewer) to confirm that results are output. When execution completes, the estimated population size in each season, with confidence limits, is printed to the screen and saved in a table for download. There are also estimated hyper-parameters -- the mean across years of arrival date, the standard deviation in arrival, and the correlation between arrival and tenure.

Input parameters needed:

Start Day: Start day each season should be earlier than counts ever start, i.e., before the earliest arrivals. It can be negative. The results are most accurate if there are days with a predicted count of approximately 0. The default -30 works for northern elephant seals; it is equivalent to 31 October. Day 1 is then 1 December.
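The day numbering can be checked against calendar dates. This is a minimal sketch, assuming day 1 is 1 December as in the elephant seal default; the function name and the `season_year` parameter (the calendar year in which day 1 falls) are hypothetical.

```python
from datetime import date, timedelta


def day_to_date(day, season_year, day_one=(12, 1)):
    """Convert a day-within-season number to a calendar date.

    day_one is the (month, day of month) that corresponds to day 1.
    Day 0 is the day before day 1, so negative days fall earlier still.
    """
    month, day_of_month = day_one
    return date(season_year, month, day_of_month) + timedelta(days=day - 1)


print(day_to_date(1, 2012))    # 2012-12-01
print(day_to_date(-30, 2012))  # 2012-10-31
```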

End Day: End day each season should be a day later than the last departure.

Mean Tenure: The mean length of time an individual is present. This must be known independently for good estimates. The default (in days) is for northern elephant seal.

Prior SD of Tenure: The prior standard deviation (SD) of that mean tenure. It is the degree of confidence in the independent estimate. Ideally, it is very small; if it is high, it will add error to the estimated population size. It must be positive. The default is from northern elephant seals.

CV Tenure: The coefficient of variation (CV) in tenure among individuals (CV = ratio SD/mean). This must be known independently for good estimates. Note the difference between CV tenure, which is a trait of the organism, and prior SD of tenure, which is the degree of confidence in the observations.
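The difference between the two quantities can be made concrete with a two-level draw: the prior SD describes uncertainty about the mean itself, while the CV describes the spread of individuals around that mean. This is a sketch only. The Gaussian form for both levels, the function name, and the numeric values are assumptions for illustration, not the tool's defaults.

```python
import random


def draw_tenures(mean_tenure, prior_sd_mean, cv_tenure, n, seed=0):
    """Draw one plausible mean tenure, then n individual tenures around it."""
    rng = random.Random(seed)
    # Level 1: uncertainty about the mean (prior SD of mean tenure).
    plausible_mean = rng.gauss(mean_tenure, prior_sd_mean)
    # Level 2: variation among individuals, with SD = CV * mean.
    sd_among_individuals = cv_tenure * plausible_mean
    tenures = [rng.gauss(plausible_mean, sd_among_individuals)
               for _ in range(n)]
    return plausible_mean, tenures


# Hypothetical values: mean 30 days, prior SD 1 day, CV 0.2 (SD about 6 days).
plausible_mean, tenures = draw_tenures(30.0, 1.0, 0.2, 5)
print(round(plausible_mean, 1), [round(t, 1) for t in tenures])
```

A small prior SD keeps the plausible mean close to the input value, whereas the CV sets how much individuals differ from one another regardless of how well the mean is known.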

Prior SD of CV Tenure: The prior standard deviation (SD) of that CV tenure. It is the degree of confidence in the independent estimate of CV tenure. Ideally, it is very small; if it is high, it will add error to the estimated population size. It must be positive. The default is from northern elephant seals.

Steps: Number of steps to run the parameter search. Final results should use 6000-10000 steps, but first test with ~200 steps. This will confirm that the model runs, and the test will finish quickly.

Burn-in: Preliminary steps to be discarded in parameter calculations. Must be < number of steps. Final run should be 2000. Test run can be any number as long as it is < number of test steps.
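How discarding burn-in affects a summary can be sketched as follows. This is a generic illustration of summarizing a chain of sampled values, not the tool's own code: the function name is hypothetical, and the percentile-based limits are an assumption about how such limits are commonly computed.

```python
def summarize_chain(samples, burn_in, lower=0.025, upper=0.975):
    """Return (mean, lower limit, upper limit) after discarding burn-in."""
    if not 0 <= burn_in < len(samples):
        raise ValueError("burn-in must be smaller than the number of steps")
    kept = sorted(samples[burn_in:])

    def quantile(q):
        # Simple empirical quantile by index into the sorted samples.
        return kept[min(len(kept) - 1, int(q * len(kept)))]

    mean = sum(kept) / len(kept)
    return mean, quantile(lower), quantile(upper)


# The first 5 steps are far from the region the chain settles into.
chain = [0, 0, 0, 0, 0, 10, 12, 14]
print(summarize_chain(chain, burn_in=5))
```

With the early steps removed, the summary reflects only the part of the chain that has settled, which is why the burn-in must be smaller than the number of steps.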

Show Steps: After completion, the estimated hyper-parameters are printed to the screen at intervals of this many steps. This can confirm that the estimates are converging, or suggest a problem.

Download this table as SampleSealData.csv

Sample Data:

2013	31	150
2013	39	422
2013	49	1076
2013	55	1308
2013	62	1303
2014	26	77
2014	27	93
2014	31	179
2014	33	227
2014	44	829
2014	59	1406
2014	60	1374
2014	87	161
2015	17	20
2015	31	154
2015	33	198
2015	44	701
2015	52	1117
2015	58	1276
2015	92	64
2016	33	230
2016	34	274
2016	43	677
2016	51	1258
2016	57	1414
2016	67	1284
2016	93	67
2017	28	46
2017	44	694
2017	56	1442
2017	57	1473
2017	91	93
2018	38	410
2018	54	1424
2018	62	1483
2018	83	382
2018	89	132