SAD formulation

Header photo

SAD formulae in R

Etienne and Haegeman (The neutral theory of biodiversity with random fission speciation, Theoretical Ecology, 2011, 4: 87-109) provide a table of analytic formulations for the species abundance distribution under various demographic models. I took their formulae, both for a large community with no dispersal limitation, one based on point speciation, the other on random-fission speciation, and wrote into R. Their paper also provides a formulation for a local community connected by migration to a large meta-community, but it is very complex. The formula in McKane, Alonso, and Solé (Analytic solution of Hubbell's model of local community dynamics, Theoretical Population Biology, 2004, 65:67-73) is simpler.

Point speciation

The Etienne and Haegeman formula for the biodiversity number under point-speciation is \( \theta=\nu {J-1} \cong \nu J \), where \( \nu \) is the speciation rate and \( J \) the meta-community size. Then the number of species at equilibrium in the meta-community, Equation 32 in their Table 1, is

\begin{equation} S = \sum_{k=1}^{J} { \theta \over {\theta+k-1} }. \end{equation}

The species abundance distribution is

\begin{equation} P(n) = {\theta \over n} \left( 1 - { {n} \over {J} } \right)^{\theta-1} = L_n \end{equation}

using the approximate version of their Equation 24. \( P(n) \) is the probability that a species has abundance \( n. \) R functions thetaPS, totalSpeciesPS, sadPS encode those formulae (PS = point speciation). The latter produces the expected number of species with n individuals, so the probability distribution multiplied by the number of species. This is Fisher's log-series, thus labeled \( L_n \).

Random-fission speciation

Their formulae for random-fission speciation are \( \theta_{rf}=\sqrt(\nu J), \)

\begin{equation} S = \theta_{rf} { {I_0 (2 \theta_{rf})} \over {I_1 (2 \theta_{rf})} }, \end{equation} \begin{equation} P(n) = { {S (S-1)} \over {J} } \left( 1 - { {n} \over {J} } \right)^{S-2}. \end{equation}

Those are the approximate forms of Equations 31 and 27 in Table 1. \( I_k \) is the 'modified Bessel function of the first kind for integer k'. Happily, the base package in R has a function besselI for the modified Bessel function, though it took some trial and error to figure out how to use it. R functions thetaRF, totalSpeciesRF, sadRF encode those formulae for random fission (thus RF).

Local community and migration

The formulae from Etienne and Haegeman for a local community connected by migration are vastly more complicated (Equations 44-45 in their Table 2). In the McKane et al. paper, the problem is split into two and easier to duplicate. They derive the probability \( P_s(n) \) of observing n individuals in a species of the local community once the abundance of the same species is specified in the meta-community. They define some constants for convenience,

\begin{equation} P^*_j = P_j { {m(J-1)} \over {(1-m)} } \end{equation} and \begin{equation} N^*_j = \left( {J-m} \over {1-m} \right) - P^*_j. \end{equation}

Here, m is the migration probability, or the probability that a local birth comes from outside the local community, and J the size of the meta-community. \( P_j \) is the relative abundance of the species in the meta-community, taken as a constant. Then the probability \( P_s(N_j) \) of observing \( N_j \) individuals of that species in the local community is

\begin{equation} P_s(N_j) = { {J} \choose {N_j} } { {\beta(N_j+P^*_j,N^*_j-N_j)} \over {\beta(P^*_j,N^*_j-J)} } \end{equation}

where \( \beta \) is the beta distribution. The latter is Equation 18 in McKane et al.

But this isn't the abundance distribution across all species in the local community. It's only the abundance distribution for a species of fixed meta-community abundance. Calculating the entire abundance distribution requires summing over all possible abundances in the meta-community, from 1 to J. Assume the meta-community probability distribution is the log-series, \( L_n \). The local abundance distribution is

\begin{equation} P(n) = \sum_{P_j=1/j}^{P_j=1} L_j P_s(N_j) \end{equation} where \( P_j \) is the meta-community relative abundance in Eq. 5. The R function sadM encodes Eq. 8, but unfortunately it is very slow to complete the entire distribution. The R code uses lchoose and lbeta, producing logarithms, to avoid overrun caused by high factorials.

A question about Eq. 8 is whether it produces a proper probability distribution. Before graphing below, I enforce this by dividing by the sum of the distribution \( P(n) \).

Sample calculations

R commands below calculated the SAD as a probability at every integer abundance then display them with log-log axes. The graph is trimmed well below \( J_m, \) after \( P(n) \lt 0.001. \) I think the unbinned form of this graph is clearer, because binning plays no role the location of the mode.

pPS=sadPS(nu=5e-5,Jm=1e6,n=1:200)
pRF=sadRF(nu=5e-5,Jm=1e6,n=1:200)
pPS.PDF=pPS/sum(pPS)
pRF.PDF=pRF/sum(pRF)
par(cex.lab=.8,cex.axis=.6,mai=c(1,1,.1,.1),mgp=c(2,1,0))
plot(1:200,pPS.PDF,log='xy',type='l',ylim=c(.001,.2),
     xlab='population size',ylab='fraction of species')
lines(1:200,pRF.PDF,col='red')
legend('bottomleft',legend=c('point speciation','random fission'),
       col=c('black','red'),lty='solid',cex=.8)
plot of chunk examples
sadLocal=sadM(nu=1e-5,Jm=1e6,J=1e4,n=1:10000,m=.2)
sadLocalPDF=sadLocal/sum(sadLocal)
par(cex.lab=.8,cex.axis=.6,mai=c(1,1,.1,.1),mgp=c(2,1,0))
plot(1:1e4,sadLocalPDF,log='xy',type='l',xlab='population size',ylab='fraction of species',
     xlim=c(1,200),ylim=c(.001,.1),col='blue')
text(y=max(sadLocalPDF),x=12,label='Local community: migration=0.2',cex=.8)
plot of chunk sadM

The R functions

(download)

# Theta with point-speciation, meta-community size Jm, speciation nu
thetaPS=function(nu,Jm) return(nu*(Jm-1))

# Total species richness, point-speciation
totalSpeciesPS=function(nu,Jm,debug=FALSE)
 {
        theta=thetaPS(nu,Jm)
        return(sum(theta/(theta+(1:Jm)-1)))
 }

# Abundance distribution point-speciation: 
#  number of species with n individuals, n a vector in [1,Jm]
sadPS=function(nu,Jm,n)
 {
         theta=thetaPS(nu,Jm)
         base=(1-n/Jm)
         ex=base^(theta-1)
         return(ex*theta/n)
 }
# Theta with random fission speciation
thetaRF=function(nu,Jm) return(sqrt(nu)*Jm)

# Total species richness, random fission speciation
totalSpeciesRF=function(nu,Jm,debug=FALSE)
 {
        theta=thetaRF(nu,Jm)
        numer=besselI(nu=0,x=2*theta,expon.scaled=TRUE)
        denom=besselI(nu=1,x=2*theta,expon.scaled=TRUE)
        if(debug) browser()
        return(theta*numer/denom)
 }

# Abundance distribution random fission speciation: 
#  number of species with n individuals, n a vector in [1,Jm]
sadRF=function(nu,Jm,n)
 {
         Sm=totalSpeciesRF(nu,Jm)
         k=Sm*(Sm-1)/Jm
         base=(1-n/Jm)
         ex=base^(Sm-2)
         return(k*ex)
 }

# Abundance distribution in local community of size J within Jm: 
#  number of species with n individuals, n a vector in [1,J], m=migration
#  summing over loop, which should be 1:(Jm-1) 
#  but loop can be less, eliminating very low probability abundances
sadM=function(nu,Jm,J,n,m,loop=1:(Jm-1),debug=FALSE)
 {
        meta=sadPS(nu,Jm,1:(Jm-1))
        sadMeta=meta/totalSpeciesPS(nu,Jm)

        P=(1:(Jm-1))/Jm
        Pstar=m*(J-1)*P/(1-m)
        Nstar=(J-m)/(1-m)-Pstar
        if(debug) browser()

        k=lchoose(J,n)
        for(oneN in loop)
         {
                numer=lbeta(n+Pstar[oneN],Nstar[oneN]-n)
                denom=lbeta(Pstar[oneN],Nstar[oneN]-J)
                logsum=k+numer-denom

                oneP=sadMeta[oneN]*exp(logsum)
                if(debug) browser()

                if(oneN==1) fullPs=oneP
                else fullPs=fullPs+oneP
                if(oneN%%50000==0) cat("At N=", oneN, "\n")
         }

        return(fullPs)
 }