I was excited to use Approximate Bayesian Computation (ABC) in my recent paper on the history of black bears in the Central Interior Highlands of the USA. Using ABC, I was able to test alternative hypotheses about the history of the populations in the study. The analysis suggested that the bears translocated to the Midwest 50 years ago admixed with existing Midwest populations, rather than the alternatives that the introduced bears either did not breed or founded the current populations.
I could envision the benefits of an ABC analysis immediately, but executing that vision didn’t come as quickly. The goal of this post is to point new ABC users in the right direction. Plenty has been written on how ABC works: see both the primary literature (some starter material here, here, and here) and the software manuals (those for ABCToolbox, fastSimCoal, and DIYABC were particularly helpful to me). Also, I wish I had known about this blog devoted to ABC when I was starting out!
Learning the basics (at least as a practitioner rather than a developer) may be the easy part, as a number of software packages are available. The two harder parts for me were deciding which software to use and selecting summary statistics. In terms of software, consider both the question you are trying to answer and your ability to think backwards in time. While I did not investigate all available programs, the main difference I saw was in how much model complexity they can incorporate. If your question focuses on simpler models (e.g., varying divergence scenarios, stable population size), you can use a simpler program such as DIYABC. More complex models (e.g., population growth or decline, migration) may need a more complex program such as ABCToolbox. With model complexity comes the need to understand how to code demographic events into your model; this may be challenging (at least initially) as you learn to think backwards in time instead of forwards. DIYABC helps here by drawing a pictorial representation of the model that the user can check for errors.
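To make "thinking backwards in time" a little more concrete, here is a minimal sketch of the idea behind coalescent simulators such as fastSimCoal: start with the sampled lineages in the present and step back until they merge into one common ancestor. It assumes the simplest possible case (one population of constant size, no migration) and is not how any of the programs above are actually implemented; the function name `simulate_tmrca` is my own invention for illustration.

```python
import random


def simulate_tmrca(n_lineages, rng):
    """Time to the most recent common ancestor of n sampled lineages.

    Assumes a single constant-size population. Time runs backwards from
    the present and is measured in units of 2N generations.
    """
    t = 0.0
    k = n_lineages
    while k > 1:
        # With k lineages, any of the k*(k-1)/2 pairs may merge next.
        rate = k * (k - 1) / 2
        t += rng.expovariate(rate)  # waiting time until the next merger
        k -= 1  # two lineages merge into one ancestor
    return t


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n_samples = 10
replicates = [simulate_tmrca(n_samples, rng) for _ in range(5000)]
mean_tmrca = sum(replicates) / len(replicates)

# Theoretical expectation under these assumptions: 2 * (1 - 1/n)
expected = 2 * (1 - 1 / n_samples)
print(f"simulated mean TMRCA: {mean_tmrca:.3f}, expected: {expected:.3f}")
```

Demographic events such as a bottleneck or a divergence are added to this kind of model as changes that happen at some time in the past, which is why the input files ask you to list events from the present backwards.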
Once you select a model, you next select the summary statistics that will be used to describe the data in multidimensional space. This is challenging because the summary statistics are meant to describe properties of the model (or to pick up differences between models, if you are interested in model selection). The statistics you choose therefore need to capture the aspects of the data you are trying to model, and it is not always clear which statistics do that. Additionally, small datasets (few samples and/or few loci) do not contain much information; researchers may want to limit the number of summary statistics used and interpret the output from small datasets cautiously.
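To show where the summary statistics enter the analysis, here is a minimal sketch of an ABC rejection sampler. It is a toy, not what DIYABC or ABCToolbox do internally: the "model" is just a normal distribution with an unknown mean, the prior bounds, the two summary statistics (mean and standard deviation), and all function names are assumptions I made up for illustration.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator: n draws from a normal with mean theta and sd 1."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summarize(data):
    """Reduce a dataset to two summary statistics: mean and sd."""
    return (statistics.fmean(data), statistics.stdev(data))


def distance(s1, s2):
    """Euclidean distance between two vectors of summary statistics."""
    return sum((a - b) ** 2 for a, b in zip(s1, s2)) ** 0.5


def abc_rejection(observed, n_sims, keep_fraction, rng):
    """Keep the parameter draws whose simulated summaries are closest
    to the observed summaries."""
    s_obs = summarize(observed)
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)  # draw from a flat prior (assumed)
        s_sim = summarize(simulate(theta, len(observed), rng))
        draws.append((distance(s_obs, s_sim), theta))
    draws.sort()  # smallest distance first
    n_keep = max(1, int(n_sims * keep_fraction))
    return [theta for _, theta in draws[:n_keep]]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
observed = simulate(2.0, 50, rng)  # pretend this is the real dataset
posterior = abc_rejection(observed, n_sims=5000, keep_fraction=0.02, rng=rng)
posterior_mean = statistics.fmean(posterior)
print(f"kept {len(posterior)} draws, posterior mean = {posterior_mean:.2f}")
```

Everything the sampler knows about the data passes through `summarize`, so swapping in different statistics changes which simulations count as "close" to the observed data.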
Below I list the summary statistics available in DIYABC for SSR markers and the model parameter they are meant to capture (see the manual for primary literature citations). If I am mistaken, please leave a comment so that this list can serve as a community resource!
Mean Garza-Williamson’s M
Mean number of alleles
Delta mu squared distance
Assignment of Individuals to Populations
Shared allele distance
Admixture summary statistics
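For readers who have not calculated these by hand, here is a small sketch of two of the statistics above, the mean number of alleles and the mean Garza-Williamson M, for a single population. It assumes alleles are recorded as repeat counts and uses the common form M = k / (R + 1), where k is the number of alleles at a locus and R is the range in allele size; check the DIYABC manual for the exact definition it uses. The genotypes and locus names are invented for illustration.

```python
def number_of_alleles(alleles):
    """Number of distinct alleles observed at one locus."""
    return len(set(alleles))


def garza_williamson_m(alleles):
    """M = k / (R + 1): number of alleles over the allele size range
    (in repeat units) plus one. Gaps in the size range lower M."""
    k = number_of_alleles(alleles)
    allele_range = max(alleles) - min(alleles)
    return k / (allele_range + 1)


# Hypothetical allele sizes (repeat counts) sampled from one population.
loci = {
    "locus_A": [10, 10, 11, 12, 12, 14],  # size 13 is missing
    "locus_B": [20, 21, 22, 22, 23, 23],  # no gaps in the size range
}

mean_alleles = sum(number_of_alleles(a) for a in loci.values()) / len(loci)
mean_m = sum(garza_williamson_m(a) for a in loci.values()) / len(loci)
print(f"mean number of alleles: {mean_alleles}, mean M: {mean_m:.2f}")
```

A gap in the allele size range, as at the hypothetical locus_A, pulls M below 1.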
Deciding on the summary statistics is challenging. With the same dataset and models, changing the summary statistics can result in different parameter estimates. This makes sense, as the summary statistics capture different aspects of the demographic history; leaving one or more of those aspects out will therefore affect the output. But a little time spent learning about ABC may result in strong inference for your question of interest.