Behind the Paper: Spatial Assignment Not as Accurate as Promised

Imagine that you are an airport security screener in charge of searching for and identifying wildlife products, and you come across 100 bear paws. Trade of all eight bear species is regulated by the Convention on International Trade in Endangered Species (CITES), and trade of severed paws is illegal. There are a number of questions you would want answered about these paws, including:
• What species is this?
• How many individuals have been poached? (In a shipment of 100 paws, this could range from 25 to 100 as each animal has four paws.)
• Where were the animals poached?

Genetics can help answer these questions. To identify species, we would grab some universal mammal primers for a mitochondrial gene, assay with PCR, sequence, align to reference sequences from public databases, and then we have an answer. So let’s pretend our hypothetical paws are from American black bears.

To answer the second question, we would extract DNA from every paw and could use 5-8 microsatellite loci to genotype each sample. After scoring the alleles (unique variants at each microsatellite locus), we can check whether the genotypes of any two samples match. A match would mean that those paws came from the same individual. In this way we could identify how many individuals were poached. For our example, let’s take the worst possible scenario and say that each of the 100 paws came from a different animal.
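As a rough illustration of the matching step, here is a minimal Python sketch that groups paws by identical multilocus genotype. The paw IDs, allele sizes, and function names are all hypothetical, and the exact-match rule is a simplifying assumption: real analyses usually have to allow for genotyping error and missing data.

```python
def normalize(genotype):
    """Sort the two alleles at each locus so (150, 154) equals (154, 150)."""
    return tuple(tuple(sorted(locus)) for locus in genotype)


def group_by_genotype(samples):
    """Group sample IDs whose genotypes match exactly at every locus."""
    groups = {}
    for sample_id, genotype in samples.items():
        groups.setdefault(normalize(genotype), []).append(sample_id)
    return groups


# Hypothetical allele sizes (in base pairs) at three loci for four paws.
paws = {
    "paw_01": ((150, 154), (201, 201), (98, 102)),
    "paw_02": ((154, 150), (201, 201), (102, 98)),  # same animal as paw_01
    "paw_03": ((150, 150), (199, 201), (98, 98)),
    "paw_04": ((152, 154), (201, 205), (100, 102)),
}

groups = group_by_genotype(paws)
n_individuals = len(groups)
print(f"{len(paws)} paws from {n_individuals} individuals")
```

Each distinct genotype is counted as one animal, so the number of groups is the estimated number of individuals poached.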

Answering the first two questions was easy because little prior knowledge was needed; standard molecular biology techniques and analyses were enough. However, identifying where the animals were poached is much more complex; thankfully, population genetics offers a solution. If we know the population structure across geographic areas, we can assign individuals of unknown origin to a population based on their genetic signature. This idea was advanced in 2004 with a method called spatial smoothing. It acknowledged that sometimes we don’t have “training” samples from the entirety of a species’ range, and thus would not be able to assign a location to a poached sample if it came from an unsampled region. The authors came up with a way to build allele frequency surfaces that cover these unsampled areas, so that a sample can be assigned to a region not represented in the range-wide dataset.
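To make the idea of assignment concrete, here is a minimal sketch of a frequency-based assignment test to discrete populations. It is not the spatial smoothing method itself, which replaces the per-population frequencies with continuous surfaces. The population names, allele frequencies, and the frequency floor for unseen alleles are all made up for illustration, and the genotype probabilities assume Hardy-Weinberg proportions and independent loci.

```python
import math


def genotype_log_likelihood(genotype, allele_freqs, floor=0.01):
    """Log-likelihood of a diploid genotype given one population's allele frequencies.

    `allele_freqs` holds one {allele: frequency} dict per locus. Alleles not
    seen in the population get `floor` (an assumed value) instead of zero.
    """
    total = 0.0
    for locus, (a, b) in enumerate(genotype):
        p = max(allele_freqs[locus].get(a, 0.0), floor)
        q = max(allele_freqs[locus].get(b, 0.0), floor)
        # Hardy-Weinberg: p^2 for homozygotes, 2pq for heterozygotes.
        prob = p * p if a == b else 2 * p * q
        total += math.log(prob)
    return total


def assign(genotype, populations):
    """Return the most likely population and the score for each population."""
    scores = {
        name: genotype_log_likelihood(genotype, freqs)
        for name, freqs in populations.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical reference populations typed at two loci.
populations = {
    "north": [{150: 0.7, 154: 0.3}, {201: 0.8, 205: 0.2}],
    "south": [{150: 0.2, 154: 0.8}, {201: 0.3, 205: 0.7}],
}
unknown = ((150, 150), (201, 205))

best, scores = assign(unknown, populations)
print(best, scores)
```

A sample from a region with no reference population can only ever be assigned to one of the sampled populations under this scheme, which is the gap that smoothed frequency surfaces are meant to fill.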

In my new paper I asked: how accurate and precise are spatial smoothing methods? Specifically, I compared two types of genetic markers, microsatellites and SNPs. To date only microsatellites have been used in spatial smoothing applications; however, SNPs offer the potential for better spatial resolution because they cover more of the genome. Thus local genomic changes in allele frequencies (due to genetic drift or selection) will be accounted for in the spatial smoothing surfaces, and with increasing numbers of surfaces, we expect more accurate and precise estimates to assign a spatial location to a sample.

My results suggest that spatial smoothing is not nearly as accurate as we would like it to be for conservation applications. I calculated the distance between the estimated location and the true sampling location for each individual; the median distance was 506 km for 15 microsatellites and 266 km for 200 SNPs. You might be thinking, “You used A LOT more SNPs than microsatellites, of course they did better!” There is a rule of thumb that says 1 microsatellite = 10 SNPs, the idea being that since microsatellites have more alleles per locus (~8-12 for a species with moderate to high genetic diversity), then you need a larger number of SNPs (which are selected to have two alleles) to compensate. However, a number of studies dispute this rule of thumb, and depending on the application* SNP panels with 2-5 times more SNPs than microsatellites have equivalent power. (Side note: this all goes out the window in species with low genetic diversity.) While I did not do an informativeness assessment on the markers, yes, the SNP panel probably has higher power than the microsatellite panel.
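For readers who want to compute this kind of error summary on their own data, here is a minimal sketch. It assumes locations are (latitude, longitude) pairs in decimal degrees and uses great-circle (haversine) distance on a spherical Earth; that distance measure is an assumption for illustration, not necessarily the one used in the paper, and the function names are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_error_km(true_locs, est_locs):
    """Median distance between paired true and estimated locations."""
    return median(haversine_km(*t, *e) for t, e in zip(true_locs, est_locs))


# Made-up coordinates on the equator, so each degree is about 111 km.
true_locs = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
est_locs = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
print(round(median_error_km(true_locs, est_locs), 1))
```

The median is less sensitive than the mean to a handful of samples placed very far from their true origin.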

I used a second, more qualitative test of accuracy: was the estimated location within the state or province where the bear was actually captured? I think this is a more realistic way of assessing spatial smoothing, since ultimately we need this level of resolution for the method to be useful to law enforcement. Again there was bad news, as only 18% and 41% of samples in the microsatellite and SNP datasets, respectively, were correctly assigned to the state or province where they were collected.

Figure 2 from Puckett and Eggert (2016) showing difference between true (start of line) and estimated (end of point) locations for 96 black bears using either 15 microsatellites or 200 SNPs. These figures demonstrate the high variability and low accuracy (especially for within state assignment) when using spatial smoothing methods.

Why is Natal Location Important?
There are three applied conservation uses for natal assignment using spatial smoothing. First, as it has been applied to elephants, the method allows the conservation community to call out countries that are doing too little to combat illegal harvest and trade. Second, knowing the location of poaching may allow local law enforcement to bring charges against poachers if caught. Federal and/or international law can take care of many poaching incidents, but individual states may also want the power to prosecute.

Third, by knowing where poaching occurred (particularly hot spots), countries (or states), NGOs, or regulatory agencies can direct resources towards conservation efforts such as rangers, port screening, or education. Getting back to our bear paws example, 100 dead bears is sad, but it’s a different conservation story if the poaching occurred in Ontario versus Louisiana. Ontario has so many bears (~70,000 to 90,000) that the province is about to open a spring harvest season in addition to its fall harvest, whereas bears in Louisiana have been listed as threatened under the Endangered Species Act since 1992. The US Fish and Wildlife Service proposed to delist bears in 2015 and reported a population size between 500 and 750 animals. Thus our hypothetical 100 bears represent vastly different proportions of the population depending on where harvest occurred. It is in this way that knowing where poaching occurs matters to the conservation of populations.
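The arithmetic behind that comparison is simple, and a short sketch makes the contrast explicit. It uses only the population ranges quoted above; the function name is hypothetical.

```python
def percent_of_population(n_poached, population_size):
    """Poached animals as a percentage of the estimated population."""
    return 100.0 * n_poached / population_size


# Low and high population estimates quoted in the text.
estimates = {"Ontario": (70_000, 90_000), "Louisiana": (500, 750)}

for region, (low, high) in estimates.items():
    # A larger population estimate gives the smaller percentage.
    smallest = percent_of_population(100, high)
    largest = percent_of_population(100, low)
    print(f"{region}: {smallest:.2f}% to {largest:.2f}% of the population")
```

The same 100 bears come to a fraction of a percent of Ontario's population but somewhere between roughly 13% and 20% of Louisiana's.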

Where is the Hope?
Conservation is full of bad news. I didn’t start this project to generate a little more; I started it because I thought I could create a useful resource for black bear managers, then go on to create a similar dataset for Asian species, which experience higher trade volumes. I think the methods for spatial inference need refinement, particularly the development of methods that take advantage of haplotypes on chromosomes. I don’t know if those data can be spatially smoothed, but if so they may offer increased power over allele frequency methods. Programs such as ChromoPainter use haplotypes and get at very fine levels of population structure; a spatially explicit extension may be the next big breakthrough for conservation genetics.

Spatial smoothing methods may work really well for species with highly fragmented or island populations. Black bears have a fairly continuous distribution, which I think contributed to the poor estimates of location across the eastern range. Thus we may want to consider which species are good candidates for generating range-wide datasets that can be applied in future assignment applications, and prioritize sampling and genotyping them now. As an empirical population geneticist, I’m happy to make these datasets if we have the analysis tools that will advance conservation efforts.

* One of my conclusions is that spatial assignment is not an application where it is enough for a SNP panel to match the power of a microsatellite panel. I think that for this application, panels need much more data to achieve high quality inference.
