Bison, bioinformatics and (meta)barcoding

This is my first post here since April. I’ve spent much of the last three months travelling, holidaying, workshopping and attending two conferences. I had a wonderful time in Adelaide at the Genetics Society of AustralAsia’s 2015 conference and learned about some fascinating new science at the Boden Research Conference on Comparative Animal Genomics. But what I thought I’d write about here is the week I spent in Białowieża in Poland, for the 2015 Metabarcoding Spring School (#metabar2015 on Twitter). This was organised by the metabarcoding.org team, in conjunction with the Mammal Research Institute of the Polish Academy of Sciences. Apart from being a chance to meet lots of new people and spend time in an amazing part of the world, I also learned a great deal. The perfect mixture! There’s no way I can report back on everything here (and in a coincidence of timing, this overview has just been published), so I’ll outline some of the key issues discussed at the workshop and provide links to relevant papers for those who want to learn more.

Metabarcoding Spring School 2015 group photo

But first, for those I’ve lost already, what is DNA metabarcoding?

DNA barcoding is a method that can be used to identify unknown specimens. You take a sample, isolate a specific gene sequence, and compare this to sequences from known specimens, to work out which species it belongs to.

DNA metabarcoding takes this concept and applies it to mixed environmental DNA (eDNA) samples, which might contain DNA from several, or even hundreds of, species. This is a relatively new technique but it has enabled some really interesting research. Early studies explored microbial diversity, because many microbes cannot be easily studied or maintained in lab conditions, but their DNA can be detected in environmental samples. Likewise, metabarcoding has been used to study cryptic and hard-to-identify animals, such as nematodes and earthworms. The approach has also been used to assess biodiversity from bulk arthropod samples, soil samples and ancient bones; to determine the components of traditional medicines; and to study animal diets from gut contents and faeces. This explains my interest: we are using DNA metabarcoding to identify mammal predators and their vertebrate prey from scats collected in Tasmania.

So, DNA metabarcoding is a method (or more correctly a series of methods) that can be used to address a whole range of ecological questions.

The Metabarcoding Spring School had a strong focus on experimental design, method choice and evaluation of errors – all very satisfying for a detail-focused person like myself. But these are also essential things to get your head around if you want to work in this area. It may seem, superficially, as though metabarcoding studies should be fairly easy to get going: just extract some DNA, PCR, sequence and assign sequence reads to species – easy! Of course, in reality there are many traps for the unwary and these popped up in pretty much every talk. So here, in roughly the order they would be encountered in a metabarcoding workflow, is a selection of things that every would-be metabarcoder should be thinking about:

Marker choice. DNA barcoding in the strictest sense uses highly standardised markers: the COI gene for animals, rbcL or matK for plants and the ITS region for fungi. However, the same genes or PCR primers may not be suitable for metabarcoding studies. For example, eDNA is often degraded, meaning that PCR works better if you target shorter sequences, but it is difficult to design conserved primers for short fragments within the COI region. Other genes are likely to be better choices in many cases.

Primer choice. Once you’ve picked a gene, it is important to choose primers that won’t introduce too much bias into your results. Ideally there will be no primer mismatches for any of your target species, but lots of mismatches for non-target species. Even a small degree of mismatch can reduce PCR efficiency, meaning you might fail to detect some species that were really present. There is probably no such thing as the ideal metabarcoding marker, so some compromises will be necessary – just make sure you know what these are!
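To make the mismatch idea concrete, here is a minimal Python sketch that counts primer mismatches against a set of binding sites, treating degenerate IUPAC bases in the primer as matching any base they stand for. The primer, the binding-site sequences and the species labels are all invented for illustration; a real evaluation would use aligned reference sequences and would usually weight mismatches near the 3' end more heavily.

```python
# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must be the same length")
    return sum(
        1
        for p, s in zip(primer.upper(), site.upper())
        if s not in IUPAC[p]
    )


# Hypothetical primer and binding sites (same strand, already aligned).
primer = "ACGTRYAC"
binding_sites = {
    "target_species_1": "ACGTACAC",
    "target_species_2": "ACGTGTAC",
    "non_target_species": "ACTTAAGC",
}

mismatches = {
    name: count_mismatches(primer, site)
    for name, site in binding_sites.items()
}

for name, n in mismatches.items():
    print(f"{name}: {n} mismatch(es)")
```

In this toy case the two target species have no mismatches and the non-target species has three, which is the pattern you would hope for.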

Sampling. Biases introduced by sampling design may influence results. Sampling needs to account for local heterogeneity in the distribution of biodiversity over time and space in each system. The best approach may vary depending on target taxa (how big are they, how much do they move), environment (e.g. large vs small lakes, still vs flowing water) and local conditions (does biodiversity vary with season or weather?). If it does, you might have to sample all sites within the same very short time period to enable comparisons.

Laboratory analysis. Several studies have now demonstrated that sample handling methods, laboratory reagents and equipment can introduce biases, errors and contaminants into eDNA studies. The same is true for the DNA sequencing process. It may not be possible to remove all traces of bias, but careful experimental design will at least enable you to detect and account for these.

Tag switching. In metabarcoding studies, DNA is often amplified with primers that incorporate short sequence ‘tags’ or multiplex identifiers (MIDs) at each end of each amplicon. A unique combination of tags is used for each sample. This allows DNA from many samples to be pooled in a single sequencing run. The sequence reads are then sorted by their tags, to work out which DNA sequence came from which sample. Unfortunately, under some laboratory conditions, some of the tags can get mixed up or chimaeric sequences can form, which may be mistakenly attributed to the wrong sample! This should make you scared! Fortunately, now that this problem has been recognised, there are some good recommendations you can follow to minimise the risks.
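The sorting step can be sketched in a few lines of Python. This is a simplified illustration, not how any particular pipeline does it: the tag sequences, tag length and sample names are hypothetical, tags are assumed to be error-free, and the reverse tag is assumed to appear as-is at the end of the read. Reads carrying a tag pair that was never assigned to a sample are counted separately, since such unexpected combinations are one visible sign that tags have been switched.

```python
from collections import Counter

TAG_LEN = 4  # hypothetical tag length

# Hypothetical mapping from (forward tag, reverse tag) to sample name.
sample_by_tags = {
    ("ACGT", "TGCA"): "scat_01",
    ("GATC", "CTAG"): "scat_02",
}


def demultiplex(reads):
    """Sort reads into samples by tag pair; count unassigned tag pairs."""
    assigned = {}
    unexpected = Counter()
    for read in reads:
        pair = (read[:TAG_LEN], read[-TAG_LEN:])
        insert = read[TAG_LEN:-TAG_LEN]  # the amplicon without its tags
        sample = sample_by_tags.get(pair)
        if sample is None:
            unexpected[pair] += 1
        else:
            assigned.setdefault(sample, []).append(insert)
    return assigned, unexpected


reads = [
    "ACGT" + "AAACCC" + "TGCA",  # both tags belong to scat_01
    "GATC" + "GGGTTT" + "CTAG",  # both tags belong to scat_02
    "ACGT" + "GGGTTT" + "CTAG",  # forward tag of scat_01, reverse tag of scat_02
]

assigned, unexpected = demultiplex(reads)
print(assigned)
print(unexpected)
```

Note that this check only catches switches that produce an unused combination. If every possible tag pair is assigned to a sample, a switched read will simply be sorted into the wrong sample without any warning.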

Data analysis. Bioinformatics is probably the key limiting factor for metabarcoding studies, now that high-throughput sequencing is widely available. In fact the opportunity to learn about the new metabarcoding software package OBITools was one of the most important parts of the workshop for me. Several speakers noted that different data analysis parameters (e.g. for quality trimming, data filtering, sequence clustering and OTU delineation) can significantly alter interpretation of results. Incorporating biological replicates, technical replicates and plenty of positive and negative controls can help you to establish the most appropriate parameters to use for your data analysis.
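As a toy example of how replicates and negative controls can feed into filtering, the Python sketch below keeps a sequence only if its read count exceeds the highest count seen in the negative controls in at least a minimum number of PCR replicates. The function name, the labels, the read counts and the threshold of two replicates are all assumptions made up for illustration; they are not OBITools functions or recommended values, and sensible thresholds will depend on your own data.

```python
def filter_detections(replicate_counts, negative_counts, min_replicates=2):
    """Keep sequences seen above the control level in enough replicates.

    replicate_counts: list of dicts (one per PCR replicate),
        mapping sequence label -> read count
    negative_counts: dict mapping sequence label -> highest read count
        seen in any negative control
    """
    # Collect every sequence label seen in any replicate.
    sequences = set().union(*replicate_counts)
    kept = []
    for seq in sorted(sequences):
        n_support = sum(
            1
            for rep in replicate_counts
            if rep.get(seq, 0) > negative_counts.get(seq, 0)
        )
        if n_support >= min_replicates:
            kept.append(seq)
    return kept


# Hypothetical read counts for three PCR replicates of one sample.
replicates = [
    {"wallaby": 120, "possum": 3, "human": 15},
    {"wallaby": 95, "human": 12},
    {"wallaby": 130, "possum": 0, "human": 20, "rat": 40},
]
# Hypothetical maximum read counts observed in the negative controls.
negatives = {"human": 18}

kept = filter_detections(replicates, negatives)
print(kept)
```

Changing min_replicates from 2 to 1 would add possum, human and rat to the result, which is a small demonstration of how much a single parameter can change the story.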

That’s only a snapshot of the topics we discussed, but I hope it will point anyone thinking about starting a DNA metabarcoding project in the right direction. Ultimately there is no single correct approach for metabarcoding; the choice of methods will be very context-dependent.

Glimpse of red deer in the meadows next to Białowieża forest

Finally, I can’t end this post without at least mentioning the fantastic workshop location! Białowieża is in eastern Poland, very close to the border with Belarus and right next to the Białowieża forest, one of the last remnants of Europe’s primeval forests. When I visited, the meadows around the village were full of wildflowers. We enjoyed an afternoon walk in the forest, with my group guided by Rafał Kowalczyk, who is the Director of the Mammal Research Institute and an excellent wildlife photographer. I have never been anywhere else like this, so full of big old trees and dead and decaying tree stumps covered in huge fungi. To imagine that such forest once covered much of Europe… And the absolute highlight of the trip – even though I had to get up at 3am – was the chance to see European bison in the mist at sunrise. It’s a very special place!