One basic anaylisis we want to perform is to count the abundance of some references in our data. Here reference is a broad term and may refer to transcripts, genomes, or amplicon sequences contained in a reference database.
For this mbtools
implements a general purpose expectation maximization (EM) counter.
It is implemented in C++ to perform fast and usually reading the alignment file takes
longer than the actual analysis.
library(mbtools)
Let's start by aligning our short read data to a reference database of 10 microbial genomes.
fi <- system.file("extdata/shotgun", package = "mbtools") %>% find_read_files() ref <- system.file("extdata/genomes/zymo_mock.fna.gz", package = "mbtools") alns <- align_short_reads( fi, threads = 3, reference = ref, use_existing = FALSE)
Counting is yet again a workflow and requires an alignment artifact and a configuration.
config <- config_count( reference = system.file("extdata/genomes/zymo_mock.fna.gz", package = "mbtools"), threads = 1, weights = TRUE ) config
And we can proceed to counting:
cn <- count_references(alns, config)
This creates a count artifact which contains the counts.
cn$counts
Note that there is a NA reference. This is the number of reads likely coming from a sequence not contained in the reference.
cn$counts[is.na(reference)]
In each sample we have 10 microbes with the same abundance.
ggplot(cn$counts, aes(x=counts, y=reference)) + geom_point() + xlim(0, 1000) + facet_wrap(~ sample) + theme_minimal()
Looks ok, but there is a lot of noise since we have very low depth.
ggplot(cn$counts, aes(x = counts)) + geom_histogram(bins = 8)
We could also compare that with a naive counting method that just uses the best alignment score. In our case that will perform equally since we have little multi-mapping here.
config$method <- "naive" cn2 <- count_references(alns, config) counts <- cn$counts[cn2$counts, on = c("sample", "reference", "effective_length")] ggplot(counts, aes(x = counts, y = i.counts)) + geom_point() + geom_smooth(method = "lm") + labs(x = "em", y = "naive") + theme_classic()
As we see they both give the same results for that simple case.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.