filter_host | R Documentation |
After aligning your sample to a target library with align_target(), use
filter_host() to remove unwanted host contamination using filter
reference libraries. This function takes as input the name of the .bam file
produced via align_target(), and produces a sorted .bam file with any
reads that match the filter libraries removed. The resulting .bam file may
be used downstream for further analysis. This function uses Rsubread. For the
Rbowtie2 equivalent of this function, see filter_host_bowtie.
filter_host(
  reads_bam,
  lib_dir = NULL,
  libs,
  make_bam = FALSE,
  output = paste(tools::file_path_sans_ext(reads_bam), "filtered", sep = "."),
  subread_options = align_details,
  YS = 1e+05,
  threads = 1,
  quiet = TRUE
)
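The default value of output is built from reads_bam by dropping its file extension and appending "filtered". The following base R sketch shows what that default expression evaluates to; the input path is a made-up example, not a file shipped with the package.

```r
# Hypothetical input path, for illustration only
reads_bam <- file.path("results", "sample1.bam")

# Same expression as the default value of `output` in the usage
output <- paste(tools::file_path_sans_ext(reads_bam), "filtered", sep = ".")
print(output)
```

Note that the resulting name keeps the directory part of reads_bam, and that it carries no extension yet.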
reads_bam |
The name of a merged, sorted .bam file that has previously
been aligned to a reference library. Likely, the output from running an
instance of align_target(). |
lib_dir |
Path to the directory that contains the filter Subread index files. |
libs |
The basename of the filter libraries (without index extension). |
make_bam |
Logical, whether to also output a bam file with host reads
filtered out. A .csv.gz file will be created instead if FALSE. |
output |
The desired name of the output .bam or .csv.gz file. Extension is
automatically determined by whether make_bam is TRUE or FALSE. |
subread_options |
A named list of Rsubread alignment options. Default is align_details. |
YS |
yieldSize, an integer. The number of alignments to be read in from the bam file at once for chunked functions. Default is 100000. |
threads |
The number of threads available for the function. Default is 1 thread. |
quiet |
Turns off most messages. Default is TRUE. |
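To illustrate what a yieldSize such as YS controls, here is a minimal sketch of chunked BAM reading with Rsamtools. It shows the general pattern only and is an assumption about how chunked reading typically looks, not a description of the internals of filter_host(). It uses the ex1.bam example file shipped with Rsamtools and a deliberately small chunk size.

```r
library(Rsamtools)

# Example BAM file bundled with Rsamtools (not a MetaScope file)
bam_path <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# yieldSize caps how many alignments each scanBam() call returns
bf <- BamFile(bam_path, yieldSize = 1000L)
open(bf)

total <- 0L
repeat {
  # Read the next chunk, keeping only the read names
  chunk <- scanBam(bf, param = ScanBamParam(what = "qname"))[[1]]
  n <- length(chunk$qname)
  if (n == 0L) break   # an empty chunk means the end of the file
  total <- total + n
}
close(bf)

print(total)
```

A larger yieldSize means fewer iterations at the cost of more memory per chunk.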
A compressed .csv can be created to produce a smaller output file that is
created more efficiently and is still compatible with metascope_id().
The name of a filtered, sorted .bam file written to the user's
current working directory. Or, if make_bam = FALSE, a .csv.gz file
containing a data frame with only the information required to run
metascope_id().
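A .csv.gz file can be inspected directly in base R, since read.csv() decompresses gzip files transparently. The sketch below writes and reads back a toy stand-in file; the column names are hypothetical and are not the columns that filter_host() actually writes.

```r
# Toy stand-in for a filter_host() output file; the columns are made up
toy_path <- tempfile(fileext = ".csv.gz")
toy <- data.frame(read_id = c("r1", "r2"), ref = c("refA", "refB"))

# Write the data frame through a gzip connection
con <- gzfile(toy_path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the gzip compression and decompresses on the fly
filtered <- read.csv(toy_path)
str(filtered)

# Clean up the temporary file
unlink(toy_path)
```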
#### Filter reads from bam file that align to any of the filter libraries
## Assuming a bam file has been created previously with align_target()
## Create temporary directory
filter_ref_temp <- tempfile()
dir.create(filter_ref_temp)
## Create temporary taxonomizr accession
tmp_accession <- system.file("extdata", "example_accessions.sql", package = "MetaScope")
## Download filter genome
all_species <- c("Staphylococcus aureus subsp. aureus str. Newman")
all_ref <- vapply(all_species, MetaScope::download_refseq,
                  reference = FALSE, representative = FALSE, compress = TRUE,
                  out_dir = filter_ref_temp, caching = FALSE,
                  accession_path = tmp_accession,
                  FUN.VALUE = character(1))
ind_out <- vapply(all_ref, mk_subread_index, FUN.VALUE = character(1))
## Get path to example reads
readPath <- system.file("extdata", "subread_target.bam",
                        package = "MetaScope")
## Copy the example reads to the temp directory
refPath <- file.path(filter_ref_temp, "subread_target.bam")
file.copy(from = readPath, to = refPath)
utils::data("align_details")
align_details[["type"]] <- "rna"
align_details[["phredOffset"]] <- 10
# Just to get it to align - toy example!
align_details[["maxMismatches"]] <- 10
## Align and filter reads
filtered_map <- filter_host(
  refPath, lib_dir = filter_ref_temp,
  libs = stringr::str_replace_all(all_species, " ", "_"),
  threads = 1, subread_options = align_details)
## Remove temporary directory
unlink(filter_ref_temp, recursive = TRUE)