View source: R/metascope_blast.R
metascope_blast | R Documentation |
This function allows the user to check a subset of identified reads against NCBI BLAST and the nucleotide database to confirm or contradict results provided by MetaScope. It aligns the top 'metascope_id()' results to NCBI BLAST database. It REQUIRES that command-line BLAST and a separate nucleotide database have already been installed on the host machine. It returns a csv file updated with BLAST result metrics.
metascope_blast(
metascope_id_path,
bam_file_path = list.files(tmp_dir, ".updated.bam$", full.names = TRUE)[1],
tmp_dir,
out_dir,
sample_name,
fasta_dir = NULL,
num_results = 10,
num_reads = 100,
hit_list = 10,
num_threads = 1,
db_path,
quiet = FALSE,
NCBI_key = NULL,
db = NULL,
accessions_path = NULL
)
metascope_id_path |
Character string; path to a csv file output by 'metascope_id()'. |
bam_file_path |
Character string; full path to bam file for the same
sample processed by 'metascope_id'. Note that the 'filter_bam' function
must have output a bam file, and not a .csv.gz file. See
'?filter_bam_bowtie' for more details. Defaults to
|
tmp_dir |
Character string, a temporary directory in which to host files. |
out_dir |
Character string, path to output directory. |
sample_name |
Character string, sample name for output files. |
fasta_dir |
Directory where fasta files for blast will be stored. |
num_results |
Integer, the maximum number of taxa from the metascope_id output to check reads. Takes the top n taxa, i.e. those with largest abundance. Default is 10. |
num_reads |
Integer, the maximum number of reads to blast per microbe. If the true number of reads assigned to a given taxon is smaller, then the smaller number will be chosen. Default is 100. Too many reads will involve more processing time. |
hit_list |
Integer, number of blast hit results to keep. Default is 10. |
num_threads |
Integer, number of threads if running in parallel (recommended). Default is 1. |
db_path |
Character string. The database file to be searched (including basename, but without file extension). For example, if the nt database is in the nt folder, this would be /filepath/nt/nt assuming that the database files have the nt basename. Check this path if you get an error message stating "No alias or index file found". |
quiet |
Logical, whether to print out more informative messages. Default is FALSE. |
NCBI_key |
(character) NCBI Entrez API key. optional. See taxize::use_entrez(). Due to the high number of requests made to NCBI, this function will be less prone to errors if you obtain an NCBI key. |
db |
Currently accepts one of |
accessions_path |
Directory where accession files for blast are stored. |
This function assumes that you used the NCBI nucleotide database to process samples, with a download date of 2021 or later. This is necessary for compatibility with the bam file headers.
This is highly computationally intensive and should be ran with multiple cores, submitted as a multi-threaded computing job if possible.
Note, if best_hit_strain is FALSE, then no strain was observed more often among the BLAST results.
This function writes an updated csv file with metrics.
## Not run:
### Create temporary directory
file_temp <- tempfile()
dir.create(file_temp)
bamPath <- system.file("extdata", "bowtie_target.bam",
package = "MetaScope")
file.copy(bamPath, file_temp)
metascope_id(file.path(file_temp, "bowtie_target.bam"), aligner = "bowtie2",
input_type = "bam", out_dir = file_temp, num_species_plot = 0,
update_bam = TRUE)
## Run metascope blast
### Get export name and metascope id results
out_base <- bamPath %>% base::basename() %>% strsplit(split = "\\.") %>%
magrittr::extract2(1) %>% magrittr::extract(1)
metascope_id_path <- file.path(file_temp, paste0(out_base, ".metascope_id.csv"))
# NOTE: change db_path to the location where your BLAST database is stored!
db <- "/restricted/projectnb/pathoscope/data/blastdb/nt/nt"
Sys.setenv(ENTREZ_KEY = "<your id here>")
metascope_blast(metascope_id_path,
bam_file_path = file.path(file_temp, "bowtie_target.bam"),
#bam_file_path = file.path(file_temp, "bowtie_target.updated.bam")
tmp_dir = file_temp,
out_dir = file_temp,
sample_name = out_base,
db_path = db_path,
num_results = 10,
num_reads = 5,
hit_list = 10,
num_threads = 3,
db = "ncbi",
quiet = FALSE,
NCBI_key = NULL,
fasta_dir = NULL ,
accessions_path = NULL)
## Remove temporary directory
unlink(file_temp, recursive = TRUE)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.