blast: Basic Local Alignment Search Tool (BLAST)

Description

Open a BLAST database and execute blastn, blastp, or blastx from BLAST+ to find sequence matches.

Usage

blast(db = NULL, remote = FALSE, type = "blastn")

blast_help(type = "blastn")

## S3 method for class 'BLAST'
print(x, info = TRUE, ...)

## S3 method for class 'BLAST'
predict(
  object,
  newdata,
  BLAST_args = "",
  custom_format = "",
  verbose = FALSE,
  keep_tmp = FALSE,
  ...
)

has_blast()

Arguments

db

the database to be searched, given as its path and name without file extension (e.g., "./16S_rRNA_DB/16S_ribosomal_RNA").

remote

logical; execute the query remotely on the NCBI server. In this case, db needs to be the name of a database available on the server.

type

BLAST program to use (e.g., blastn, blastp, blastx).

info

show additional database information.

...

additional arguments are ignored.

object, x

an open BLAST database, i.e., a BLAST object created with blast().

newdata

the query as an object of class Biostrings::XStringSet.

BLAST_args

additional BLAST+ arguments as a single string in command-line style (e.g., "-perc_identity 99").

custom_format

a custom output format given as a single string of space-delimited format specifiers (e.g., "qaccver saccver pident").

verbose

logical; print progress and debugging information.

keep_tmp

logical; keep temporary files for debugging.
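Both BLAST_args and custom_format are single strings. When the flags or format specifiers are kept in R vectors, they can be collapsed into the expected space-separated form roughly as follows. This is a sketch: the variable names are illustrative, and -evalue is a standard BLAST+ option that is not otherwise used on this page.

```r
## Sketch: assemble the two string arguments of predict() from R vectors.
## Flag names map to their values; paste() joins each pair with a space
## and then collapses all pairs into one command-line style string.
blast_flags <- c("-perc_identity" = "99", "-evalue" = "1e-10")
BLAST_args <- paste(names(blast_flags), blast_flags, collapse = " ")

## Format specifiers (taken from the Examples below) joined by spaces.
specifiers <- c("qaccver", "saccver", "pident", "length", "evalue", "bitscore")
custom_format <- paste(specifiers, collapse = " ")

BLAST_args     # "-perc_identity 99 -evalue 1e-10"
custom_format  # "qaccver saccver pident length evalue bitscore"
```

The resulting strings would then be passed as predict(bl, seq, BLAST_args = BLAST_args, custom_format = custom_format).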

Value

  • blast() returns a BLAST database object which can be used for queries (via predict()).

  • predict() returns a data.frame containing the BLAST results.

  • has_blast() returns TRUE if the BLAST+ installation can be found and FALSE otherwise.
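has_blast() gives a single yes/no answer. To see which individual BLAST+ programs R can locate, a check along these lines may help. The helper blast_programs_found() is hypothetical, not part of rBLAST, and says nothing about how has_blast() itself is implemented.

```r
## Hypothetical helper (not part of rBLAST): report which BLAST+ programs
## are visible to R. Sys.which() returns "" for programs it cannot find.
blast_programs_found <- function(programs = c("blastn", "blastp", "blastx")) {
  paths <- Sys.which(programs)
  setNames(nzchar(paths), programs)  # TRUE where a path was found
}

found <- blast_programs_found()
found
```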

Installing BLAST+

The BLAST+ software needs to be installed on your system. Installation instructions are available in this package's INSTALL file and at https://www.ncbi.nlm.nih.gov/books/NBK569861/.

R needs to be able to find the executable. After installing the software, try the following in R:

Sys.which("blastn")

If the command returns "" instead of the path to the executable, then you need to add the directory containing the BLAST+ executables to the PATH environment variable. In R:

Sys.setenv(PATH = paste(Sys.getenv("PATH"),
   "path_to_your_BLAST_installation", sep=.Platform$path.sep))

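Running the Sys.setenv() call above more than once appends the same directory again each time. A minimal sketch of a hypothetical helper, add_to_path(), that appends the directory only when it is missing; as with Sys.setenv(), the change lasts only for the current R session.

```r
## Hypothetical helper: append a directory to PATH for this R session,
## but only if it is not already one of the PATH entries.
add_to_path <- function(dir) {
  sep <- .Platform$path.sep
  old <- Sys.getenv("PATH")
  entries <- strsplit(old, sep, fixed = TRUE)[[1]]
  if (!dir %in% entries) {
    new <- if (nzchar(old)) paste(old, dir, sep = sep) else dir
    Sys.setenv(PATH = new)
  }
  invisible(Sys.getenv("PATH"))
}

## replace with the directory that contains blastn on your system
add_to_path("path_to_your_BLAST_installation")
Sys.which("blastn")
```
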
BLAST Databases

You will also need a database. NCBI BLAST databases are updated daily and may be downloaded via FTP from https://ftp.ncbi.nlm.nih.gov/blast/db/. See blast_db_cache() for how to manage a local cache of database files.
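Archive URLs can be built from the FTP location above. The helper db_archive_urls() is hypothetical and assumes archives are named <database>.tar.gz, or <database>.00.tar.gz, <database>.01.tar.gz, ... for databases split into several archives; check the FTP directory listing for the actual file names and the number of archives.

```r
## Hypothetical helper: build download URLs for a BLAST database archive.
## Assumes "<db>.tar.gz" for single archives and zero-padded volume
## numbers ("<db>.00.tar.gz", "<db>.01.tar.gz", ...) for split databases.
db_archive_urls <- function(db, n_volumes = NULL, width = 2,
                            base = "https://ftp.ncbi.nlm.nih.gov/blast/db") {
  if (is.null(n_volumes)) {
    files <- paste0(db, ".tar.gz")
  } else {
    vols <- formatC(seq_len(n_volumes) - 1L, width = width, flag = "0")
    files <- paste0(db, ".", vols, ".tar.gz")
  }
  paste(base, files, sep = "/")
}

db_archive_urls("16S_ribosomal_RNA")
db_archive_urls("mydb", n_volumes = 2)  # "mydb" is a placeholder name

## downloading is left commented out, as in the Examples:
# for (url in db_archive_urls("16S_ribosomal_RNA"))
#     download.file(url, basename(url), mode = "wb")
```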

A BLAST database consists of a set of files with different extensions that all start with the same database name. For example, 16S_ribosomal_RNA.tar.gz contains files starting with 16S_ribosomal_RNA, which is the database name used for calling blast().
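Because the database name is the archive name without its extension, it can be derived from the downloaded file name. A minimal sketch with a hypothetical helper, db_name_from_archive(), which also drops a trailing volume number such as .00, assuming the naming pattern described in this section.

```r
## Hypothetical helper: derive the database name passed to blast() from
## the archive file name.
db_name_from_archive <- function(archive) {
  name <- sub("\\.tar\\.gz$", "", basename(archive))  # drop ".tar.gz"
  sub("\\.[0-9]+$", "", name)                          # drop a volume number like ".00"
}

db_name_from_archive("16S_ribosomal_RNA.tar.gz")  # "16S_ribosomal_RNA"
db_name_from_archive("mydb.01.tar.gz")            # "mydb"
```

After extraction, the value could be combined with the directory, e.g. blast(db = file.path("./16S_rRNA_DB", db_name_from_archive(tgz_file))).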

Large databases are split into several archives numbered 00, 01, etc. Download all archives and extract the files into the same directory. All files will share a common prefix, which is the database name used for calling blast().
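Extracting all archives into one directory can be scripted with utils::untar(). The sketch below uses a hypothetical helper, extract_db_volumes(), and builds two tiny stand-in archives so that it runs offline; the file names (mydb.00.nsq, mydb.01.nsq) are dummies, not a real BLAST database.

```r
## Hypothetical helper: extract every archive of one database into a
## single directory and return the sorted file names found there.
extract_db_volumes <- function(archives, exdir) {
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  for (archive in archives) {
    utils::untar(archive, exdir = exdir, tar = "internal")
  }
  invisible(sort(list.files(exdir)))
}

## Demo only: create a gzipped tar archive holding one placeholder file.
make_dummy_archive <- function(dir, volume) {
  old <- setwd(dir)
  on.exit(setwd(old))
  member <- paste0("mydb.", volume, ".nsq")
  writeLines("placeholder", member)
  archive <- paste0("mydb.", volume, ".tar.gz")
  utils::tar(archive, files = member, compression = "gzip", tar = "internal")
  normalizePath(archive)
}

src <- file.path(tempdir(), "mydb_archives")
dir.create(src, showWarnings = FALSE)
archives <- vapply(c("00", "01"), function(v) make_dummy_archive(src, v),
                   character(1))
files <- extract_db_volumes(archives, file.path(tempdir(), "mydb_DB"))
files
```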

Author(s)

Michael Hahsler

References

BLAST Help - BLAST+ Executable, https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html

BLAST Command Line Applications User Manual, https://www.ncbi.nlm.nih.gov/books/NBK279690/

See Also

Other blast: blast_db_cache(), makeblastdb()

Examples

## check if blastn is correctly installed; this should return the path
##   to the executable
Sys.which("blastn")

## only run if blast is installed
if (has_blast()) {
    ## check the version; you should have version 1.8.1+
    system2("blastn", "-version")

    ## download the latest version of the 16S Microbial rRNA
    ##   database from NCBI using the local cache
    tgz_file <- blast_db_get("16S_ribosomal_RNA.tar.gz")

    ## extract the database files
    untar(tgz_file, exdir = "./16S_rRNA_DB")

    ## Note: the database file can also be downloaded without
    ##   the cache using download.file()
    # download.file(paste("https://ftp.ncbi.nlm.nih.gov/blast/db",
    #    "16S_ribosomal_RNA.tar.gz", sep = "/"),
    #    "16S_ribosomal_RNA.tar.gz", mode = "wb")
    # untar("16S_ribosomal_RNA.tar.gz", exdir = "./16S_rRNA_DB")

    ## A BLAST database is just a set of files. It is a good idea to
    ## organize the files in a directory.
    list.files("./16S_rRNA_DB")

    ## load a BLAST database (replace db with the location + name of
    ##   the BLAST DB without the extension)
    bl <- blast(db = "./16S_rRNA_DB/16S_ribosomal_RNA")
    bl

    ## read a single example sequence to BLAST
    seq <- readRNAStringSet(system.file("examples/RNA_example.fasta",
        package = "rBLAST"
    ))[1]
    seq

    ## query a sequence using BLAST
    cl <- predict(bl, seq)
    head(cl, n = 5)

    ## pass on BLAST arguments (minimum 99% identity) and use a custom
    ##   output format (see the BLAST+ documentation)
    fmt <- paste(
        "qaccver saccver pident length mismatch gapopen qstart qend",
        "sstart send evalue bitscore qseq sseq"
    )
    cl <- predict(bl, seq,
        BLAST_args = "-perc_identity 99",
        custom_format = fmt
    )
    cl

    ## clean up the example: delete the database files
    unlink("./16S_rRNA_DB", recursive = TRUE)
}
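The result of predict() is an ordinary data.frame, so it can be filtered and sorted with base R. The rows below are made up purely for illustration, and the column names assume the output uses the custom format specifiers from the example above; check names(cl) on real output before relying on them.

```r
## Made-up rows standing in for predict() output (not real BLAST results).
hits <- data.frame(
  qaccver  = c("query_1", "query_1", "query_1"),
  saccver  = c("subject_A", "subject_B", "subject_C"),
  pident   = c(100, 99.2, 97.5),
  evalue   = c(0, 1e-150, 1e-90),
  bitscore = c(2700, 2650, 2400),
  stringsAsFactors = FALSE
)

## keep hits with at least 99% identity, best bitscore first
best <- hits[hits$pident >= 99, ]
best <- best[order(best$bitscore, decreasing = TRUE), ]
best
```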

mhahsler/rBLAST documentation built on Jan. 9, 2025, 4:16 a.m.