sublong: Align Long Sequence Reads to a Reference Genome via...
In Rsubread: Mapping, quantification and variant analysis of sequencing data


This function aligns DNA-seq reads, generated by long-read sequencing technologies such as Nanopore and PacBio sequencers, to a reference genome.

sublong(

    # basic input/output options
    index,
    readFiles,
    outputFiles,
    outputFormat = "BAM",
    nthreads = 1)

`index`	a character vector giving the basename of the index files. Index files should be located in the current directory. The index must be a full (not gapped) index consisting of a single block. See `buildindex` for index building options.
`readFiles`	a character vector giving the names of input files that contain long sequence reads. FASTQ and gzipped FASTQ formats are both accepted.
`outputFiles`	a character vector specifying the names of output files that contain read mapping results.
`outputFormat`	a character string specifying the format of output files. `BAM` by default. Acceptable formats include `SAM` and `BAM`.
`nthreads`	an integer giving the number of threads used for mapping. `1` by default. Note that when more than one thread is used, the order of reads might be changed in the output.
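The constraints in the argument list above can be checked before a long mapping job is started. The following is a minimal sketch in base R; `check_sublong_args` is a hypothetical helper, not part of Rsubread, and the rule that `readFiles` and `outputFiles` must have the same length is an assumption rather than something stated above.

```r
# Hypothetical helper (not part of Rsubread): collect argument problems
# before calling sublong(). Returns a character vector; empty means no issues.
check_sublong_args <- function(readFiles, outputFiles,
                               outputFormat = "BAM", nthreads = 1) {
  problems <- character(0)
  # Assumption: one output file is expected for each input file.
  if (length(readFiles) != length(outputFiles))
    problems <- c(problems, "readFiles and outputFiles differ in length")
  # The argument list above names SAM and BAM as the acceptable formats.
  if (!(length(outputFormat) == 1 && outputFormat %in% c("SAM", "BAM")))
    problems <- c(problems, "outputFormat must be \"SAM\" or \"BAM\"")
  # Every input file should exist on disk.
  missing_files <- readFiles[!file.exists(readFiles)]
  if (length(missing_files) > 0)
    problems <- c(problems, paste("input file not found:", missing_files))
  # nthreads should be a single positive whole number.
  ok_threads <- is.numeric(nthreads) && length(nthreads) == 1 &&
    !is.na(nthreads) && nthreads >= 1 && nthreads == round(nthreads)
  if (!ok_threads)
    problems <- c(problems, "nthreads must be a positive whole number")
  problems
}
```

An empty result means the arguments passed these checks; it does not guarantee that the index itself is a full, single-block index.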

sublong is designed for mapping long reads. It fully aligns each read using seed-and-vote mapping followed by a bounded dynamic programming procedure, and it can map reads that are millions of bases long.
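The general idea behind seed-and-vote can be illustrated with a toy example in base R. This is only a sketch of the concept, not the sublong implementation: the function name `toy_seed_and_vote`, the seed length, the use of non-overlapping seeds and the exact-match lookup are all assumptions made for illustration, and the dynamic programming step is omitted.

```r
# Toy illustration of seed-and-vote (NOT the sublong implementation).
# Short seeds from the read are matched exactly against the reference, and
# each match votes for the reference position where the read would start.
toy_seed_and_vote <- function(read, ref, seed_len = 4) {
  stopifnot(nchar(read) >= seed_len, nchar(ref) >= seed_len)
  # All overlapping substrings of length seed_len in the reference.
  ref_starts <- seq_len(nchar(ref) - seed_len + 1)
  ref_kmers <- substring(ref, ref_starts, ref_starts + seed_len - 1)
  # Non-overlapping seeds taken from the read.
  seed_starts <- seq(1, nchar(read) - seed_len + 1, by = seed_len)
  votes <- integer(0)
  for (s in seed_starts) {
    seed <- substr(read, s, s + seed_len - 1)
    hits <- ref_starts[ref_kmers == seed]
    # Shift each hit back by the seed's offset within the read.
    votes <- c(votes, hits - (s - 1))
  }
  votes <- votes[votes >= 1]
  if (length(votes) == 0) return(list(position = NA_integer_, votes = 0L))
  tally <- table(votes)
  best <- which.max(tally)
  list(position = as.integer(names(tally)[best]),
       votes = as.integer(tally[[best]]))
}

# A read copied from the reference with one mismatch still collects votes
# from the seeds that do not overlap the mismatch.
toy_seed_and_vote("GGCTTGCGGATC", "ACGTTGCAAGGCTTACGGATCCGATTAGC")
```

Because each seed votes independently, a mismatch or indel only removes the votes of the seeds it touches, which is why this style of mapping tolerates the higher error rates of long reads.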

sublong is extremely fast. It takes less than 10 minutes to map more than 100,000 long reads generated by the Nanopore MinION ultra-long sequencing protocol.

The number of CIGAR operations (e.g. insertions and deletions) reported for a long read may exceed the maximum number of operations allowed in a CIGAR string (65,535 in BAM output and 99,900 in SAM output). If this limit is exceeded, the read will be soft-clipped.
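To make the limits above concrete, the number of operations in a CIGAR string can be counted in base R. This is a minimal sketch: `cigar_limits`, `count_cigar_ops` and `exceeds_cigar_limit` are hypothetical names introduced here, not Rsubread functions, and the limits are simply the figures quoted above.

```r
# Limits on CIGAR operations per output format, as quoted above.
cigar_limits <- c(BAM = 65535L, SAM = 99900L)

# Count the operations in a CIGAR string; "5S100M2I50M" has four.
count_cigar_ops <- function(cigar) {
  if (is.na(cigar) || cigar == "*") return(0L)
  # Each operation is a run of digits followed by one operation letter.
  m <- gregexpr("[0-9]+[MIDNSHP=X]", cigar)[[1]]
  if (m[1] == -1) 0L else length(m)
}

# TRUE when a CIGAR string has more operations than the format allows.
exceeds_cigar_limit <- function(cigar, outputFormat = "BAM") {
  count_cigar_ops(cigar) > cigar_limits[[outputFormat]]
}
```

A CIGAR string taken from the mapping output will already respect these limits, because sublong soft-clips reads that would exceed them; the check is mainly useful for understanding why a read was clipped.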

Yang Liao and Wei Shi

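library(Rsubread)
# Build a full, single-block index from the reference bundled with Rsubread.
ref <- system.file("extdata", "reference.fa", package = "Rsubread")
buildindex(basename = "./full_index", reference = ref, gappedIndex = FALSE, indexSplit = FALSE)
# Map the bundled long reads with four threads and write BAM output.
reads <- system.file("extdata", "longreads.txt.gz", package = "Rsubread")
sublong("./full_index", reads, "./Long_alignment.BAM", nthreads = 4)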

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 2.4.2

//================================= setting ==================================\\
||                                                                            ||
||                Index name : full_index                                     ||
||               Index space : base space                                     ||
||               Index split : no-split                                       ||
||          Repeat threshold : 100 repeats                                    ||
||              Gapped index : no                                             ||
||                                                                            ||
||       Free / total memory : 0.4GB / 1.9GB                                  ||
||                                                                            ||
||               Input files : 1 file in total                                ||
||                             o reference.fa                                 ||
||                                                                            ||
||                                                                            ||
||   WARNING: the free memory is lower than 3.0GB.                            ||
||            the program may run very slow or crash.                         ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Check the integrity of provided reference sequences ...                    ||
|| No format issues were found                                                ||
ERROR: No memory can be allocated.
||                                                                            ||
||              WARNING: available memory is lower than 1.2 GB.               ||
||                           The program may run very slow.                   ||
|| Build a gapped index and/or split index into blocks to reduce memory use.  ||
||                                                                            ||
||                                                                            ||
\\============================================================================//

file not found :./full_index.reads

 ====== Subread long read mapping ======

Threads: 4
Input file: /usr/lib/R/site-library/Rsubread/extdata/longreads.txt.gz
Output file: ./Long_alignment.BAM (BAM)
Index: ./full_index

Table file './full_index.00.b.tab' is not found.
Index was loaded; the gap bewteen subreads is 0 bases