View source: R/extract_reads.R
extract_reads | R Documentation |
Extract discordant reads and reads mapped to sequence in interest.
extract_reads(bam, tmp_dir = NULL, out_dir = getwd(), chromosome = c(1:22, "X", "Y", "KJ173426"), samtools = "samtools", interested_region = NULL, interested_seq = NULL, keep_intermediate = FALSE, threads = 5, high_mapq = 5)
bam |
A file path pointing to a bam file. |
tmp_dir |
A file path pointing to a folder for storage temporary files.
If it is |
out_dir |
A directory to store the output. By default, it is the working directory. |
chromosome |
Chromosome/RNAME of interest. Therefore, those supplementary chromosomes are filtered. |
samtools |
A path to call samtools if it cannot be called simply by samtools in the system |
interested_region |
A file path to a bed file containing the regions of interest. |
interested_seq |
A file path to a fasta file containing the sequence of interest, e.g. transposon sequence or virus genome. |
keep_intermediate |
A logical scalar specifying whether it keep the intermediate files. |
threads |
An integer specifying the number of CPU threads used in bwa alignment. If |
high_mapq |
An integer specifying the mapping quality (MAPQ) threshold. At least one reads should pass this threshold in a read pair. |
The discordant read pairs and reads in regions of interest are extracted by samtools. Here are some filters to reduce the size of extracted reads. 1. At least on reads mapped to the chromosomes/RNAME of interest. 2. only reads pairs that are at least 5000bp apart are considered as discordant reads. 3. To further narrow down the number of reads, if both reads of a read pair perfectly map to the sequence of interests, the read pairs will not be used because they cannot provide useful information to locate the potential insertion site. However, if both reads are mapped to the sequence of interests but with long clipped sequence (>20bp), they are included for the further analysis.
A text file containing extracted reads. The text file is basically simplified sam format, which contains the fields of QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, SEQUENCE and QNAME_id.
Cheuk-Ting Law
extract_reads("my.bam")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.