View source: R/sc_atac_trim_barcode.R
sc_atac_trim_barcode | R Documentation |
Single-cell data need to be demultiplexed to retain the information about which cell barcode each read belongs to. This function reformats FASTQ files so that the barcode(s) (and, if available, the UMI sequences) are moved from the sequence into the read name. Since scATAC-Seq data are mostly paired-end, both 'r1' and 'r2' are demultiplexed in this function.
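As a rough illustration of what "moving the barcode into the read name" means, the following base R sketch trims a barcode from a single read and prepends it to the read name. The helper `move_barcode_to_name` is hypothetical and not part of scPipe, and the `@barcode#name` header layout is an assumption made for illustration only; the actual output format may differ. The start and length arguments follow the 0-indexed convention of 'id1_st' and 'id1_len'.

```r
# Hypothetical helper (not part of scPipe): move a barcode from the read
# sequence into the read name. The "@barcode#name" layout is an assumption.
move_barcode_to_name <- function(name, seq, qual, bc_start = 0, bc_len = 16) {
  # bc_start is 0-indexed (like id1_st); substr() is 1-indexed and inclusive
  first <- bc_start + 1
  last <- bc_start + bc_len
  barcode <- substr(seq, first, last)

  # Remove the barcode bases and their matching quality characters
  trimmed_seq <- paste0(substr(seq, 1, first - 1),
                        substr(seq, last + 1, nchar(seq)))
  trimmed_qual <- paste0(substr(qual, 1, first - 1),
                         substr(qual, last + 1, nchar(qual)))

  list(
    name = paste0("@", barcode, "#", sub("^@", "", name)),
    seq = trimmed_seq,
    qual = trimmed_qual
  )
}

# Toy read with an 8-base barcode at the start of the sequence
read <- move_barcode_to_name(
  name = "@read1",
  seq = "ACGTACGTTTGGCCAA",
  qual = "IIIIIIIIFFFFFFFF",
  bc_start = 0,
  bc_len = 8
)
print(read$name)
print(read$seq)
```

Here the first eight bases are treated as the barcode, so the remaining sequence and quality strings are each eight characters shorter than the input.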
sc_atac_trim_barcode(
r1,
r2,
bc_file = NULL,
valid_barcode_file = "",
output_folder = "",
umi_start = 0,
umi_length = 0,
umi_in = "both",
rmN = FALSE,
rmlow = FALSE,
min_qual = 20,
num_below_min = 2,
id1_st = -0,
id1_len = 16,
id2_st = 0,
id2_len = 16,
no_reverse_complement = FALSE
)
r1 |
read one for paired-end reads. |
r2 |
read two for paired-end reads, NULL if single read. |
bc_file |
the barcode information, which can be either in FASTQ format or in a .csv file (both forms are shown in the examples). |
valid_barcode_file |
optional path to a file of valid (expected) barcode sequences to be found in the bc_file (.txt, can be .txt.gz).
Must contain one barcode per line in the second column, with columns separated by a comma (default = "").
If given, each barcode from bc_file is matched against the best-fitting valid barcode
(allowing a Hamming distance of 1). If a FASTQ |
output_folder |
the output directory for the demultiplexed FASTQ files, in which
the barcode and UMI have been moved into the read name.
Files ending in |
umi_start |
if available, the start position of the molecular identifier. |
umi_length |
if available, the length of the molecular identifier. |
umi_in |
umi_in |
rmN |
logical, whether to remove reads that contain N in the UMI or cell barcode. |
rmlow |
logical, whether to remove reads that have low-quality barcode sequences. |
min_qual |
the minimum base pair quality that is allowed (default = 20). |
num_below_min |
the maximum number of base pairs below the quality threshold. |
id1_st |
barcode start position (0-indexed) for read 1, which is an extra parameter that is needed if the
|
id1_len |
barcode length for read 1, which is an extra parameter that is needed if the
|
id2_st |
barcode start position (0-indexed) for read 2, which is an extra parameter that is needed if the
|
id2_len |
barcode length for read 2, which is an extra parameter that is needed if the
|
no_reverse_complement |
specifies if the reverse complement of the barcode sequence should be used for barcode error correction (only when barcode sequences are provided as fastq files). FALSE (default) lets the function decide whether to use reverse complement, and TRUE forces the function to use the forward barcode sequences. |
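The filtering and matching arguments above can be read as simple per-read rules. The base R sketch below is one possible interpretation, not scPipe's implementation: `keep_read`, `hamming` and `match_barcode` are hypothetical helpers, quality strings are assumed to be Phred+33 encoded, and ties between equally close valid barcodes are resolved by taking the first one, which may not match the real behaviour.

```r
# Hypothetical helpers (not scPipe's code) illustrating the argument semantics.

# Decide whether a read passes the rmN / rmlow filters.
# Assumes Phred+33 encoded quality strings.
keep_read <- function(barcode, qual, rmN = FALSE, rmlow = FALSE,
                      min_qual = 20, num_below_min = 2) {
  # rmN: discard reads whose barcode contains an N
  if (rmN && grepl("N", barcode, fixed = TRUE)) {
    return(FALSE)
  }
  # rmlow: discard reads with more than num_below_min bases below min_qual
  if (rmlow) {
    phred <- utf8ToInt(qual) - 33L
    if (sum(phred < min_qual) > num_below_min) {
      return(FALSE)
    }
  }
  TRUE
}

# Number of mismatching positions between two equal-length strings
hamming <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Return the closest valid barcode within max_dist mismatches, else NA
match_barcode <- function(bc, valid, max_dist = 1) {
  candidates <- valid[nchar(valid) == nchar(bc)]
  if (length(candidates) == 0) {
    return(NA_character_)
  }
  d <- vapply(candidates, hamming, integer(1), b = bc)
  if (min(d) <= max_dist) candidates[which.min(d)] else NA_character_
}

# Toy inputs: "I" is Phred 40 and "#" is Phred 2 under Phred+33
print(keep_read("ACGT", "I###", rmlow = TRUE))
print(match_barcode("ACGA", c("TTTT", "ACGT")))
```

In this toy input the first read has three bases below the default 'min_qual' of 20, which exceeds the default 'num_below_min' of 2, so it would be dropped under this reading of the arguments.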
None (invisible 'NULL')
data.folder <- system.file("extdata", package = "scPipe", mustWork = TRUE)
r1 <- file.path(data.folder, "small_chr21_R1.fastq.gz")
r2 <- file.path(data.folder, "small_chr21_R3.fastq.gz")
# Using a barcode fastq file:
# barcodes in fastq format
barcode_fastq <- file.path(data.folder, "small_chr21_R2.fastq.gz")
sc_atac_trim_barcode(
  r1 = r1,
  r2 = r2,
  bc_file = barcode_fastq,
  rmN = TRUE,
  rmlow = TRUE,
  output_folder = tempdir()
)
# Using a barcode csv file:
# barcodes in .csv format
barcode_1000 <- file.path(data.folder, "chr21_modified_barcode_1000.csv")
## Not run:
sc_atac_trim_barcode(
  r1 = r1,
  r2 = r2,
  bc_file = barcode_1000,
  id1_st = 0,
  rmN = TRUE,
  rmlow = TRUE,
  output_folder = tempdir()
)
## End(Not run)