sc_atac_trim_barcode: demultiplex raw single-cell ATAC-Seq fastq reads
In LuyiTian/scPipe: Pipeline for single cell multi-omic data pre-processing

Description

Single-cell data must be demultiplexed to retain the cell barcode that each read belongs to. This function reformats the fastq files so that the barcode(s) (and the UMI sequences, if available) are moved from the sequence into the read name. Since scATAC-Seq data are mostly paired-end, both `r1` and `r2` are demultiplexed.
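
As a rough illustration of what "moving the barcode into the read name" means, the sketch below trims a barcode from the start of a read and prepends it to the read name. The helper `move_barcode_to_name` and the `@<barcode>#<name>` header layout are assumptions made for this example; they are not taken from scPipe and the actual output format may differ.

```r
# Hypothetical helper, not part of scPipe: move a barcode out of the read
# sequence and into the read name. The "@<barcode>#<name>" layout is an
# assumption for illustration only.
move_barcode_to_name <- function(name, seq, qual, bc_start = 0, bc_len = 16) {
  # bc_start is 0-indexed (like id1_st / id2_st); substr() is 1-indexed
  first <- bc_start + 1
  last  <- bc_start + bc_len
  barcode <- substr(seq, first, last)
  list(
    name = paste0("@", barcode, "#", sub("^@", "", name)),
    # drop the barcode bases from both the sequence and the quality string
    seq  = paste0(substr(seq, 1, first - 1), substr(seq, last + 1, nchar(seq))),
    qual = paste0(substr(qual, 1, first - 1), substr(qual, last + 1, nchar(qual)))
  )
}

read <- move_barcode_to_name(
  name = "@read1",
  seq  = paste0("ACGTACGTACGTACGT", "TTGGCCAA"),        # 16 bp barcode + insert
  qual = paste0(strrep("I", 16), strrep("F", 8))
)
read$name
read$seq
```

After the call, the sequence and quality strings contain only the insert, so downstream alignment is not affected by the barcode bases.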

Usage

sc_atac_trim_barcode(
  r1,
  r2,
  bc_file = NULL,
  valid_barcode_file = "",
  output_folder = "",
  umi_start = 0,
  umi_length = 0,
  umi_in = "both",
  rmN = FALSE,
  rmlow = FALSE,
  min_qual = 20,
  num_below_min = 2,
  id1_st = 0,
  id1_len = 16,
  id2_st = 0,
  id2_len = 16,
  no_reverse_complement = FALSE
)

Arguments

`r1`	read one for paired-end reads.
`r2`	read two for paired-end reads, NULL if single-end.
`bc_file`	the barcode information, either in `fastq` format (e.g. from 10x-ATAC) or as a `.csv` file (in which case the barcode is expected in the second column). For the fastq approach, this can currently be a list of barcode files.
`valid_barcode_file`	optional path to a file of valid (expected) barcode sequences to be found in `bc_file` (.txt, can be .txt.gz). The file must be comma-separated with one barcode per line, in the second column (default = ""). If given, each barcode from `bc_file` is matched to the best-fitting valid barcode, allowing a Hamming distance of 1. If a FASTQ `bc_file` is provided, barcodes with higher quality, as given by the fastq quality scores, are prioritised.
`output_folder`	the output directory for the demultiplexed fastq files, in which the barcode and UMI have been moved into the read name. Files ending in `.gz` will be automatically compressed.
`umi_start`	if available, the start position of the molecular identifier.
`umi_length`	if available, the length of the molecular identifier.
`umi_in`	the read(s) in which the UMI is found (default = "both").
`rmN`	logical, whether to remove reads that contain N in the UMI or cell barcode.
`rmlow`	logical, whether to remove reads that have low-quality barcode sequences.
`min_qual`	the minimum base quality that is allowed (default = 20).
`num_below_min`	the maximum number of bases allowed below the quality threshold (default = 2).
`id1_st`	barcode start position (0-indexed) for read 1; an extra parameter needed when `bc_file` is in `.csv` format.
`id1_len`	barcode length for read 1; an extra parameter needed when `bc_file` is in `.csv` format.
`id2_st`	barcode start position (0-indexed) for read 2; an extra parameter needed when `bc_file` is in `.csv` format.
`id2_len`	barcode length for read 2; an extra parameter needed when `bc_file` is in `.csv` format.
`no_reverse_complement`	specifies if the reverse complement of the barcode sequence should be used for barcode error correction (only when barcode sequences are provided as fastq files). FALSE (default) lets the function decide whether to use reverse complement, and TRUE forces the function to use the forward barcode sequences.
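
The barcode correction described for `valid_barcode_file` and `no_reverse_complement` can be pictured with the base R sketch below. It is a minimal illustration only: `hamming`, `reverse_complement`, `correct_barcode` and the example barcodes are hypothetical and do not reflect scPipe's internal implementation, which may break ties and use quality scores differently.

```r
# Hypothetical helpers, not scPipe internals.

# Number of mismatching positions between two equal-length strings
hamming <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Reverse complement of a DNA string (A, C, G, T only)
reverse_complement <- function(x) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse = "")
}

# Return the single valid barcode within max_dist mismatches of `observed`,
# or NA if none qualifies or the best match is ambiguous.
correct_barcode <- function(observed, valid, max_dist = 1) {
  same_len <- valid[nchar(valid) == nchar(observed)]
  if (length(same_len) == 0) return(NA_character_)
  d <- vapply(same_len, hamming, integer(1), b = observed, USE.NAMES = FALSE)
  best <- which(d == min(d))
  if (min(d) > max_dist || length(best) != 1) return(NA_character_)
  same_len[best]
}

valid <- c("AAAACCCC", "ACGGTCAA")          # made-up whitelist
correct_barcode("AAAACCCG", valid)          # one mismatch from the first entry
correct_barcode("ACGTACGT", valid)          # too far from both entries
correct_barcode(reverse_complement("TTGACCGT"), valid)
```

The last call shows why the orientation matters: the observed barcode only matches the whitelist once it has been reverse complemented.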

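The interaction of `rmN`, `rmlow`, `min_qual` and `num_below_min` can be sketched as follows. This is an illustration under stated assumptions, not scPipe's code: the helper `keep_barcode` is hypothetical, and quality strings are assumed to be Phred+33 encoded.

```r
# Hypothetical helper, not scPipe's implementation: decide whether a read is
# kept, given its barcode sequence and barcode quality string.
# Assumes Phred+33 encoded qualities.
keep_barcode <- function(barcode, qual, rmN = FALSE, rmlow = FALSE,
                         min_qual = 20, num_below_min = 2) {
  # rmN: drop the read if the barcode contains an N
  if (rmN && grepl("N", barcode, fixed = TRUE)) return(FALSE)
  if (rmlow) {
    phred <- utf8ToInt(qual) - 33L          # decode ASCII to Phred scores
    # drop the read if more than num_below_min bases fall below min_qual
    if (sum(phred < min_qual) > num_below_min) return(FALSE)
  }
  TRUE
}

keep_barcode("ACGTACGT", "IIIIIIII", rmN = TRUE, rmlow = TRUE)  # all bases Q40
keep_barcode("ACGNACGT", "IIIIIIII", rmN = TRUE, rmlow = TRUE)  # contains N
keep_barcode("ACGTACGT", "II###III", rmN = TRUE, rmlow = TRUE)  # three bases at Q2
keep_barcode("ACGTACGT", "II###III")                            # filters off
```

With the defaults (`rmN = FALSE`, `rmlow = FALSE`) no read is dropped by this sketch, which mirrors the fact that both filters are opt-in.
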
Value

None (invisible 'NULL')

Examples

data.folder <- system.file("extdata", package = "scPipe", mustWork = TRUE)
r1 <- file.path(data.folder, "small_chr21_R1.fastq.gz")
r2 <- file.path(data.folder, "small_chr21_R3.fastq.gz")

# Using a barcode fastq file:
barcode_fastq <- file.path(data.folder, "small_chr21_R2.fastq.gz")

sc_atac_trim_barcode(
  r1            = r1,
  r2            = r2,
  bc_file       = barcode_fastq,
  rmN           = TRUE,
  rmlow         = TRUE,
  output_folder = tempdir())

# Using a barcode csv file:
barcode_1000 <- file.path(data.folder, "chr21_modified_barcode_1000.csv")

## Not run: 
sc_atac_trim_barcode(
  r1            = r1,
  r2            = r2,
  bc_file       = barcode_1000,
  id1_st        = 0,
  rmN           = TRUE,
  rmlow         = TRUE,
  output_folder = tempdir())

## End(Not run)

LuyiTian/scPipe documentation built on Dec. 11, 2023, 8:21 p.m.