demultiplex: Demultiplex cell barcodes and assign cell specific reads
In 87875172/scuff: Single Cell RNA-Seq UMI Filtering Facilitator (scruff)

demultiplex

R Documentation

Demultiplex cell barcodes and assign cell specific reads

Description

Demultiplex FASTQ files and write cell-specific reads in compressed FASTQ format to the output directory.

Usage

demultiplex(
  project = paste0("project_", Sys.Date()),
  experiment,
  lane,
  read1Path,
  read2Path,
  bc,
  bcStart,
  bcStop,
  bcEdit = 0,
  umiStart,
  umiStop,
  keep,
  minQual = 10,
  yieldReads = 1e+06,
  outDir = "./Demultiplex",
  summaryPrefix = "demultiplex",
  overwrite = FALSE,
  cores = max(1, parallelly::availableCores() - 2),
  verbose = FALSE,
  logfilePrefix = format(Sys.time(), "%Y%m%d_%H%M%S")
)

Arguments

`project`	The project name. Default is `paste0("project_", Sys.Date())`.
`experiment`	A character vector of experiment names, giving the group label for each FASTQ file, e.g. `"patient1"`, `"patient2"`, and so on. The number of cells in an experiment equals the number of cell barcodes in `bc`. The length of `experiment` equals the number of FASTQ files to be processed.
`lane`	A character or character vector of flow cell lane numbers. FASTQ files from lanes having the same `experiment` will be concatenated. If FASTQ files from multiple lanes are already concatenated, any placeholder would be sufficient, e.g. "L001".
`read1Path`	A character vector of file paths to the read 1 FASTQ files. These are the read files containing UMI and cell barcode sequences.
`read2Path`	A character vector of file paths to the read 2 FASTQ files. These read files contain genomic transcript sequences.
`bc`	A character vector of predetermined cell barcodes. For an example, see `?barcodeExample`.
`bcStart`	Integer or vector of integers containing the cell barcode start positions (inclusive, one-based numbering).
`bcStop`	Integer or vector of integers containing the cell barcode stop positions (inclusive, one-based numbering).
`bcEdit`	Maximum allowed Hamming distance for barcode correction. An observed barcode with this many mismatches or fewer is assigned a corrected barcode if it matches exactly one barcode in the predetermined list `bc`. Default is 0, meaning no cell barcode correction is performed.
`umiStart`	Integer or vector of integers containing the start positions (inclusive, one-based numbering) of UMI sequences.
`umiStop`	Integer or vector of integers containing the stop positions (inclusive, one-based numbering) of UMI sequences.
`keep`	Read trimming: the number of nucleotides to keep for read 2 (the read containing the transcript sequence). Longer reads are clipped at the 3' end; shorter reads are not affected.
`minQual`	Minimum acceptable Phred quality score for barcode and UMI sequences. Phred quality scores are evaluated for each nucleotide in these sequences, and sequences containing at least one nucleotide scoring below this threshold are filtered out. Default is 10.
`yieldReads`	The number of reads to yield when drawing successive subsets from a FASTQ file, i.e. the number of records returned on each yield. This value is passed to the `n` argument of the `FastqStreamer` function in the ShortRead package. Default is 1e06.
`outDir`	Output folder path for demultiplexing results. Demultiplexed cell-specific FASTQ files are stored in separate folders within this path. Make sure the folder is empty. Default is `"./Demultiplex"`.
`summaryPrefix`	Prefix for demultiplex summary filename. Default is `"demultiplex"`.
`overwrite`	Boolean indicating whether to overwrite the output directory. Default is FALSE.
`cores`	Number of cores used for parallelization. Default is `max(1, parallelly::availableCores() - 2)`, i.e. the number of available cores minus 2.
`verbose`	Boolean indicating whether to print log messages. Useful for debugging. Default is FALSE.
`logfilePrefix`	Prefix for the log file name. Default is the current date and time, formatted by `format(Sys.time(), "%Y%m%d_%H%M%S")`.
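The lane grouping described for `lane` can be pictured with base R's `split()`: files that share an `experiment` label fall into one group, which is then handled as a single sample. This is a minimal sketch with hypothetical file names, not scruff's internal code; it only groups paths and does not read or concatenate any files.

```r
# Hypothetical inputs: patient1 was sequenced on two lanes, patient2 on one.
experiment <- c("patient1", "patient1", "patient2")
lane <- c("L001", "L002", "L001")
read1Path <- c("p1_L001_R1.fastq.gz", "p1_L002_R1.fastq.gz",
               "p2_L001_R1.fastq.gz")

# Group read 1 paths (and their lane labels) by experiment. All files in
# one group belong to the same sample, so their reads would be pooled.
filesByExperiment <- split(read1Path, experiment)
lanesByExperiment <- split(lane, experiment)

print(filesByExperiment)
print(lanesByExperiment)
```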
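The one-based, inclusive positions in `bcStart`/`bcStop` and `umiStart`/`umiStop` map directly onto base R's `substring()`. The helper below, `extractSegments()`, is hypothetical and not part of scruff. When several start/stop pairs are supplied, it assumes the segments are concatenated in order; that is this sketch's interpretation of the vector form, not documented scruff behavior.

```r
# Hypothetical helper (not a scruff function): extract the bases between
# one-based, inclusive start/stop positions from each read 1 sequence.
# Assumption: multiple start/stop pairs are concatenated in order.
extractSegments <- function(seqs, start, stop) {
  stopifnot(length(start) == length(stop))
  parts <- mapply(function(s, e) substring(seqs, s, e), start, stop,
                  SIMPLIFY = FALSE)
  do.call(paste0, parts)
}

# Hypothetical read 1 sequences: 8 nt cell barcode, then a 4 nt UMI,
# matching bcStart = 1, bcStop = 8, umiStart = 9, umiStop = 12.
read1 <- c("ACGTACGTTTGCAAAA", "GGGGCCCCATATCCCC")
barcodes <- extractSegments(read1, 1, 8)   # "ACGTACGT" "GGGGCCCC"
umis <- extractSegments(read1, 9, 12)      # "TTGC" "ATAT"

# A barcode split into two segments (positions 1-4 and 9-12).
extractSegments("AAAACCCCGGGG", c(1, 9), c(4, 12))  # "AAAAGGGG"
```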
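The correction rule described for `bcEdit` can be sketched in base R as follows. `hamming()` and `correctBarcode()` are hypothetical helpers written for illustration, not scruff's implementation. An exact match is kept as-is. Otherwise a barcode is corrected only when exactly one predetermined barcode lies within `bcEdit` mismatches. The sketch assumes the observed barcode and every entry of `bc` have the same length.

```r
# Hypothetical helper: Hamming distance between two equal-length strings.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "", fixed = TRUE)[[1]] != strsplit(b, "", fixed = TRUE)[[1]])
}

# Hypothetical helper: correct one observed barcode against `bc`.
# Returns NA when no correction is possible or the match is ambiguous.
correctBarcode <- function(observed, bc, bcEdit = 0) {
  if (observed %in% bc) return(observed)   # exact match, keep it
  if (bcEdit == 0) return(NA_character_)   # correction disabled
  d <- vapply(bc, hamming, integer(1), b = observed, USE.NAMES = FALSE)
  hits <- bc[d <= bcEdit]
  if (length(hits) == 1) hits else NA_character_  # unique match only
}

bcList <- c("AAAAAAAA", "CCCCCCCC", "AAAATTTT")
correctBarcode("AAAAAAAT", bcList, bcEdit = 0)  # NA (no correction)
correctBarcode("AAAAAAAT", bcList, bcEdit = 1)  # "AAAAAAAA"
correctBarcode("AAAAAATT", bcList, bcEdit = 2)  # NA (two barcodes within 2)
```

Requiring a unique match avoids assigning a read to the wrong cell when two predetermined barcodes are equally close.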
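Trimming with `keep` amounts to retaining the first `keep` bases (the 5' end) of each read 2, together with the matching quality characters. Below is a minimal base-R sketch using a hypothetical helper, `trimRead2()`, which is not taken from scruff.

```r
# Hypothetical helper: keep at most `keep` bases from the 5' end of each
# read 2. Longer reads are clipped at the 3' end; shorter ones are unchanged
# because substr() stops at the end of the string.
trimRead2 <- function(seqs, quals, keep) {
  list(seq = substr(seqs, 1, keep), qual = substr(quals, 1, keep))
}

trimmed <- trimRead2(seqs = c("ACGTACGTAC", "ACG"),
                     quals = c("IIIIIIIIII", "III"),
                     keep = 5)
trimmed$seq   # "ACGTA" "ACG"
trimmed$qual  # "IIIII" "III"
```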
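The `minQual` filter can be sketched by decoding the FASTQ quality string and requiring every base in the barcode and UMI positions to reach the threshold. The helper `passesMinQual()` is hypothetical and is not scruff's code. It assumes Phred+33 (Sanger / Illumina 1.8+) quality encoding.

```r
# Hypothetical helper: TRUE for reads whose bases in positions from..to
# (one-based, inclusive) all have Phred scores >= minQual.
# Assumption: qualities are Phred+33 encoded, so score = ASCII code - 33.
passesMinQual <- function(quals, from, to, minQual = 10) {
  vapply(substring(quals, from, to),
         function(q) all(utf8ToInt(q) - 33L >= minQual),
         logical(1), USE.NAMES = FALSE)
}

# Hypothetical read 1 qualities: 'I' = 40, '#' = 2, '+' = 10.
quals <- c("IIIIIIIIIIII", "IIIII#IIIIII", "++++++++++++")

# Keep a read only if both the barcode (1-8) and UMI (9-12) regions pass.
keepRead <- passesMinQual(quals, 1, 8) & passesMinQual(quals, 9, 12)
keepRead  # TRUE FALSE TRUE
```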

Value

A SingleCellExperiment object containing the demultiplex summary information in the colData slot.

Examples

# Demultiplex example FASTQ files
library(scruff)
data(barcodeExample, package = "scruff")
fastqs <- list.files(system.file("extdata", package = "scruff"),
    pattern = "\\.fastq\\.gz", full.names = TRUE)

# Read 1 (barcode + UMI) and read 2 (transcript) of a single experiment
de <- demultiplex(
    project = "example",
    experiment = "1h1",
    lane = "L001",
    read1Path = fastqs[1],
    read2Path = fastqs[2],
    bc = barcodeExample,
    bcStart = 1,
    bcStop = 8,
    umiStart = 9,
    umiStop = 12,
    keep = 75,
    overwrite = TRUE)

87875172/scuff documentation built on July 28, 2024, 6:11 p.m.