demultiplex | R Documentation |
Demultiplex FASTQ files and write cell-specific reads in compressed FASTQ format to the output directory
demultiplex(
project = paste0("project_", Sys.Date()),
experiment,
lane,
read1Path,
read2Path,
bc,
bcStart,
bcStop,
bcEdit = 0,
umiStart,
umiStop,
keep,
minQual = 10,
yieldReads = 1e+06,
outDir = "./Demultiplex",
summaryPrefix = "demultiplex",
overwrite = FALSE,
cores = max(1, parallelly::availableCores() - 2),
verbose = FALSE,
logfilePrefix = format(Sys.time(), "%Y%m%d_%H%M%S")
)
project |
The project name. Default is paste0("project_", Sys.Date()). |
experiment |
A character vector of experiment names. Represents the
group label for each FASTQ file, e.g. "patient1, patient2, ...". The number
of cells in an experiment equals the number of cell barcodes in bc. |
lane |
A character or character vector of flow cell lane numbers. FASTQ
files from lanes having the same experiment label are combined. |
read1Path |
A character vector of file paths to the read 1 FASTQ files. These are the read files containing UMI and cell barcode sequences. |
read2Path |
A character vector of file paths to the read 2 FASTQ files. These read files contain genomic transcript sequences. |
bc |
A character vector of predetermined cell barcodes. For example,
see ?barcodeExample. |
bcStart |
Integer or vector of integers containing the cell barcode start positions (inclusive, one-based numbering). |
bcStop |
Integer or vector of integers containing the cell barcode stop positions (inclusive, one-based numbering). |
bcEdit |
Maximum allowed Hamming distance for barcode correction. A barcode with this many mismatches or fewer is assigned a corrected barcode if it matches a unique barcode in the predetermined barcode list. Default is 0, meaning no cell barcode correction is performed. |
umiStart |
Integer or vector of integers containing the start positions (inclusive, one-based numbering) of UMI sequences. |
umiStop |
Integer or vector of integers containing the stop positions (inclusive, one-based numbering) of UMI sequences. |
keep |
Read trimming. The read length, i.e. the number of nucleotides, to keep for read 2 (the read that contains transcript sequence information). Longer reads are clipped at the 3' end. Shorter reads are not affected. |
minQual |
Minimum acceptable Phred quality score for barcode and UMI sequences. Phred quality scores are calculated for each nucleotide in the sequence. Sequences with at least one nucleotide scoring lower than this are filtered out. Default is 10. |
yieldReads |
The number of reads to yield when drawing successive
subsets from a FASTQ file, i.e. the number of successive records
returned on each yield. This parameter is passed to the underlying FASTQ
streaming reader. Default is 1e+06. |
outDir |
Output folder path for demultiplex results. Demultiplexed
cell-specific FASTQ files will be stored in folders under this path.
Make sure the folder is empty. Default is "./Demultiplex". |
summaryPrefix |
Prefix for the demultiplex summary filename. Default is "demultiplex". |
overwrite |
Boolean indicating whether to overwrite the output directory. Default is FALSE. |
cores |
Number of cores used for parallelization. Default is
max(1, parallelly::availableCores() - 2). |
verbose |
Boolean indicating whether to print log messages. Useful for debugging. Default is FALSE. |
logfilePrefix |
Prefix for the log file. Default is the current date and time
in the format of format(Sys.time(), "%Y%m%d_%H%M%S"). |
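The position, quality and trimming arguments (bcStart, bcStop, umiStart, umiStop, minQual, keep) can be illustrated with a small base-R sketch. The helpers phredScores, parseRead1 and trimRead2 are hypothetical and not part of scruff. The sketch assumes Phred+33 quality encoding and a single scalar set of positions, and it checks minQual only over the barcode and UMI bases, as the minQual description states.

```r
# Hypothetical helpers illustrating the argument semantics; not scruff code.

# Decode a FASTQ quality string, assuming Phred+33 ASCII encoding.
phredScores <- function(qual, offset = 33L) {
  utf8ToInt(qual) - offset
}

# Extract the cell barcode and UMI from a read 1 sequence.
# Positions are one-based and inclusive, like bcStart/bcStop/umiStart/umiStop.
# Returns NULL when any barcode or UMI base scores below minQual.
parseRead1 <- function(seq, qual, bcStart, bcStop, umiStart, umiStop,
                       minQual = 10) {
  scores <- phredScores(qual)
  checked <- c(bcStart:bcStop, umiStart:umiStop)
  if (any(scores[checked] < minQual)) {
    return(NULL)  # read is filtered out
  }
  list(bc = substr(seq, bcStart, bcStop),
       umi = substr(seq, umiStart, umiStop))
}

# Keep at most `keep` bases of a read 2 record, removing excess bases
# from the 3' end; shorter reads are returned unchanged.
trimRead2 <- function(seq, qual, keep) {
  list(seq = substr(seq, 1L, keep), qual = substr(qual, 1L, keep))
}

# Usage with the positions from the example call below
# (barcode at 1-8, UMI at 9-12); low-quality bases after 12 are not checked.
r <- parseRead1("AAAACCCCGGTTNNNN", "IIIIIIIIIIII####",
                bcStart = 1, bcStop = 8, umiStart = 9, umiStop = 12)
str(r)
```

Here "I" decodes to a score of 40 and "#" to 2, so this read passes, while the same read with a "#" inside positions 1 to 12 would be dropped.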
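The bcEdit rule can be sketched as a Hamming-distance lookup against the predetermined barcode list. hammingDist and correctBarcode are hypothetical helpers, not scruff functions. The sketch assumes the observed and listed barcodes have equal length, and it reads "matches uniquely" as: exactly one listed barcode lies within bcEdit mismatches, otherwise the read is left unassigned.

```r
# Hypothetical helpers; not part of scruff.

# Hamming distance between two equal-length strings.
hammingDist <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Return the single whitelist barcode within bcEdit mismatches of `observed`,
# or NA when there is no match or the match is ambiguous.
correctBarcode <- function(observed, whitelist, bcEdit = 0) {
  d <- vapply(whitelist, hammingDist, integer(1),
              b = observed, USE.NAMES = FALSE)
  hits <- whitelist[d <= bcEdit]
  if (length(hits) == 1L) hits else NA_character_
}

whitelist <- c("AAAACCCC", "GGGGTTTT", "AAAACCCG")
correctBarcode("GGGGTTTA", whitelist, bcEdit = 0)  # NA: no exact match
correctBarcode("GGGGTTTA", whitelist, bcEdit = 1)  # "GGGGTTTT"
correctBarcode("AAAACCCA", whitelist, bcEdit = 1)  # NA: two candidates
```

With bcEdit = 0 only exact matches are accepted, which corresponds to the documented default of no correction.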
A SingleCellExperiment object
containing the demultiplex summary information in the colData
slot.
# Demultiplex example FASTQ files
data(barcodeExample, package = "scruff")
fastqs <- list.files(system.file("extdata", package = "scruff"),
pattern = "\\.fastq\\.gz", full.names = TRUE)
de <- demultiplex(
project = "example",
experiment = c("1h1"),
lane = c("L001"),
read1Path = c(fastqs[1]),
read2Path = c(fastqs[2]),
bc = barcodeExample,
bcStart = 1,
bcStop = 8,
umiStart = 9,
umiStop = 12,
keep = 75,
overwrite = TRUE)
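The yieldReads argument bounds how many reads are held in memory at once, which matters for full-size runs. The base-R sketch below illustrates the same chunked-streaming idea; streamFastq is a hypothetical helper, not scruff's implementation, and it assumes well-formed four-line FASTQ records.

```r
# Hypothetical helper, not scruff's implementation: read a FASTQ file in
# chunks of `yieldReads` records and apply `fun` to each chunk.
# Assumes well-formed four-line records.
streamFastq <- function(path, yieldReads = 1e6, fun) {
  con <- gzfile(path, open = "r")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  results <- list()
  repeat {
    lines <- readLines(con, n = 4 * yieldReads)
    if (length(lines) == 0L) break  # end of file
    rec <- matrix(lines, ncol = 4L, byrow = TRUE)
    chunk <- data.frame(id = rec[, 1], seq = rec[, 2], qual = rec[, 4],
                        stringsAsFactors = FALSE)
    results[[length(results) + 1L]] <- fun(chunk)
  }
  results
}

# Demo on a three-record temporary file, two records per chunk
tmp <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "IIII",
             "@r2", "GGCC", "+", "IIII",
             "@r3", "TTAA", "+", "IIII"), tmp)
streamFastq(tmp, yieldReads = 2, fun = nrow)  # list(2L, 1L)
```

Smaller chunks lower peak memory at the cost of more read calls; the chunk size does not change which reads are kept.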