scruff | R Documentation |
Run the scruff pipeline. This function runs the demultiplex, alignRsubread, and countUMI functions.
It writes demultiplex statistics, alignment statistics, and the UMI-filtered count
matrix to the output directories, and returns a SingleCellExperiment object
containing the count matrix, cell and gene annotations, and all QC metrics.
scruff(
  project = paste0("project_", Sys.Date()),
  experiment,
  lane,
  read1Path,
  read2Path,
  bc,
  index,
  reference,
  bcStart,
  bcStop,
  bcEdit = 0,
  umiStart,
  umiStop,
  umiEdit = 0,
  keep,
  cellPerWell = 1,
  unique = FALSE,
  nBestLocations = 1,
  minQual = 10,
  yieldReads = 1e+06,
  alignmentFileFormat = "BAM",
  demultiplexOutDir = "./Demultiplex",
  alignmentOutDir = "./Alignment",
  countUmiOutDir = "./Count",
  demultiplexSummaryPrefix = "demultiplex",
  alignmentSummaryPrefix = "alignment",
  countPrefix = "countUMI",
  logfilePrefix = format(Sys.time(), "%Y%m%d_%H%M%S"),
  overwrite = FALSE,
  verbose = FALSE,
  cores = max(1, parallelly::availableCores() - 2),
  threads = 1,
  ...
)
project |
The project name. Default is paste0("project_", Sys.Date()). |
experiment |
A character vector of experiment names. Represents the group label for each FASTQ file, e.g. "patient1, patient2, ...". The number of cells in an experiment equals the length of cell barcodes |
lane |
A character or character vector of flow cell lane numbers. If FASTQ files from multiple lanes are concatenated, any placeholder would be sufficient, e.g. "L001". |
read1Path |
A character vector of file paths to the read1 FASTQ files. These are the read files with UMI and cell barcode information. |
read2Path |
A character vector of file paths to the read2 FASTQ files. These read files contain genomic sequences. |
bc |
A vector of pre-determined cell barcodes. For example, see
|
index |
Path to the |
reference |
Path to the reference GTF file. The TxDb object of the GTF file will be generated and saved in the current working directory with ".sqlite" suffix. |
bcStart |
Integer or vector of integers containing the cell barcode start positions (inclusive, one-based numbering). |
bcStop |
Integer or vector of integers containing the cell barcode stop positions (inclusive, one-based numbering). |
bcEdit |
Maximum allowed Hamming distance for barcode correction. Barcodes with a number of mismatches fewer than or equal to this will be assigned a corrected barcode if the inferred barcode matches uniquely in the provided predetermined barcode list. Default is 0, meaning no cell barcode correction is performed. |
umiStart |
Integer or vector of integers containing the start positions (inclusive, one-based numbering) of UMI sequences. |
umiStop |
Integer or vector of integers containing the stop positions (inclusive, one-based numbering) of UMI sequences. |
umiEdit |
Maximum allowed Hamming distance for UMI correction. For read alignments in each gene, by comparison to a more abundant UMI with more reads, UMIs having fewer reads and with a number of mismatches fewer than or equal to |
keep |
Read trimming. Read length or number of nucleotides to keep for read 2 (the read that contains transcript sequence information). Longer reads will be clipped at the 3' end. Shorter reads will not be affected. This number should be determined based on the sequencing kit that was used in the library preparation step. |
cellPerWell |
Number of cells per well. Can be an integer (e.g. 1) indicating the number of cells in each well, or a vector with length equal to the total number of cells in the input alignment files specifying the number of cells in each file. Default is 1. |
unique |
Argument passed to |
nBestLocations |
Argument passed to |
minQual |
Minimum acceptable Phred quality score for cell barcode and UMI sequences. Phred quality scores are calculated for each nucleotide in these tags. Tags with at least one nucleotide with a score lower than this will be filtered out. Default is 10. |
yieldReads |
The number of reads to yield when drawing successive subsets from a FASTQ file, providing the number of successive records to be returned on each yield. This parameter is passed to the |
alignmentFileFormat |
File format of sequence alignment results. "BAM" or "SAM". Default is "BAM". |
demultiplexOutDir |
Output folder path for demultiplex results. Demultiplexed cell-specific FASTQ files will be stored in folders under this path. Make sure the folder is empty. Default is "./Demultiplex". |
alignmentOutDir |
Output directory for alignment results. Sequence alignment maps will be stored in folders under this directory. Make sure the folder is empty. Default is "./Alignment". |
countUmiOutDir |
Output directory for UMI counting results. The UMI-filtered count matrix will be stored in this directory. Default is "./Count". |
demultiplexSummaryPrefix |
Prefix for demultiplex summary filename. Default is "demultiplex". |
alignmentSummaryPrefix |
Prefix for alignment summary filename. Default is "alignment". |
countPrefix |
Prefix for UMI-filtered count matrix filename. Default is "countUMI". |
logfilePrefix |
Prefix for log file. Default is the current date and time in the format "%Y%m%d_%H%M%S". |
overwrite |
Boolean indicating whether to overwrite the output directory. Default is FALSE. |
verbose |
Boolean indicating whether to print log messages. Useful for debugging. Default is FALSE. |
cores |
Number of cores to use for parallelization. Default is max(1, parallelly::availableCores() - 2). |
threads |
Do not change. Number of threads/CPUs used for
mapping for each core. Refer to |
... |
Additional arguments passed to the |
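The positional arguments (bcStart, bcStop, umiStart, umiStop) and the bcEdit correction rule described above can be illustrated with a small base R sketch. This is not scruff's internal code: the helper names extractTags, hammingDist, and correctBarcode are hypothetical and exist only for illustration. It assumes one-based, inclusive positions and a unique-match rule as stated in the argument descriptions.

```r
# Hypothetical helpers for illustration only; not part of the scruff package.

# Extract the cell barcode and UMI from a read 1 sequence using
# one-based, inclusive start and stop positions.
extractTags <- function(read1, bcStart, bcStop, umiStart, umiStop) {
    list(bc = substr(read1, bcStart, bcStop),
         umi = substr(read1, umiStart, umiStop))
}

# Hamming distance between two strings of equal length.
hammingDist <- function(a, b) {
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Return the known barcode within bcEdit mismatches of the observed one,
# but only if exactly one known barcode qualifies; otherwise return NA.
correctBarcode <- function(observed, known, bcEdit = 0) {
    d <- vapply(known, hammingDist, integer(1), b = observed)
    hits <- known[d <= bcEdit]
    if (length(hits) == 1) hits else NA_character_
}

tags <- extractTags("ACGTACGTTTGGAAAAAAAA",
                    bcStart = 1, bcStop = 8, umiStart = 9, umiStop = 12)
known <- c("ACGTACGT", "TTTTCCCC")
corrected <- correctBarcode("ACGTACGA", known, bcEdit = 1)
uncorrected <- correctBarcode("ACGTACGA", known, bcEdit = 0)
```

With bcEdit = 0 only exact matches are accepted, which corresponds to the documented default of no barcode correction.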
A SingleCellExperiment object.
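The minQual and keep arguments described above can be sketched in base R as follows. This is a rough illustration, not the package's implementation: the helper names phredScores, passesMinQual, and trimRead are hypothetical, and the sketch assumes Phred+33 quality encoding, which the documentation does not state.

```r
# Hypothetical helpers for illustration only; not part of the scruff package.
# Assumption: quality strings use Phred+33 encoding.

# Convert a FASTQ quality string to per-nucleotide Phred scores.
phredScores <- function(qual) {
    utf8ToInt(qual) - 33L
}

# A tag passes only if every nucleotide scores at least minQual.
passesMinQual <- function(tagQual, minQual = 10) {
    all(phredScores(tagQual) >= minQual)
}

# Keep the first `keep` nucleotides of read 2, clipping the 3' end.
# Reads shorter than `keep` are returned unchanged.
trimRead <- function(read2, keep) {
    substr(read2, 1, keep)
}

highQual <- passesMinQual("IIII")   # "I" encodes Phred 40
lowQual <- passesMinQual("II#I")    # "#" encodes Phred 2
trimmed <- trimRead("ACGTACGTAC", keep = 4)
shortRead <- trimRead("ACG", keep = 4)
```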
## Not run:
# prepare required files
data(barcodeExample, package = "scruff")
fastqs <- list.files(system.file("extdata", package = "scruff"),
    pattern = "\\.fastq\\.gz", full.names = TRUE)
fasta <- system.file("extdata", "GRCm38_MT.fa", package = "scruff")
gtf <- system.file("extdata", "GRCm38_MT.gtf", package = "scruff")
library(Rsubread)
# Specify the basename for Rsubread index
indexBase <- "GRCm38_MT"
# Create index files for GRCm38_MT.
buildindex(basename = indexBase, reference = fasta, indexSplit = FALSE)
# run scruff pipeline
sce <- scruff(project = "example",
    experiment = c("1h1"),
    lane = c("L001"),
    read1Path = c(fastqs[1]),
    read2Path = c(fastqs[2]),
    bc = barcodeExample,
    index = indexBase,
    reference = gtf,
    bcStart = 1,
    bcStop = 8,
    umiStart = 9,
    umiStop = 12,
    keep = 75,
    cellPerWell = c(rep(1, 46), 0, 0),
    overwrite = TRUE,
    verbose = TRUE)
## End(Not run)
# or use the built-in SingleCellExperiment object generated using
# example dataset (see ?sceExample)
data(sceExample, package = "scruff")
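The umiEdit argument describes correcting less abundant UMIs against more abundant ones within each gene. One possible reading of that rule is sketched below in base R. This is an assumption about the general technique, not a reproduction of scruff's algorithm: the helper names hammingDist and collapseUmis are hypothetical, and the input is taken to be a named vector of read counts per UMI for a single gene.

```r
# Hypothetical helpers for illustration only; not part of the scruff package.

# Hamming distance between two strings of equal length.
hammingDist <- function(a, b) {
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Given read counts per UMI for one gene, keep a UMI only if no more
# abundant UMI already kept lies within umiEdit mismatches of it.
collapseUmis <- function(umiCounts, umiEdit = 0) {
    # Visit UMIs from most to least abundant
    umis <- names(sort(umiCounts, decreasing = TRUE))
    kept <- character(0)
    for (u in umis) {
        d <- vapply(kept, hammingDist, integer(1), b = u)
        if (any(d <= umiEdit)) {
            next  # treated as a sequencing error of a more abundant UMI
        }
        kept <- c(kept, u)
    }
    kept
}

counts <- c(AAAA = 10, AAAT = 2, CCCC = 5)
withCorrection <- collapseUmis(counts, umiEdit = 1)
noCorrection <- collapseUmis(counts, umiEdit = 0)
```

In this sketch the UMI count for the gene would be the number of UMIs kept, so raising umiEdit can only lower or preserve the count.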