learnErrors: Learns the error rates from an input list, or vector, of file...

View source: R/errorModels.R

learnErrors    R Documentation

Learns the error rates from an input list, or vector, of file names or a list of derep-class objects.

Description

Error rates are learned by alternating between sample inference and error rate estimation until convergence. Sample inference is performed by the dada function. Error rate estimation is performed by errorEstimationFunction. The output of this function serves as input to the dada function call as the err parameter.
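As a minimal sketch of how the output feeds into dada, the snippet below learns error rates from one of the example files shipped with the package and passes the result as the err parameter. It is guarded so that it only runs when dada2 is installed; the variable names (fl1, derep1, dd) are illustrative only.

```r
# Sketch: learn error rates, then reuse them for sample inference.
if (requireNamespace("dada2", quietly = TRUE)) {
  library(dada2)
  # Example fastq file bundled with dada2
  fl1 <- system.file("extdata", "sam1F.fastq.gz", package = "dada2")
  # Step 1: learn the error rates
  err <- learnErrors(fl1)
  # Step 2: dereplicate the sample and run inference with the learned rates
  derep1 <- derepFastq(fl1)
  dd <- dada(derep1, err = err)
}
```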

Usage

learnErrors(
  fls,
  nbases = 1e+08,
  nreads = NULL,
  errorEstimationFunction = loessErrfun,
  multithread = FALSE,
  randomize = FALSE,
  MAX_CONSIST = 10,
  OMEGA_C = 0,
  qualityType = "Auto",
  verbose = FALSE,
  ...
)

Arguments

fls

(Required). character. The file path(s) to the fastq file(s), or a directory containing fastq file(s). Compressed file formats such as .fastq.gz and .fastq.bz2 are supported. A list of derep-class objects can also be provided.

nbases

(Optional). Default 1e8. The minimum number of total bases to use for error rate learning. Samples are read into memory until at least this number of total bases has been reached, or all provided samples have been read in.

nreads

(Optional). Default NULL. DEPRECATED. Please update your code to use the nbases parameter.

errorEstimationFunction

(Optional). Function. Default loessErrfun.

errorEstimationFunction is computed on the matrix of observed transitions after each sample inference step in order to generate the new matrix of estimated error rates.
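To make the contract concrete, here is a toy estimation function written in base R. It is not loessErrfun and is not part of dada2: naiveErrfun, its pseudocount argument, and the assumed matrix layout (16 rows named like "A2C", one column per quality score) are all assumptions made for this sketch. It simply normalizes the observed counts within each original nucleotide.

```r
# Toy transition-count matrix (assumed layout: 16 from->to rows, one column per quality score).
nts <- c("A", "C", "G", "T")
pairs <- paste0(rep(nts, each = 4), "2", rep(nts, times = 4))
trans <- matrix(rep(c(9, 4, 1), each = 16), nrow = 16,
                dimnames = list(pairs, c("10", "20", "30")))
# Self-transitions (no error) dominate the counts
trans[paste0(nts, "2", nts), ] <- 1000

# Hypothetical estimator: per-nucleotide frequencies with a pseudocount.
naiveErrfun <- function(trans, pseudocount = 1) {
  est <- trans + pseudocount
  from <- substr(rownames(trans), 1, 1)
  for (nt in unique(from)) {
    rows <- which(from == nt)
    block <- est[rows, , drop = FALSE]
    # Divide each quality-score column by its total so the rates sum to 1
    est[rows, ] <- sweep(block, 2, colSums(block), "/")
  }
  est
}

err_toy <- naiveErrfun(trans)
```

In principle a function of this shape could be supplied as errorEstimationFunction = naiveErrfun, although the default loessErrfun is a better starting point for real data because it smooths the rates across quality scores.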

multithread

(Optional). Default is FALSE. If TRUE, multithreading is enabled and the number of available threads is automatically determined. If an integer is provided, the number of threads to use is set by passing the argument on to setThreadOptions.

randomize

(Optional). Default FALSE. If FALSE, samples are read in the provided order until at least nbases total bases have been obtained. If TRUE, samples are picked at random from those provided.
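The interaction between nbases and randomize can be sketched in base R as follows. This is an illustration of the documented behavior, not dada2's internal code; pickSamples and the per-sample base counts are hypothetical.

```r
# Hypothetical helper: which samples would be read before nbases is reached?
pickSamples <- function(bases_per_sample, nbases = 1e8, randomize = FALSE) {
  idx <- seq_along(bases_per_sample)
  # Optionally shuffle the order in which samples are read
  if (randomize) idx <- sample(idx)
  total <- cumsum(bases_per_sample[idx])
  # First position at which the running total reaches nbases
  n_needed <- which(total >= nbases)[1]
  # If the threshold is never reached, all samples are used
  if (is.na(n_needed)) n_needed <- length(idx)
  idx[seq_len(n_needed)]
}

# Made-up base counts for four samples
bases <- c(s1 = 4e7, s2 = 5e7, s3 = 3e7, s4 = 6e7)
in_order <- pickSamples(bases)                  # provided order
all_used <- pickSamples(bases, nbases = 1e12)   # threshold never reached
set.seed(42)
shuffled <- pickSamples(bases, randomize = TRUE)
```

With these made-up counts, reading in the provided order stops after the third sample, because the first two together hold only 9e7 bases.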

MAX_CONSIST

(Optional). Default 10. The maximum number of times to step through the self-consistency loop. If convergence was not reached in MAX_CONSIST steps, the estimated error rates in the last step are returned.
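The capped loop can be pictured with a generic fixed-point iteration in base R. This page does not describe dada2's actual convergence criterion, so the selfConsist helper, the tolerance, and the toy update rule below are assumptions chosen only to show the "stop at convergence or after max_consist steps" pattern.

```r
# Generic sketch: iterate an update until the estimate stops changing
# or the step cap is hit, then return the last estimate either way.
selfConsist <- function(update, init, max_consist = 10, tol = 1e-8) {
  est <- init
  converged <- FALSE
  steps <- 0
  for (step in seq_len(max_consist)) {
    new_est <- update(est)
    converged <- max(abs(new_est - est)) < tol
    est <- new_est
    steps <- step
    if (converged) break
  }
  list(estimate = est, steps = steps, converged = converged)
}

# Toy update rule whose fixed point is 0.02
toy_update <- function(p) 0.5 * p + 0.01

capped <- selfConsist(toy_update, init = 0.1, max_consist = 10)
relaxed <- selfConsist(toy_update, init = 0.1, max_consist = 100)
```

Here the capped run uses all 10 steps without converging and returns its last estimate, while the run with a larger cap converges close to 0.02.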

OMEGA_C

(Optional). Default 0. The threshold at which unique sequences inferred to contain errors are corrected in the final output, and used to estimate the error rates (see more at setDadaOpt). For reasons of convergence, and because it is more conservative, it is recommended to set this value to 0, which means that all reads are counted and contribute to estimating the error rates.

qualityType

(Optional). character(1). The quality encoding of the fastq file(s). "Auto" (the default) means to attempt to auto-detect the encoding. This may fail for PacBio files with uniformly high quality scores, in which case use "FastqQuality". This parameter is passed on to readFastq; see information there for details.

verbose

(Optional). Default FALSE. Print verbose text output. More fine-grained control is available by providing an integer argument.

  • 0: Silence. No text output (same as FALSE).

  • 1: Basic text output (same as TRUE).

  • 2: Detailed text output, mostly intended for debugging.

...

(Optional). Additional arguments will be passed on to the dada function.

Value

A named list with three entries:

  • $err_out: A numeric matrix with the learned error rates.

  • $err_in: The initialization error rates (unimportant).

  • $trans: A feature table of observed transitions for each type (e.g. A->C) and quality score.
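A short sketch of inspecting the returned list, again guarded so that it only runs when dada2 is installed. The variable names are illustrative; plotErrors is listed under See Also and is used here to visualize the learned rates.

```r
# Sketch: look at the three entries of the result.
if (requireNamespace("dada2", quietly = TRUE)) {
  library(dada2)
  fl1 <- system.file("extdata", "sam1F.fastq.gz", package = "dada2")
  err <- learnErrors(fl1)
  entry_names <- names(err)        # names of the list entries
  rate_dims <- dim(err$err_out)    # dimensions of the learned error-rate matrix
  trans_dims <- dim(err$trans)     # dimensions of the observed transition counts
  # Plot of observed vs. estimated error rates by quality score
  p <- plotErrors(err, nominalQ = TRUE)
}
```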

See Also

derepFastq, plotErrors, loessErrfun, dada

Examples

 fl1 <- system.file("extdata", "sam1F.fastq.gz", package="dada2")
 fl2 <- system.file("extdata", "sam2F.fastq.gz", package="dada2")
 err <- learnErrors(c(fl1, fl2))
 err <- learnErrors(c(fl1, fl2), nbases=5000000, randomize=TRUE)
 # Using a list of derep-class objects
 dereps <- derepFastq(c(fl1, fl2))
 err <- learnErrors(dereps, multithread=TRUE, randomize=TRUE, MAX_CONSIST=20)


benjjneb/dada2 documentation built on Jan. 12, 2025, 10:03 a.m.