readFastq | R Documentation |
readFastq reads all FASTQ-formatted files in a directory dirPath
whose file names match the pattern pattern, returning a compact
internal representation of the sequences and quality scores in the
files. Methods read all files into a single R object; a typical use
is to restrict input to a single FASTQ file.
writeFastq writes an object to a single file, using mode="w" (the
default) to create a new file or mode="a" to append to an existing
file. Attempting to write to an existing file with mode="w" results
in an error.
countFastq counts the number of records, nucleotides, and base-level
quality scores in one or several fastq files.
readFastq(dirPath, pattern=character(0), ...)
## S4 method for signature 'character'
readFastq(dirPath, pattern=character(0), ..., withIds=TRUE)
writeFastq(object, file, mode="w", full=FALSE, compress=TRUE, ...)
countFastq(dirPath, pattern=character(0), ...)
## S4 method for signature 'character'
countFastq(dirPath, pattern=character(0), ...)
dirPath |
A character vector (or other object; see methods defined on this generic) giving the directory path (relative or absolute) or single file name of FASTQ files to be read. |
pattern |
The ( |
object |
An object to be output in |
file |
A length 1 character vector providing the path to the file the object is to be written to. |
mode |
A length 1 character vector equal to either ‘w’ or ‘a’ to write to a new file or append to an existing file, respectively. |
full |
A logical(1) indicating whether the identifier line should
be repeated |
compress |
A logical(1) indicating whether the file should be
gz-compressed. The default is TRUE. |
... |
Additional arguments. In particular,
|
withIds |
|
The fastq format is not quite precisely defined. The basic definition used here parses the following four lines as a single record:
@HWI-EAS88_1_1_1_1001_499
GGACTTTGTAGGATACCCTCGCTTTCCTTCTCCTGT
+HWI-EAS88_1_1_1_1001_499
]]]]]]]]]]]]Y]Y]]]]]]]]]]]]VCHVMPLAS
The first and third lines are identifiers, each preceded by a specific
character (@ and +, respectively; the identifiers are identical, in
the case of Solexa). The second line is an upper-case sequence of
nucleotides. The parser recognizes the IUPAC-standard alphabet (hence
ambiguous nucleotides), coercing . to - to represent missing values.
The final line is an ASCII-encoded representation of quality scores,
with one ASCII character per nucleotide.
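As a rough illustration of the four-line record layout, the base R sketch below splits one record into its parts. It does not use the ShortRead parser: the helper name parse_record is hypothetical, the short record is made up for the example, and the translation of . to - only mimics the documented coercion rather than reproducing the package's implementation.

```r
# Hypothetical helper (not part of ShortRead): split one four-line
# FASTQ record into identifier, sequence, and quality string.
parse_record <- function(lines) {
    stopifnot(length(lines) == 4L,
              startsWith(lines[1], "@"),
              startsWith(lines[3], "+"))
    # Mimic the documented coercion of '.' to '-'
    sread <- chartr(".", "-", lines[2])
    # One quality character is expected per nucleotide
    stopifnot(nchar(sread) == nchar(lines[4]))
    list(id = substring(lines[1], 2),
         sread = sread,
         quality = lines[4])
}

## made-up record, for illustration only
rec <- parse_record(c("@read1", "ACG.T", "+read1", "]]Y]V"))
rec$id      # "read1"
rec$sread   # "ACG-T"
```

For real data, readFastq remains the appropriate entry point; this sketch only makes the record structure concrete.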
The encoding implicit in Solexa-derived fastq files is that each
character code corresponds to a score equal to the ASCII character
value minus 64 (e.g., ASCII @ is decimal 64, and corresponds to a
Solexa quality score of 0). This is different from BioPerl, for
instance, which recovers quality scores by subtracting 33 from the
ASCII character value (so that, for instance, !, with decimal value
33, encodes value 0).
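The arithmetic behind the two conventions can be sketched in base R. The helper decode_quality is a hypothetical name introduced for this illustration and is not a ShortRead function; it simply subtracts an offset from each character's ASCII code.

```r
# Hypothetical helper (not part of ShortRead): convert a quality string
# to integer scores by subtracting an offset from each ASCII code.
decode_quality <- function(qual, offset = 64L) {
    utf8ToInt(qual) - offset
}

decode_quality("@")                 # 0 under the Solexa convention
decode_quality("!", offset = 33L)   # 0 under the BioPerl convention
decode_quality("]]Y")               # 29 29 25
```

The same character therefore decodes to scores that differ by 31 depending on which offset is assumed, which is why the convention in use matters when interpreting a file.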
The BioPerl description of fastq asserts that the first character of
line 4 is a !, but the current parser does not support this
convention.
writeFastq creates files following the specification outlined above,
using the IUPAC-standard alphabet (hence, sequences containing ‘.’
when read will be represented by ‘-’ when written).
readFastq returns a single R object (e.g., ShortReadQ) holding the
sequences and qualities from all files in dirPath matching pattern.
The order in which files are read is not guaranteed.
writeFastq is invoked primarily for its side effect, creating or
appending to file file. The function returns, invisibly, the length
of object, and hence the number of records written.
countFastq returns a data.frame with row names equal to the base
(file) name of the fastq file, and columns records, nucleotides, and
scores, giving the tally of each entity in each file. Parsing
mistakes from poorly formatted files result in an error.
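To make the shape of such a tally concrete, here is a base R sketch that builds a one-row data.frame with the same column names for a single uncompressed file. The helper tally_fastq is hypothetical and is not how countFastq is implemented; it assumes strict four-line records and a made-up two-record file.

```r
# Hypothetical helper (not ShortRead's implementation): tally records,
# nucleotides, and quality characters in one uncompressed FASTQ file,
# assuming every record occupies exactly four lines.
tally_fastq <- function(path) {
    lines <- readLines(path)
    if (length(lines) %% 4L != 0L)
        stop("number of lines is not a multiple of 4: ", path)
    idx <- seq_along(lines)
    data.frame(records = length(lines) %/% 4L,
               nucleotides = sum(nchar(lines[idx %% 4L == 2L])),
               scores = sum(nchar(lines[idx %% 4L == 0L])),
               row.names = basename(path))
}

## made-up two-record file, for illustration only
fl <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "ACGT", "+read1", "]]]]",
             "@read2", "GGA", "+read2", "YYY"), fl)
res <- tally_fastq(fl)
res
```

In a well-formed file the nucleotides and scores columns agree, since there is one quality character per nucleotide.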
Martin Morgan
The IUPAC alphabet in Biostrings.
http://www.bioperl.org/wiki/FASTQ_sequence_format for the BioPerl definition of fastq.
Solexa documentation 'Data analysis - documentation : Pipeline output and visualisation'.
methods(readFastq)
methods(writeFastq)
methods(countFastq)
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt")
sread(rfq)
id(rfq)
quality(rfq)
## SolexaPath method 'knows' where FASTQ files are placed
rfq1 <- readFastq(sp, pattern="s_1_sequence.txt")
rfq1
file <- tempfile()
writeFastq(rfq, file)
readLines(file, 8)
countFastq(file)