Import files containing aligned reads into an internal representation of the alignments, sequences, and quality scores. Most methods (see ‘details’ for exceptions) read all files into a single R object.
readAligned(dirPath, pattern=character(0), ...)
dirPath
A character vector (or other object; see methods defined on this generic) giving the directory path (relative or absolute; some methods also accept a character vector of file names) of aligned read files to be input.
pattern
The pattern that file names in dirPath must match to be read.
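...
Additional arguments, used by methods. When dirPath is a character vector, the file type is interpreted from an additional type argument, described below.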
There is no standard aligned read file format; methods parse particular file types.
The readAligned,character-method interprets file types based on an additional type argument. Supported types are:
type="SolexaExport"
This type parses .*_export.txt files following the documentation in the Solexa Genome Alignment software manual, version 0.3.0. These files consist of the following columns; consult Solexa documentation for precise descriptions. If parsed, values can be retrieved from AlignedRead as follows:
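Machine: see below
Run: stored in alignData
Lane: stored in alignData
Tile: stored in alignData
X: stored in alignData
Y: stored in alignData
Multiplex index: see below
Paired read number: see below
Read: sread
Quality: quality
Match chromosome: chromosome
Match contig: alignData
Match position: position
Match strand: strand
Match description: Ignored
Single-read alignment score: alignQuality
Paired-read alignment score: Ignored
Partner chromosome: Ignored
Partner contig: Ignored
Partner offset: Ignored
Partner strand: Ignored
Filtering: alignData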
The following optional arguments, set to FALSE by default, influence data input:
withMultiplexIndex: When TRUE, include the multiplex index as a column multiplexIndex in alignData.
withPairedReadNumber: When TRUE, include the paired read number as a column pairedReadNumber in alignData.
withId: When TRUE, construct an identifier string as ‘Machine_Run:Lane:Tile:X:Y#multiplexIndex/pairedReadNumber’. The substrings ‘#multiplexIndex’ and ‘/pairedReadNumber’ are not present if withMultiplexIndex=FALSE or withPairedReadNumber=FALSE.
withAll: A convenience which, when TRUE, sets all with* values to TRUE.
Note that not all paired read columns are interpreted. Different interfaces to reading alignment files are described in SolexaPath and SolexaSet.
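As a rough illustration of the identifier layout described above, the following base R sketch assembles the string from its parts. The helper make_id and the example values are hypothetical and are not part of ShortRead; the sketch only mirrors the documented format, under the assumption that the optional substrings are appended in the order shown.

```r
# Hypothetical helper (not part of ShortRead): assemble an identifier of the
# form 'Machine_Run:Lane:Tile:X:Y#multiplexIndex/pairedReadNumber'.
make_id <- function(machine, run, lane, tile, x, y,
                    multiplexIndex = NULL, pairedReadNumber = NULL) {
    id <- sprintf("%s_%s:%s:%s:%s:%s", machine, run, lane, tile, x, y)
    # The '#multiplexIndex' substring is present only when an index is supplied
    if (!is.null(multiplexIndex))
        id <- paste0(id, "#", multiplexIndex)
    # The '/pairedReadNumber' substring is present only when a read number is supplied
    if (!is.null(pairedReadNumber))
        id <- paste0(id, "/", pairedReadNumber)
    id
}

# Made-up values, for illustration only
id_plain <- make_id("HWI-EAS88", 3, 2, 1, 451, 945)
id_full  <- make_id("HWI-EAS88", 3, 2, 1, 451, 945,
                    multiplexIndex = "ATCACG", pairedReadNumber = 1)
id_plain  # "HWI-EAS88_3:2:1:451:945"
id_full   # "HWI-EAS88_3:2:1:451:945#ATCACG/1"
```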
type="SolexaPrealign"
See SolexaRealign
type="SolexaAlign"
See SolexaRealign
type="SolexaRealign"
These types parse s_L_TTTT_prealign.txt, s_L_TTTT_align.txt or s_L_TTTT_realign.txt files produced by default and eland analyses. From the Solexa documentation, align corresponds to unfiltered first-pass alignments, prealign adjusts alignments for error rates (when available), realign filters alignments to exclude clusters failing to pass quality criteria.
Because base quality scores are not stored with alignments, the object returned by readAligned scores all base qualities as -32.
If parsed, values can be retrieved from AlignedRead as follows:
stored in sread
stored in alignQuality
stored in alignData
stored in position
stored in strand
Ignored; parse using readXStringColumns
stored in alignData
type="SolexaResult"
This parses s_L_eland_results.txt files, an intermediate format that does not contain read or alignment quality scores.
Because base quality scores are not stored with alignments, the object returned by readAligned scores all base qualities as -32.
Columns of this file type can be retrieved from AlignedRead as follows (description of columns is from Table 19, Genome Analyzer Pipeline Software User Guide, Revision A, January 2008):
Not parsed
stored in sread
stored in alignData as matchCode. Codes are (from the Eland manual): NM (no match); QC (no match due to quality control failure); RM (no match due to repeat masking); U0 (best match was unique and exact); U1 (best match was unique, with 1 mismatch); U2 (best match was unique, with 2 mismatches); R0 (multiple exact matches found); R1 (multiple 1 mismatch matches found, no exact matches); R2 (multiple 2 mismatch matches found, no exact or 1-mismatch matches).
stored in alignData as nExactMatch
stored in alignData as nOneMismatch
stored in alignData as nTwoMismatch
stored in chromosome
stored in position
(direction of match) stored in strand
stored in alignData, as NCharacterTreatment. ‘.’ indicates treatment of ‘N’ was not applicable; ‘D’ indicates treatment as deletion; ‘|’ indicates treatment as insertion
stored in alignData as mismatchDetailOne and mismatchDetailTwo. Present only for unique inexact matches at one or two positions. Position and type of first substitution error, e.g., 11A represents 11 matches with 12th base an A in reference but not read. The reference manual cited below lists only one field (mismatchDetailOne), but two are present in files seen in the wild.
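The match codes and the mismatch detail notation above can be decoded after import with a few lines of base R. The sketch below is an illustration only: match_codes, describe_match and parse_mismatch are hypothetical names, not ShortRead functions, and the parser assumes the simple "count followed by one base" form used in the 11A example.

```r
# Eland match codes, transcribed from the list above
match_codes <- c(
    NM = "no match",
    QC = "no match due to quality control failure",
    RM = "no match due to repeat masking",
    U0 = "best match was unique and exact",
    U1 = "best match was unique, with 1 mismatch",
    U2 = "best match was unique, with 2 mismatches",
    R0 = "multiple exact matches found",
    R1 = "multiple 1 mismatch matches found, no exact matches",
    R2 = "multiple 2 mismatch matches found, no exact or 1-mismatch matches"
)

# Hypothetical helper: translate codes into their descriptions
describe_match <- function(code) unname(match_codes[code])

# Hypothetical helper: decode a mismatch detail such as "11A", i.e.
# 11 matches, then the 12th base is an A in the reference but not the read.
# Assumes the form <digits><base>; returns NULL for anything else.
parse_mismatch <- function(detail) {
    m <- regmatches(detail, regexec("^([0-9]+)([ACGTN])$", detail))[[1]]
    if (length(m) == 0L)
        return(NULL)
    list(position = as.integer(m[2]) + 1L, referenceBase = m[3])
}

describe_match(c("U0", "R1"))
mm <- parse_mismatch("11A")
mm$position       # 12
mm$referenceBase  # "A"
```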
type="MAQMap", records=-1L
Parse binary map files produced by MAQ. See details in the next section. The records option determines how many records are read; -1L (the default) means that all records are input. For type="MAQMap", dirPath and pattern must match a single file.
type="MAQMapShort", records=-1L
The same as type="MAQMap" but for map files made with Maq prior to version 0.7.0. (These files use a different maximum read length [64 instead of 128], and are hence incompatible with newer Maq map files.) For type="MAQMapShort", dirPath and pattern must match a single file.
type="MAQMapview"
Parse alignment files created by MAQ's ‘mapview’ command. Interpretation of columns is based on the description in the MAQ manual, specifically
...each line consists of read name, chromosome, position, strand, insert size from the outer coordinates of a pair, paired flag, mapping quality, single-end mapping quality, alternative mapping quality, number of mismatches of the best hit, sum of qualities of mismatched bases of the best hit, number of 0-mismatch hits of the first 24bp, number of 1-mismatch hits of the first 24bp on the reference, length of the read, read sequence and its quality.
The read name, read sequence, and quality are read as XStringSet objects. Chromosome and strand are read as factors. Position and mapping quality are numeric. These fields are mapped to their corresponding representation in AlignedRead objects.
Number of mismatches of the best hit, sum of qualities of mismatched bases of the best hit, number of 0-mismatch hits of the first 24bp, and number of 1-mismatch hits of the first 24bp are represented in the AlignedRead object as components of alignData.
Remaining fields are currently ignored.
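To make the field order quoted from the MAQ manual concrete, the following base R sketch reads one made-up mapview-style record into a data frame. It does not use readAligned. The column names are hypothetical labels chosen for this illustration, the record is invented, and tab-delimited input is an assumption.

```r
# Hypothetical column labels following the field order quoted above
mapview_cols <- c("readName", "chromosome", "position", "strand",
                  "insertSize", "pairedFlag", "mappingQuality",
                  "singleEndMappingQuality", "alternativeMappingQuality",
                  "nMismatchBestHit", "mismatchQualitySum",
                  "nExactMatch24", "nOneMismatch24",
                  "readLength", "sequence", "quality")

# One invented record; tab-delimited input is assumed
fields <- c("read_1", "chr1.fa", "1203", "+", "0", "0", "60", "60", "0",
            "1", "30", "0", "1", "10", "ACGTACGTAC", "IIIIIIIIII")
line <- paste(fields, collapse = "\t")

aln <- read.table(
    text = line, sep = "\t", col.names = mapview_cols,
    colClasses = c("character", "character", "integer", "character",
                   rep("integer", 10), "character", "character"),
    quote = "", comment.char = ""  # quality strings may contain '#' or quotes
)

aln[, c("chromosome", "position", "strand", "mappingQuality")]
```

Disabling quote and comment handling matters here because quality strings are arbitrary printable characters.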
type="Bowtie"
Parse alignment files created with the Bowtie alignment algorithm. Parsed columns can be retrieved from AlignedRead as follows:
id
strand
chromosome
position; see comment below
sread; see comment below
quality; see comments below
alignData, ‘similar’ column; Bowtie v. 0.9.9.3 (12 May, 2009) documents this as the number of other instances where the same read aligns against the same reference characters as were aligned against in this alignment. Previous versions marked this as ‘Reserved’
alignData, ‘mismatch’ column
NOTE: the default quality encoding changes to FastqQuality with ShortRead version 1.3.24.
This method includes the argument qualityType to specify how quality scores are encoded. Bowtie quality scores are ‘Phred’-like by default, with qualityType='FastqQuality', but can be specified as ‘Solexa’-like, with qualityType='SFastqQuality'.
Bowtie outputs positions that are 0-offset from the left-most end of the + strand. ShortRead parses position information to be 1-offset from the left-most end of the + strand.
Bowtie outputs reads aligned to the - strand as their reverse complement, and reverses the quality score string of these reads. ShortRead parses these to their original sequence and orientation.
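The two Bowtie adjustments described above (0-offset to 1-offset positions, and restoring the original orientation of reads on the - strand) can be sketched in base R. This is an illustration of the transformations only, not the code ShortRead uses; revcomp, rev_string and the example records are hypothetical.

```r
# Hypothetical helper: reverse complement of DNA strings
revcomp <- function(seq) {
    comp <- chartr("ACGTacgt", "TGCAtgca", seq)
    vapply(strsplit(comp, ""),
           function(ch) paste(rev(ch), collapse = ""), character(1))
}

# Hypothetical helper: reverse each string (used for quality strings)
rev_string <- function(x) {
    vapply(strsplit(x, ""),
           function(ch) paste(rev(ch), collapse = ""), character(1))
}

# Invented Bowtie-style records: 0-offset positions, '-' strand reads
# reported as reverse complement with reversed quality strings
bowtie <- data.frame(
    strand   = c("+", "-"),
    position = c(0L, 99L),
    seq      = c("ACGTT", "AACGT"),
    qual     = c("IIIH#", "#HIII"),
    stringsAsFactors = FALSE
)

minus <- bowtie$strand == "-"
bowtie$position <- bowtie$position + 1L               # 0-offset -> 1-offset
bowtie$seq[minus]  <- revcomp(bowtie$seq[minus])      # original read sequence
bowtie$qual[minus] <- rev_string(bowtie$qual[minus])  # original quality order

bowtie
```

Positions stay anchored to the left-most end of the + strand for both strands, so the same +1 shift applies to every record.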
type="SOAP"
Parse alignment files created with the SOAP alignment algorithm. Parsed columns can be retrieved from AlignedRead as follows:
id
sread; see comment below
quality; see comment below
alignData
alignData (pairedEnd)
alignData (alignedLength)
strand
chromosome
position; see comment below
alignData (typeOfHit: integer portion; hitDetail: text portion)
This method includes the argument qualityType to specify how quality scores are encoded. It is unclear from SOAP documentation what the quality score is; the default is ‘Solexa’-like, with qualityType='SFastqQuality', but can be specified as ‘Phred’-like, with qualityType='FastqQuality'.
SOAP outputs positions that are 1-offset from the left-most end of the + strand. ShortRead preserves this representation.
SOAP reads aligned to the - strand are reported by SOAP as their reverse complement, with the quality string of these reads reversed. ShortRead parses these to their original sequence and orientation.
A single R object (e.g., AlignedRead) containing alignments, sequences and qualities of all files in dirPath matching pattern. There is no guarantee of the order in which files are read.
Martin Morgan <mtmorgan@fhcrc.org>, Simon Anders <anders@ebi.ac.uk> (MAQ map)
The AlignedRead class.
Genome Analyzer Pipeline Software User Guide, Revision A, January 2008.
The MAQ reference manual, http://maq.sourceforge.net/maq-manpage.shtml#5, 3 May, 2008.
The Bowtie reference manual, http://bowtie-bio.sourceforge.net, 28 October, 2008.
The SOAP reference manual, http://soap.genomics.org.cn/soap1, 16 December, 2008.
library(ShortRead)
sp <- SolexaPath(system.file("extdata", package="ShortRead"))
ap <- analysisPath(sp)
## ELAND_EXTENDED
(aln0 <- readAligned(ap, "s_2_export.txt", "SolexaExport"))
## PhageAlign
(aln1 <- readAligned(ap, "s_5_.*_realign.txt", "SolexaRealign"))
## MAQ
dirPath <- system.file('extdata', 'maq', package='ShortRead')
list.files(dirPath)
## First line
readLines(list.files(dirPath, full.names=TRUE)[[1]], 1)
countLines(dirPath)
## two files collapse into one
(aln2 <- readAligned(dirPath, type="MAQMapview"))
## select only chr1-5.fa, '+' strand
filt <- compose(chromosomeFilter("chr[1-5].fa"),
                strandFilter("+"))
(aln3 <- readAligned(sp, "s_2_export.txt", filter=filt))