processReads: Process reads from High-Throughput Sequencing experiments
In gthar/nucleR: Nucleosome positioning package for R


Description

This method processes NGS nucleosome reads from different sources and performs basic manipulations on them. The tasks include correcting the strand-specific mapping of single-end reads and trimming reads to a given length.

Usage

processReads(data, type = "single", fragmentLen, trim, ...)

## S4 method for signature 'AlignedRead'
processReads(data, type = "single", fragmentLen, trim,
  ...)

## S4 method for signature 'CompressedGRangesList'
processReads(data, type = "single",
  fragmentLen, trim, ...)

## S4 method for signature 'GRanges'
processReads(data, type = "single", fragmentLen, trim,
  ...)

## S4 method for signature 'RangedData'
processReads(data, type = "single", fragmentLen, trim,
  ...)

Arguments

`data`	Sequence read objects, typically imported with other packages such as `ShortRead`. Allowed object types are ShortRead::AlignedRead and GenomicRanges::GRanges with a `strand` attribute; methods are also defined for CompressedGRangesList and RangedData (see Usage).
`type`	Type of reads. Allowed values are `"single"` for single-end reads and `"paired"` for paired-end reads.
`fragmentLen`	Expected original length of the sequenced fragments. See details.
`trim`	Length, in nucleotides, to which reads are trimmed (or extended, if `trim` is greater than the read length).
`...`	Other parameters passed to `fragmentLenDetect` if no fixed `fragmentLen` is given.

Details

This function takes a ShortRead::AlignedRead or GenomicRanges::GRanges object containing the position, length and strand of the sequence reads.

It handles both paired-end and single-end reads. For single-end reads, this function corrects the strand-specific mapping by shifting plus-strand and minus-strand reads towards a middle position where both strands overlap. The shift takes the expected fragment length (fragmentLen) into account.
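To make the idea of the strand correction concrete, here is a minimal base-R sketch of where a single-end read "points to" once the expected fragment length is known. It is not nucleR's implementation; the helper name `estimate_dyad` is hypothetical, and it assumes 1-based inclusive coordinates with strand given as `"+"` or `"-"`.

```r
# Hypothetical helper (not part of nucleR): estimate the dyad position of the
# fragment each single-end read came from. Assumes 1-based inclusive
# start/end coordinates and strand given as "+" or "-".
estimate_dyad <- function(start, end, strand, fragmentLen) {
  stopifnot(length(start) == length(end), length(end) == length(strand))
  half <- (fragmentLen - 1) / 2
  # A "+" read starts at the fragment's left end, so the dyad lies downstream;
  # a "-" read ends at the fragment's right end, so the dyad lies upstream.
  ifelse(strand == "+", start + half, end - half)
}

# Two reads sequenced from opposite ends of one 150 bp fragment (1001 to 1150)
estimate_dyad(start = c(1001, 1115), end = c(1036, 1150),
              strand = c("+", "-"), fragmentLen = 150)
```

Both reads map to 1075.5, the centre of the fragment: before the shift the two strands sit about 80 bp apart, after it they overlap, which is the effect the correction aims for.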

For paired-end reads, mononucleosomal reads may be longer than expected because of mapping issues or experimental conditions. In this case, fragmentLen sets the threshold above which reads are ignored.
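The paired-end threshold amounts to a length filter. The sketch below shows it on a plain `data.frame`; nucleR itself works on GRanges objects, and the helper name `filter_fragments` and the column layout are assumptions for illustration.

```r
# Hypothetical helper (not part of nucleR): drop paired-end fragments longer
# than fragmentLen. Assumes one fragment per row with 1-based inclusive
# start/end columns.
filter_fragments <- function(fragments, fragmentLen) {
  widths <- fragments$end - fragments$start + 1
  # Keep fragments no longer than the threshold; longer ones are ignored.
  fragments[widths <= fragmentLen, , drop = FALSE]
}

# Fragments of 147, 261 and 181 bp; only the 261 bp one exceeds 200 bp
frags <- data.frame(start = c(100, 500, 900), end = c(246, 760, 1080))
filter_fragments(frags, fragmentLen = 200)
```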

If no value is supplied for fragmentLen, it is estimated automatically with fragmentLenDetect using default parameters, which increases computing time. Performance can be improved by tuning the fragmentLenDetect parameters in a separate call and passing its result as the fragmentLen argument.
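The extra computing time comes from having to search over candidate lengths. As an illustration only, the sketch below uses strand cross-correlation, one common way to infer fragment length from single-end reads; it is an assumed approach, not the implementation of fragmentLenDetect, and the helper name `estimate_fragment_len` and its `max_shift` parameter are hypothetical.

```r
# Hypothetical helper illustrating strand cross-correlation; this is an
# assumed approach, not the implementation of nucleR::fragmentLenDetect.
# plus_starts: 1-based start positions of "+" reads (their 5' ends)
# minus_ends:  1-based end positions of "-" reads (their 5' ends)
estimate_fragment_len <- function(plus_starts, minus_ends, chrom_len,
                                  max_shift = 300) {
  stopifnot(max_shift < chrom_len)
  # Count 5' ends per base on each strand
  plus_cov  <- tabulate(plus_starts, nbins = chrom_len)
  minus_cov <- tabulate(minus_ends,  nbins = chrom_len)
  shifts <- 0:max_shift
  # Unnormalised cross-product between "+" 5' ends and "-" 5' ends shifted
  # by s; every candidate shift needs a full pass over the chromosome.
  scores <- vapply(shifts, function(s) {
    idx <- seq_len(chrom_len - s)
    sum(as.numeric(plus_cov[idx]) * minus_cov[idx + s])
  }, numeric(1))
  # The two 5' ends of one fragment are (fragment length - 1) bases apart
  shifts[which.max(scores)] + 1
}

# Synthetic example: four 147 bp fragments, each read from both ends
frag_starts <- c(100, 400, 700, 1000)
estimate_fragment_len(plus_starts = frag_starts,
                      minus_ends  = frag_starts + 146,
                      chrom_len   = 2000)
```

Because the scan is repeated for every shift, estimating the length once and reusing the number (as the paragraph above suggests for fragmentLenDetect) avoids paying that cost on every call.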

In some cases it can be useful to trim the reads to a shorter length, which sharpens the signal around nucleosome dyads and eases their automatic detection and positioning. The trim parameter sets how many nucleotides are kept from each read.

A special case for single-end data is setting trim to the same value as fragmentLen: the reads are then extended strand-wise towards the 3' direction, creating an artificial map comparable with paired-end data. The opposite can be done with paired-end data by setting trim equal to the read length, so that paired-end data look like single-end data.
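The interplay between trim and fragmentLen for single-end reads can be sketched as "place the read on its inferred fragment, then keep `trim` bases centred on it". This is an assumed model consistent with the description above, not nucleR's code; the helper name `trim_single_end` is hypothetical and coordinates are 1-based and inclusive.

```r
# Hypothetical helper (not nucleR's code): place each single-end read on its
# inferred fragment of length fragmentLen, then keep `trim` bases centred on
# that fragment. Coordinates are 1-based and inclusive.
trim_single_end <- function(start, end, strand, fragmentLen, trim) {
  # Bases skipped from the fragment's 5' side (floored if the difference is odd)
  offset <- (fragmentLen - trim) %/% 2
  new_start <- ifelse(strand == "+",
                      start + offset,            # shift "+" reads downstream
                      end - offset - trim + 1)   # shift "-" reads upstream
  data.frame(start = new_start, end = new_start + trim - 1, strand = strand)
}

# Keep 40 bases around the dyad of a 150 bp fragment (1001 to 1150)
trim_single_end(c(1001, 1115), c(1036, 1150), c("+", "-"),
                fragmentLen = 150, trim = 40)

# Special case: trim equal to fragmentLen extends the read towards its 3' end
trim_single_end(1001, 1036, "+", fragmentLen = 150, trim = 150)
```

In the first call both reads end up on the window 1056 to 1095, centred on the fragment midpoint; in the second the "+" read is extended to cover 1001 to 1150, the whole inferred fragment.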

Value

A GenomicRanges::GRanges object containing the processed (aligned and, if requested, trimmed) individual reads.

Note

IMPORTANT: this information is only used to correct possible strand-specific mapping; this package does not link the two ends of paired reads.

Author(s)

Oscar Flores oflores@mmb.pcb.ub.es

See Also

ShortRead::AlignedRead, GenomicRanges::GRanges, fragmentLenDetect()

Examples

# Load nucleR and its example data
library(nucleR)
data(nucleosome_htseq)

# Process paired-end nucleosome reads, keeping only those shorter than 200 bp
pr1 <- processReads(nucleosome_htseq, type="paired", fragmentLen=200)

# Process them again, keeping only the 40 bases surrounding each dyad
pr2 <- processReads(nucleosome_htseq, type="paired", fragmentLen=200, trim=40)

# Compare the results:
library(ggplot2)
cov1 <- as.vector(coverage.rpm(pr1)[["chr1"]])
cov2 <- as.vector(coverage.rpm(pr2)[["chr1"]])
plot_data <- rbind(
    data.frame(x=seq_along(cov1), y=cov1, coverage="original"),
    data.frame(x=seq_along(cov2), y=cov2, coverage="trimmed")
)
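# Plot both coverage profiles in stacked panels (qplot() is deprecated in
# recent ggplot2 releases, so ggplot() is used instead)
ggplot(plot_data, aes(x=x, y=y)) +
    geom_line() +
    labs(x="position", y="coverage") +
    facet_grid(coverage~.)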