featuretypeCounts: Plot read distribution across genomic features
In tgirke/systemPipeR: systemPipeR: Workflow Environment for Data Analysis and Report Generation

featuretypeCounts

R Documentation

Plot read distribution across genomic features

Description

Counts how many reads in short read alignment files (BAM format) overlap with entire annotation categories. This utility is useful for analyzing the distribution of the read mappings across feature types, e.g. coding versus non-coding genes. By default the read counts are reported for the sense and antisense strand of each feature type separately. To minimize memory consumption, the BAM files are processed in a stream using utilities from the Rsamtools and GenomicAlignment packages. The counts can be reported for each read length separately or as a single value for reads of any length. Subsequently, the counting results can be plotted with the associated plotfeaturetypeCounts function.

Usage

featuretypeCounts(bfl, grl, singleEnd = TRUE, readlength = NULL, type = "data.frame")

Arguments

`bfl`	`BamFileList` object containing the paths to the target BAM files stored on disk. Note, memory-efficient processing is achieved by streaming through BAM files rather than reading entire files into memory at once. The maximum number of alignments to process in each iteration is determined by the `yieldSize` value passed on to the `BamFileList` function. For details see `?BamFileList`.
`grl`	`GRangesList` object containing in each list component the range set of a feature type. Typically, this object is generated with the `genFeatures` function. For details see the example section below and `?genFeatures`.
`singleEnd`	Specifies whether the targets BAM files contain alignments for single-end (SE) or paired-end read data. `TRUE` is for SE and `FALSE` for PE data.
`readlength`	Integer vector specifying the read length values for which to report counts separately. If `readlength=NULL` the length of the reads will be ignored resulting in a single value for each feature type and strand. Note, for PE data the two reads in a pair may differ in length. In those cases the length of the two reads is averaged and then assigned to the corresponding length category after rounding the mean length to the closest integer. This is not an ideal solution but a reasonable compromise for the purpose of the summary statistics generated by `featuretypeCounts`.
`type`	Determines whether the results are returned as `data.frame` (`type="data.frame"`) or as `list` (`type="list"`). Each list component contains the counting results for one BAM file and is named after the corresponding sample. The `data.frame` result contains this sample assignment information in a separate column.

Value

The results are returned as data.frame or list of data.frames. For details see above under types argument. The result data.frames contain the following columns in the given order:

`SampleName`	Sample names obtained from `BamFileList` object.
`Strand`	Sense or antisense strand of read mappings.
`Featuretype`	Name of feature type provided by `GRangesList` object. Note, the total number of aligned reads is reported under the special feature type 'N_total_aligned'. This value is useful for scaling/normalization purposes in plots, e.g. counts per million reads.
`Featuretypelength`	Total genomic length of each reduced feature type in bases. This value is useful to normalize the read counts by genomic length units, e.g. in plots.
`Subsequent columns`	Counts for reads of any length or for individual read lengths.

Author(s)

Thomas Girke

Examples

## Construct SYSargs2 object from param and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
args <- renderWF(args, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
args

## Not run: 
## Run alignments
args <- runCommandline(args, dir = FALSE, make_bam = TRUE)
outpaths <- subsetWF(args, slot = "output", subset = 1, index = 1)

## Features from sample data of systemPipeRdata package
library(txdbmaker)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000, downstream=0, verbose=TRUE)

## Generate and plot feature counts for specific read lengths
fc <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=c(74:76,99:102), type="data.frame")
p <- plotfeaturetypeCounts(x=fc, graphicsfile="featureCounts.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=FALSE)

## Generate and plot feature counts for any read length  
fc2 <- featuretypeCounts(bfl=BamFileList(outpaths, yieldSize=50000), grl=feat, singleEnd=TRUE, readlength=NULL, type="data.frame")
p2 <- plotfeaturetypeCounts(x=fc2, graphicsfile="featureCounts2.pdf", graphicsformat="pdf", scales="fixed", anyreadlength=TRUE)

## End(Not run)

tgirke/systemPipeR documentation built on Sept. 24, 2024, 9:48 a.m.

tgirke/systemPipeR index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

tgirke/systemPipeR
systemPipeR: Workflow Environment for Data Analysis and Report Generation

featuretypeCounts: Plot read distribution across genomic features
In tgirke/systemPipeR: systemPipeR: Workflow Environment for Data Analysis and Report Generation

Plot read distribution across genomic features

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Related to featuretypeCounts in tgirke/systemPipeR...

R Package Documentation

Browse R Packages

We want your feedback!

tgirke/systemPipeR systemPipeR: Workflow Environment for Data Analysis and Report Generation

featuretypeCounts: Plot read distribution across genomic features In tgirke/systemPipeR: systemPipeR: Workflow Environment for Data Analysis and Report Generation

Plot read distribution across genomic features

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Related to featuretypeCounts in tgirke/systemPipeR...

R Package Documentation

Browse R Packages

We want your feedback!

tgirke/systemPipeR
systemPipeR: Workflow Environment for Data Analysis and Report Generation

featuretypeCounts: Plot read distribution across genomic features
In tgirke/systemPipeR: systemPipeR: Workflow Environment for Data Analysis and Report Generation