In JMF47/recountNNLS: Transcript abundance linear modeling from compressed coverage information

Overview

This package provides an interface to estimate transcript abundances of any samples quantified by the aligner Rail-RNA. This method is a non-negative least squares (NNLS) estimation that infers the number of reads that originated from each transcript of the coding portion of the GencodeV25 transcriptome. The model does not require raw aligned BAM files, but is content with compressed coverage statistics (coverage of the genome at a basepair level and across annotated junctions) primarily stored in bigwig formats.

The more than 70,000 samples compiled by recount2 already have transcript expression pre-computed by this method and is directly accessible. To replicate the abundance estimation of any SRA project in recount2, the user only needs to supply the SRA project id. Otherwise, users can supply the necessary information as outlined further below to utilize this package to carry out transcript abundance estimation.

Accessing quantified estimates for samples in recount2

For the projects currently curated in recount2, to access quantified transcript abundances use:

project = 'DRP000366'
path = getRseTx(project, download_path=paste0(getwd(), '/rse_tx.RData'))
load(path)

Quantifying samples on recount2

To re-run the NNLS model on the samples in recount2 follow:

library(recountNNLS)

## Specify a SRA project and download the relevant path data from recount2
project = 'SRP063581'
pheno = processPheno(project)

## Main NNLS workhorse function to create a RSE of transcript abundance
rse_tx = recountNNLS(pheno)

Data not yet part of recount2

If not quantifying the transcript expression of a project already compiled by recount2, but samples have been aligned with Rail-RNA to the GRCh38 assembly, then one can quantify transcript expression with these 2 pieces of information:

A manifest of 5 columns is required. The columns are: project name (project); sample id (run); path to bigwig file resulting from Rail-RNA alignment (bigwit_path); the read length of the sample (rls); and whether the sample is paired end (paired_end).
A path to the junction table created by Rail-RNA. It is usually located in the folder cross_sample_results/junctions.tsv.gz.

base='/dcl01/leek/data/ta_poc/geuvadis/simulation/37_1/rail/rail-rna_out/'
manifest = data.frame(project = "test", run = c("sample_01", "sample_02", "sample_03", "sample_04"), 
                      bigwig_path = c('coverage_bigwigs/sample_01.bw', 'coverage_bigwigs/sample_02.bw', 
                               'coverage_bigwigs/sample_03.bw', 'coverage_bigwigs/sample_04.bw'),
                      rls = c(37, 37, 37, 37), paired_end=FALSE)
junction_path =  'cross_sample_results/junctions.tsv.gz'

manifest
junction_path

To obtain the necessary inputs for the linear model, one would apply:

library(recountNNLS)
## Analyze the manifest input to format for downstream use
pheno = processPheno(manifest)

## Main NNLS workhorse function to create a RSE of transcript abundance
rse_tx = recountNNLS(pheno, jx_file=junction_path, cores=1)

Deliverable

The output of recountNNLS() is a RangedSummarizedExperiment where:

Each row corresponds to a transcript
Each column corresponds to a run (sample)
rowRanges() can be used to access a GRangesList annotation of the transcripts
assays() returns a list of length 2 that can be access with $
- $fragments returns a matrix of estimated transcript abundance on a number of reads scale
- $se returns a matrix of estimated standard error of the abundance
- $score returns a matrix of estimated scores
- $df returns a matrix of degrees of freedom for statistical inference
colData() returns a DataFrame that contains the metadata for this project obtained from SRA and processed by processPheno().
rowData() returns a DataFrame containing information on transcript names, gene, names, transcript lengths, and number of exons in transcript.
colnames() returns the sample ids (runs) in the project, and corresponds to each column of the assays() matrices.
rownames() returns the transcript ids, and corresponds to each row of the assays() matrices.

JMF47/recountNNLS documentation built on May 28, 2019, 12:42 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

JMF47/recountNNLS
Transcript abundance linear modeling from compressed coverage information

In JMF47/recountNNLS: Transcript abundance linear modeling from compressed coverage information

Overview

Accessing quantified estimates for samples in recount2

Quantifying samples on recount2

Data not yet part of recount2

Deliverable

R Package Documentation

Browse R Packages

We want your feedback!

JMF47/recountNNLS Transcript abundance linear modeling from compressed coverage information

In JMF47/recountNNLS: Transcript abundance linear modeling from compressed coverage information

Overview

Accessing quantified estimates for samples in recount2

Quantifying samples on recount2

Data not yet part of recount2

Deliverable

R Package Documentation

Browse R Packages

We want your feedback!

JMF47/recountNNLS
Transcript abundance linear modeling from compressed coverage information