This package provides an interface to estimate transcript abundances of any samples quantified by the aligner Rail-RNA. This method is a non-negative least squares (NNLS) estimation that infers the number of reads that originated from each transcript of the coding portion of the GencodeV25 transcriptome. The model does not require raw aligned BAM
files, but is content with compressed coverage statistics (coverage of the genome at a basepair level and across annotated junctions) primarily stored in bigwig
formats.
The more than 70,000 samples compiled by recount2 already have transcript expression pre-computed by this method and is directly accessible. To replicate the abundance estimation of any SRA project in recount2, the user only needs to supply the SRA project id. Otherwise, users can supply the necessary information as outlined further below to utilize this package to carry out transcript abundance estimation.
For the projects currently curated in recount2, to access quantified transcript abundances use:
project = 'DRP000366' path = getRseTx(project, download_path=paste0(getwd(), '/rse_tx.RData')) load(path)
To re-run the NNLS model on the samples in recount2 follow:
library(recountNNLS) ## Specify a SRA project and download the relevant path data from recount2 project = 'SRP063581' pheno = processPheno(project) ## Main NNLS workhorse function to create a RSE of transcript abundance rse_tx = recountNNLS(pheno)
If not quantifying the transcript expression of a project already compiled by recount2, but samples have been aligned with Rail-RNA to the GRCh38 assembly, then one can quantify transcript expression with these 2 pieces of information:
cross_sample_results/junctions.tsv.gz
.base='/dcl01/leek/data/ta_poc/geuvadis/simulation/37_1/rail/rail-rna_out/' manifest = data.frame(project = "test", run = c("sample_01", "sample_02", "sample_03", "sample_04"), bigwig_path = c('coverage_bigwigs/sample_01.bw', 'coverage_bigwigs/sample_02.bw', 'coverage_bigwigs/sample_03.bw', 'coverage_bigwigs/sample_04.bw'), rls = c(37, 37, 37, 37), paired_end=FALSE) junction_path = 'cross_sample_results/junctions.tsv.gz'
manifest junction_path
To obtain the necessary inputs for the linear model, one would apply:
library(recountNNLS) ## Analyze the manifest input to format for downstream use pheno = processPheno(manifest) ## Main NNLS workhorse function to create a RSE of transcript abundance rse_tx = recountNNLS(pheno, jx_file=junction_path, cores=1)
The output of recountNNLS()
is a RangedSummarizedExperiment
where:
rowRanges()
can be used to access a GRangesList
annotation of the transcriptsassays()
returns a list of length 2 that can be access with $
$fragments
returns a matrix
of estimated transcript abundance on a number of reads scale$se
returns a matrix
of estimated standard error of the abundance$score
returns a matrix
of estimated scores$df
returns a matrix
of degrees of freedom for statistical inferencecolData()
returns a DataFrame
that contains the metadata for this project obtained from SRA and processed by processPheno()
.rowData()
returns a DataFrame
containing information on transcript names, gene, names, transcript lengths, and number of exons in transcript.colnames()
returns the sample ids (runs) in the project, and corresponds to each column of the assays()
matrices.rownames()
returns the transcript ids, and corresponds to each row of the assays()
matrices.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.