Let us illustrate how to perform transcriptomic data analysis using data from TCGA project. We have uploaded to the opal server a resource called tcga_liver
whose URL is http://duffel.rail.bio/recount/TCGA/rse_gene_liver.Rdata which is available through the recount project. This resource contains the RangeSummarizedExperiment
with the RNAseq profiling of liver cancer data from TCGA. Next, we illustrate how a differential expression analysis to compare RNAseq profiling of women vs men (variable gdc_cases.demographic.gender
). The DGE analysis is normally performed using r Biocpkg("limma")
package. In that case, as we are analyzing RNA-seq data, limma + voom
method will be required.
Let us start by creating the connection to the opal server:
builder <- newDSLoginBuilder() builder$append(server = "study1", url = "https://opal-demo.obiba.org", user = "dsuser", password = "P@ssw0rd", resource = "RSRC.tcga_liver", profile = "omics") logindata <- builder$build() conns <- datashield.login(logins = logindata, assign = TRUE, symbol = "res")
Then, let us coerce the resource to a RangedSummarizedExperiment
which is the type of object that is available in the recount project.
datashield.assign.expr(conns, symbol = "rse", expr = quote(as.resource.object(res))) ds.class("rse")
The number of features and samples can be inspected by
ds.dim("rse")
And the names of the features using the same function used in the case of analyzing an ExpressionSet
name.features <- ds.featureNames("rse") lapply(name.features, head)
Also the covariate names can be inspected by
name.vars <- ds.featureData("rse") lapply(name.vars, head, n=15)
We can visualize the levels of the variable having gender information that will be our condition (i.e., we are interested in obtaining genes that are differentially expressed between males and females)
ds.table1D("rse$gdc_cases.demographic.gender")
We have implemented a function called ds.RNAseqPreproc()
to perform RNAseq data pre-processing that includes:
ds.RNAseqPreproc('rse', group= 'gdc_cases.demographic.gender', newobj.name = 'rse.pre')
Note that it is recommended to indicate the grouping variable (i.e., condition). Once data have been pre-processed, we can perform differential expression analysis. Notice how dimensions have changed given the fact that we have removed genes with low expression which are expected to do not be differentially expressed.
ds.dim('rse') ds.dim('rse.pre')
The differential expression analysis is ´dsOmicsClient/dsOmics´ is implemented in the funcion ds.limma()
. This functions runs a limma-pipeline for microarray data and for RNAseq data allows:
We recommend to use the voom + limma
pipeline proposed here given its versatility and that limma
is much faster than DESeq2
and edgeR
. By default, the function consider that data are obtained from a microarray experiment (type.data = "microarray"
). Therefore, as we are analyzing RNAseq data, we much indicate that type.data = "RNAse"
ans.gender <- ds.limma(model = ~ gdc_cases.demographic.gender, Set = "rse.pre", type.data = "RNAseq")
The top differentially expressed genes can be visualized by:
ans.gender
We can verify whether the distribution of the observed p-values are the ones we expect in this type of analyses
hist(ans.gender$study1$P.Value, xlab="Raw p-value gender effect", main="", las=1, cex.lab=1.5, cex.axis=1.2, col="gray")
We can also check whether there is inflation just executing
qqplot(ans.gender$study1$P.Value)
So, in that case, the model needs to remove unwanted variability ($\lambda>2$). If so, we can use surrogate variable analysis just changing the argument sva=TRUE
ans.gender.sva <- ds.limma(model = ~ gdc_cases.demographic.gender, Set = "rse.pre", type.data = "RNAseq", sva = TRUE)
Now the inflation has dramatically been reduced ($\lambda>1.12$)
qqplot(ans.gender.sva$study1$P.Value)
We can add annotation to the output that is available in our RSE object. We can have access to this information by
ds.fvarLabels('rse.pre')
So, we can run
ans.gender.sva <- ds.limma(model = ~ gdc_cases.demographic.gender, Set = "rse.pre", type.data = "RNAseq", sva = TRUE, annotCols = c("chromosome"))
The results are:
ans.gender.sva
The function has another arguments that can be used to fit other type of models:
lmFit
)eBayes
)voom
transformation (default "none")voomQualityWeights
function be used instead of voom
? (default FALSE)We have also implemented two other functions ds.DESeq2
and ds.edgeR
that perform DGE analysis using DESeq2 and edgeR methods. This is the R code used to that purpose:
To be supplied
We close the DataSHIELD session by:
datashield.logout(conns)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.