The goal of yarn is to expedite large RNA-seq analyses using a combination of previously developed tools. Yarn is meant to make it easier for the user to perform accurate comparison of conditions by leveraging many Bioconductor tools and various statistical and normalization techniques while accounting for the large heterogeneity and sparsity found in very large RNA-seq experiments.
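You can install yarn from Bioconductor with:
# Install BiocManager if it is missing, then install yarn
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("yarn")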
If you're here to grab the GTEx version 6.0 data then look no further than this gist that uses yarn to download all the data and preprocess it for you.
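Below are a few of the functions we can use to preprocess a large RNA-seq experiment. We follow a particular procedure where we check for sample mis-annotation, decide which tissues or sub-groups to merge, filter genes, and normalize in a tissue-aware manner.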
We will make use of the skin dataset for examples. The skin dataset is a small sample of the full GTEx data that can be downloaded using the downloadGTEx function. The skin dataset looks like this:
library(yarn)
data(skin)
skin
This is a basic workflow. Details will be fleshed out:
library(yarn)
For computational reasons we load the sample skin data instead of having the user download the full GTEx data with downloadGTEx.
library(yarn)
data(skin)
checkMisAnnotation(skin,"GENDER",controlGenes="Y",legendPosition="topleft")
checkTissuesToMerge(skin,"SMTS","SMTSD")
skin_filtered = filterLowGenes(skin,"SMTSD")
dim(skin)
dim(skin_filtered)
Or filter group-specific genes:
# Remove the genes on the X, Y, and MT chromosomes
tmp = filterGenes(skin,labels=c("X","Y","MT"),featureName = "chromosome_name")
# Keep only the genes on the X, Y, and MT chromosomes
tmp = filterGenes(skin,labels=c("X","Y","MT"),featureName = "chromosome_name",keepOnly=TRUE)
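The two filtering styles above can be illustrated in base R on a toy count matrix. This is only a sketch of the general idea, not yarn's implementation: the data, the counts-per-million threshold of 1, and the "at least half of the smallest group" rule are all assumptions made for illustration.

```r
# Toy data: 6 genes x 8 samples, two groups of 4 (all values are made up).
set.seed(1)
counts <- matrix(rpois(48, lambda = 50), nrow = 6,
                 dimnames = list(paste0("gene", 1:6), paste0("s", 1:8)))
counts["gene1", ] <- 0                               # never expressed
counts["gene2", ] <- c(0, 0, 0, 0, 60, 55, 70, 65)   # expressed in group B only
groups <- rep(c("A", "B"), each = 4)
chromosome <- c("1", "2", "X", "Y", "MT", "3")       # hypothetical annotation

# Counts per million, so samples with different depths are comparable.
cpm <- t(t(counts) / colSums(counts)) * 1e6

# Assumed rule: keep a gene if it exceeds the threshold in at least half of
# the smallest group, so that group-specific genes survive the filter.
min_samples <- floor(min(table(groups)) / 2)
keep_expr <- rowSums(cpm > 1) >= min_samples
filtered <- counts[keep_expr, , drop = FALSE]

# Annotation-based filtering: drop, or keep only, genes with given labels.
labels <- c("X", "Y", "MT")
on_labels <- chromosome %in% labels
dropped <- counts[!on_labels, , drop = FALSE]     # analogous to keepOnly = FALSE
kept_only <- counts[on_labels, , drop = FALSE]    # analogous to keepOnly = TRUE
```

Here gene1 is removed by the expression filter, while gene2, which is expressed in only one group, is retained.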
plotDensity(skin_filtered,"SMTSD",main=expression('log'[2]*' raw expression'))
skin_filtered = normalizeTissueAware(skin_filtered,"SMTSD")
plotDensity(skin_filtered,"SMTSD",normalized=TRUE,main="Normalized")
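To give a feel for what group-aware normalization means, the sketch below applies plain quantile normalization separately within each group of samples, using base R on simulated data. It is a simplified stand-in: the method actually used by normalizeTissueAware may differ, and the helper names quantile_normalize and normalize_by_group are hypothetical.

```r
# Quantile-normalize a matrix (genes x samples): every column receives the
# same distribution, namely the mean of the sorted columns.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  reference <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

# Apply the normalization separately within each group of samples, so that
# global differences between groups are preserved.
normalize_by_group <- function(x, groups) {
  out <- x
  for (g in unique(groups)) {
    idx <- groups == g
    out[, idx] <- quantile_normalize(x[, idx, drop = FALSE])
  }
  out
}

# Simulated log-expression values: group B is shifted upwards (made-up data).
set.seed(1)
x <- cbind(matrix(rnorm(40, mean = 5), nrow = 10),   # group A: 4 samples
           matrix(rnorm(40, mean = 8), nrow = 10))   # group B: 4 samples
groups <- rep(c("A", "B"), each = 4)
normed <- normalize_by_group(x, groups)
```

After this step, samples within a group share one distribution, while the shift between the two groups is left intact.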
Other than checkMisAnnotation and checkTissuesToMerge, we provide a few plotting functions: plotCMDS, plotDensity, and plotHeatmap.
plotCMDS - PCoA / Classical Multi-Dimensional Scaling of the most variable genes.
data(skin)
res = plotCMDS(skin,pch=21,bg=factor(pData(skin)$SMTSD))
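The idea behind classical MDS on the most variable genes can be sketched in base R with cmdscale. The data below are simulated, and the choice of 50 genes and Euclidean distance is an assumption for illustration rather than a description of plotCMDS internals.

```r
set.seed(1)
# Toy log-expression matrix: 200 genes x 12 samples, two groups (made-up data).
groups <- rep(c("A", "B"), each = 6)
expr <- matrix(rnorm(200 * 12), nrow = 200)
# Shift 20 genes upwards in group B so the groups differ.
expr[1:20, groups == "B"] <- expr[1:20, groups == "B"] + 4

# Select the n genes with the largest variance across samples.
n <- 50
gene_var <- apply(expr, 1, var)
top <- order(gene_var, decreasing = TRUE)[seq_len(n)]

# Classical MDS (PCoA) on Euclidean distances between samples.
d <- dist(t(expr[top, ]))
coords <- cmdscale(d, k = 2)
```

The resulting coords matrix has one row per sample and can be passed to plot(), colored by group, to inspect how the samples cluster.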
plotDensity - Density plots colored by a phenotype of your choosing. Allows for inspection of global trend differences.
filtData = filterLowGenes(skin,"SMTSD")
plotDensity(filtData,groups="SMTSD",legendPos="topleft")
plotHeatmap - Heatmap of the most variable genes.
library(RColorBrewer)
tissues = pData(skin)$SMTSD
heatmapColColors = brewer.pal(12,"Set3")[as.integer(factor(tissues))]
heatmapCols = colorRampPalette(brewer.pal(9, "RdBu"))(50)
plotHeatmap(skin,normalized=FALSE,log=TRUE,trace="none",n=10,
            col = heatmapCols,ColSideColors = heatmapColColors,
            cexRow = 0.25,cexCol = 0.25)
sessionInfo()