title: "patelGBMSC -- CONQUER quantification of single-cell RNA-seq in glioblastoma, thinned, with colData"
author: "Vincent J. Carey, stvjc at channing.harvard.edu"
date: "r format(Sys.time(), '%B %d, %Y')
"
vignette: >
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{patelGBMSC -- a single-cell RNA-seq dataset in glioblastoma}
%\VignetteEncoding{UTF-8}
output:
BiocStyle::pdf_document:
toc: yes
number_sections: yes
BiocStyle::html_document:
highlight: pygments
number_sections: yes
theme: united
toc: yes
```{r setup,echo=FALSE,results="hide"} suppressPackageStartupMessages({ suppressMessages({ library(patelGBMSC) }) })
# Introduction
[Patel et al. 2014](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4123637/)
describe a single-cell RNA-seq study of several glioblastoma
samples. The data were reprocessed with the
[CONQUER](http://imlspenticton.uzh.ch:3838/conquer/)
pipeline (see the [QC report](http://imlspenticton.uzh.ch/robinson_lab/conquer/report-multiqc/GSE57872_multiqc_report.html)).
The rds file distributed by CONQUER is large as it includes
multiple gene-level and transcript-level quantifications.
As of Oct 30 2017, the CONQUER distribution does not include
sample-level information beyond the GSM identifier. This
package includes a smaller image of the data (the `count_lstpm`
quantifications, that are estimated counts created using the
salmon algorithm, rescaled to account for library size).
The data image is over 200MB, so the `r Biocpkg("BiocFileCache")`
discipline is used to perform a one-time download, insertion
and bookkeeping in cache; the `loadPatel` function takes
care of the download and retrieval from cache as appropriate.
# Quick view of the data
We'll randomly sample 5000 genes to reduce runtime
in this vignette. We filter down to the 430 patient samples
that passed quality control.
```{r getdat}
library(patelGBMSC)
patelGeneCount = loadPatel()
qdrop = grep("excluded", patelGeneCount$description) # QC issues
patelGeneCount = patelGeneCount[,-qdrop]
ispat = grep("MGH", patelGeneCount$characteristics_ch1)
patelGeneCount = patelGeneCount[,ispat]
patelGeneCount = patelGeneCount[-grep("ERCC", rownames(patelGeneCount)),] # drop ERCC spikeins
patelGeneCount$sampcode = factor(gsub("patient id: ", "", patelGeneCount$characteristics_ch1))
tcol = as.numeric(tfac <- factor(patelGeneCount$sampcode))
set.seed(1234)
samp = assay(patelGeneCount[sample(1:nrow(patelGeneCount), size=5000),])
library(Rtsne)
RTL = Rtsne(t(log(samp+1)))
#plot(RTL$Y, col=tcol, pch=19)
#legend(8,15,legend=levels(tfac),col=1:length(levels(tfac)),
# pch=19)
myd = data.frame(ts1=RTL$Y[,1], ts2=RTL$Y[,2],
code = patelGeneCount$sampcode, tcol=tcol)
library(ggplot2)
ggplot(myd, aes(x=ts1, y=ts2, group=code, colour=code)) + geom_point()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.