This document shows how SRP045638 can be retrieved
and analyzed. We start with filtered and normalized data in the vGene
object.
library(edgeR) library(CSHstats) data(vGene) names(vGene) head(vGene$E[,1:5])
The model matrix is also important:
data(mod) head(mod)
Here are the top results from the limma-voom analysis:
data(de_results) options(digits=3) de_results[1:5, c("gene_name", "logFC", "t", "P.Value", "adj.P.Val")]
This is a little different from the de_results
computed in the
class document -- because we sort by p and take the most significant
results.
The vGene$E
structure holds the
estimated expression values, and vGene$weights
are
quantities that measure relative variability of
the quantities in E
. We can pick a gene of
interest and examine the marginal distribution
and estimate association.
The top DE gene was "ENSG00000121210.15". Let's make a data.frame with the E values, the covariates, and the weights.
target = "ENSG00000121210.15" ind = which(rownames(vGene$E)==target) mydf = data.frame(cbind(KIAA0922=vGene$E[ind,], mod[,-1], wts=vGene$weights[ind,]))
Now examine the marginal distribution:
hist(mydf$KIAA0922)
Interesting. There's a big gap in the KIAA0922 expression distribution.
Exercise: Is that true for the "raw" data, or is it an artifact of all the computations we've done?
library(beeswarm) beeswarm(mydf$KIAA0922, pwcol=as.numeric(factor(mydf$prenatalprenatal))) legend(.6,6,legend=c("postnatal", "prenatal"), col=c(1,2), pch=19)
Finally, we fit the linear model:
m1 = lm(KIAA0922~.-wts, data=mydf, weights=wts) summary(m1) plot(m1, which=2, col=(as.numeric(mydf$prenatalprenatal)+1), pch=19)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.