NanoStringClustR enables users to quickly and easily assess the performance of multiple normalisation methods on NanoString nCounter data. NanoStringClustR performs nCounter scaling-factor-based normalisations using spike-in controls and housekeeping genes, and uses wrappers for geNorm, variance stabilising normalisation (vsn), cyclic loess, quantile and RUV-III normalisation. A combination of a cluster validity index and Relative Log Expression (RLE) is used to rank normalisations. NanoStringClustR also enables the effect of normalisation on differential gene expression to be assessed by implementing a wrapper for limma. NanoStringClustR currently supports NanoString nCounter mRNA and miRNA data, although it has only been tested with mRNA data.
NanoStringClustR contains 4 main functions:
count_set(): generate a count_set summarising a NanoString experiment
multi_norm(): perform normalisations and output diagnostic plots
norm_rank(): rank normalisations
multi_diff(): perform differential gene expression analysis of all pairwise combinations on normalised datasets
First, install and load the NanoStringClustR library and example dataset. NanoStringClustR uses the R package SummarizedExperiment to hold NanoString count data, so load this package too.
library(NanoStringClustR)
data("Rnf5")
library(SummarizedExperiment)
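A count_set is a SummarizedExperiment that holds NanoString count data and sample annotations. To build a count_set, provide the count data together with sample annotations such as the biological group and sample ID of each sample.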
First, define sample annotations
# biological groups
rnf5_group <- c(rep("WT", 5), rep("KO", 5))
# sample ids
rnf5_sampleid <- c("GSM3638131", "GSM3638132", "GSM3638133", "GSM3638134",
                   "GSM3638135", "GSM3638136", "GSM3638137", "GSM3638138",
                   "GSM3638139", "GSM3638140")
Second, build a count_set
# for this example, we will use the in-package Rnf5 dataset:
rnf5_count_set <- count_set(count_data = Rnf5,
                            group = rnf5_group,
                            samp_id = rnf5_sampleid)
You can also generate a count_set from a file path to a csv generated by the nSolver RCC Collector Tool Format Export:
# e.g.
# rnf5_count_set <- count_set(rccexp_dir = "~/path/to/file.csv",
#                             group = rnf5_group,
#                             samp_id = rnf5_sampleid)
Adding output_log = "~/Dropbox/NanoStringCountR/NanoStringCountR/raw_data/" to the call will save the se.R there.
You can then load an existing SummarizedExperiment from an se object in R, or from a full file path to a saved se.R:
# e.g.
# rnf5_count_set <- count_set(count_set = rnf5_count_set,
#                             group = group,
#                             batch = batch,
#                             samp_id = samp_id)
# e.g.
# rnf5_count_set <- count_set(count_se = "~/path/to/se.R")
The count_set can be accessed by functions in the SummarizedExperiment package
rnf5_count_set
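As a minimal sketch of what that access looks like, the block below builds a small stand-in SummarizedExperiment and queries it with standard accessors. The toy matrix, the probe names and the colData column names (group, samp_id) are assumptions for illustration; the columns that count_set() actually stores may be named differently.

```r
library(SummarizedExperiment)

# Toy stand-in for a count_set: 3 probes x 4 samples (values are made up).
toy_counts <- matrix(c(10, 200, 35, 12, 180, 40, 9, 220, 31, 14, 210, 38),
                     nrow = 3,
                     dimnames = list(c("probe1", "probe2", "probe3"),
                                     c("s1", "s2", "s3", "s4")))

# Hypothetical sample annotations, one row per sample.
toy_annot <- S4Vectors::DataFrame(group = c("WT", "WT", "KO", "KO"),
                                  samp_id = c("s1", "s2", "s3", "s4"),
                                  row.names = colnames(toy_counts))

toy_se <- SummarizedExperiment(assays = list(counts = toy_counts),
                               colData = toy_annot)

dim(toy_se)                     # number of probes and samples
assayNames(toy_se)              # names of the stored assays
head(assay(toy_se, "counts"))   # the count matrix itself
colData(toy_se)$group           # one sample annotation column

# Subset samples by annotation, e.g. keep only the KO group.
toy_ko <- toy_se[, toy_se$group == "KO"]
```

The same accessors can be tried on rnf5_count_set to see which assays and annotation columns it holds.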
multi_norm() performs the following types of normalisation:
A. Optional pre-processing. Choose which pre-processing methods you would prefer.
background_correct: "mean2sd", "proportional" or "none"
count_threshold: "mean2sd" of the negative controls, or any number from 0 to Inf
positive_control_scaling: TRUE/FALSE
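To give a feel for these options, here is a minimal base R sketch of two of them on made-up data. It assumes that "mean2sd" means the mean plus two standard deviations of the negative controls in each sample, and that positive control scaling follows the usual nCounter convention of scaling each sample by the ratio of the average geometric mean of the positive controls to that sample's geometric mean. Both are assumptions about the underlying arithmetic, not a copy of what multi_norm() does internally.

```r
# Toy data: rows are probes, columns are samples (all values are made up).
neg <- matrix(c(4, 6, 8, 10, 5, 7, 9, 11), nrow = 4,
              dimnames = list(paste0("NEG_", 1:4), c("s1", "s2")))
pos <- matrix(c(100, 400, 200, 800), nrow = 2,
              dimnames = list(c("POS_A", "POS_B"), c("s1", "s2")))
endo <- matrix(c(3, 50, 500, 12, 80, 900), nrow = 3,
               dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Assumed "mean2sd" threshold: mean + 2 SD of the negative controls per sample.
bg <- colMeans(neg) + 2 * apply(neg, 2, sd)

# Flag endogenous counts that exceed the background threshold of their sample.
above_bg <- sweep(endo, 2, bg, ">")

# Assumed positive control scaling: one scaling factor per sample.
geo_mean <- function(x) exp(mean(log(x)))
pos_gm <- apply(pos, 2, geo_mean)
scale_factor <- mean(pos_gm) / pos_gm

# Apply the per-sample scaling factor to the endogenous counts.
endo_scaled <- sweep(endo, 2, scale_factor, "*")
```

In this toy case the second sample has twice the positive control signal of the first, so it is scaled down while the first is scaled up.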
B. Count normalisations. multi_norm() performs all normalisations automatically.
housekeeping_scaled
geNorm_housekeeping: geNorm_n stably expressed housekeeping genes selected by geNorm
all_endogenous_scaled
loess
vsn
quantile
ruv
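To make one of the listed methods concrete, the following is a minimal base R sketch of quantile normalisation: every sample is forced to share the same distribution by replacing each value with the mean of the values that hold the same rank across samples. The function name quantile_normalise and the toy matrix are hypothetical, ties are broken by order of appearance for simplicity, and multi_norm() itself wraps existing implementations rather than this code.

```r
# Quantile-normalise the columns (samples) of a numeric matrix without NAs.
quantile_normalise <- function(x) {
  # Rank of each value within its own column (ties broken by position).
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Mean of the k-th smallest values across columns, for every k.
  sorted_means <- rowMeans(apply(x, 2, sort))
  # Replace each value by the mean that corresponds to its rank.
  out <- apply(ranks, 2, function(r) sorted_means[r])
  dimnames(out) <- dimnames(x)
  out
}

# Toy log-scale expression matrix: 4 genes x 3 samples (made-up values).
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8),
            nrow = 4,
            dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

qn <- quantile_normalise(x)

# After normalisation every column contains the same set of values.
apply(qn, 2, sort)
```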
multi_norm() returns a SummarizedExperiment with the normalised counts as assays. Diagnostic plots will be saved if a plot_dir is provided, e.g. plot_dir = "~/full/path/to/my/plots/"
rnf5_count_set_norm <- multi_norm(count_set = rnf5_count_set, positive_control_scaling = TRUE, background_correct = "mean2sd")
Log2 transformed, normalised data are returned in a normalised count_set as assays and can be accessed here:
# list types of normalisations
names(assays(rnf5_count_set_norm))
# access normalisations
# assays(rnf5_count_set_norm)$housekeeping_scaled
To access log2 transformed counts, use
#assays(rnf5_count_set_norm)$counts
norm_rank() performs cluster-based ranking of normalisation methods using the Generalized Dunn Index between groups and the sum of RLE variation.
rnf5_eval <- norm_rank(count_set = rnf5_count_set_norm)
norm_rank() returns a data frame of ranks. Normalisations with a lower rank are rated better.
rnf5_eval
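For readers unfamiliar with Relative Log Expression, the sketch below shows the basic calculation in base R on simulated data: each gene's median across samples is subtracted from its log expression, and the spread of the resulting values within each sample indicates unwanted variation. Summarising the spread by summing the per-sample interquartile ranges is an assumption made for illustration; the exact summary used by norm_rank() may differ.

```r
set.seed(1)

# Simulated log2 expression: 10 genes x 5 samples (not real data).
log_expr <- matrix(rnorm(50, mean = 8), nrow = 10,
                   dimnames = list(paste0("gene", 1:10), paste0("s", 1:5)))

# RLE: subtract each gene's median across samples from that gene's values.
gene_medians <- apply(log_expr, 1, median)
rle <- sweep(log_expr, 1, gene_medians, "-")

# Per-sample summaries: a well-normalised sample has an RLE median near zero
# and a small spread.
rle_median <- apply(rle, 2, median)
rle_iqr <- apply(rle, 2, IQR)

# One possible single-number summary of RLE variation (assumed, see above).
rle_variation <- sum(rle_iqr)

# The usual diagnostic is a boxplot of RLE values per sample.
boxplot(rle, ylab = "Relative Log Expression")
```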
multi_diff() performs differential gene expression analysis on all possible pairs of groups defined in the count_set. The threshold for significantly differentially expressed genes is defined by p_cut_off and logFC_cut_off.
rnf5_multi_diff <- multi_diff(count_set = rnf5_count_set_norm, adj_method = "fdr", p_cut_off = 0.05, logFC_cut_off = 0)
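The two ideas in this call, enumerating every pair of groups and applying the cut-offs after multiple-testing adjustment, can be sketched in base R as follows. The result table is made up, its column names mirror those of limma's topTable output, and the use of strict inequalities for the cut-offs is an assumption rather than a statement about how multi_diff() filters.

```r
# Groups as defined for the example count_set.
group <- c(rep("WT", 5), rep("KO", 5))

# All pairwise combinations of groups, written as contrast labels.
group_pairs <- combn(sort(unique(group)), 2, simplify = FALSE)
contrast_names <- vapply(group_pairs,
                         function(p) paste(p[1], "-", p[2]),
                         character(1))

# Made-up per-gene results for one contrast.
toy_result <- data.frame(gene = paste0("gene", 1:4),
                         logFC = c(1.2, -0.8, 0.3, 0.1),
                         P.Value = c(0.001, 0.02, 0.04, 0.5))

# "fdr" is an alias for the Benjamini-Hochberg method in p.adjust().
toy_result$adj.P.Val <- p.adjust(toy_result$P.Value, method = "fdr")

# Apply the cut-offs (assumed here to be strict inequalities).
p_cut_off <- 0.05
logFC_cut_off <- 0
is_deg <- toy_result$adj.P.Val < p_cut_off &
  abs(toy_result$logFC) > logFC_cut_off

toy_result[is_deg, ]
```

With only two groups there is a single contrast; with more groups, combn() yields one contrast per pair.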
multi_diff() will return a list with:
rnf5_multi_diff$plot_DEG
rnf5_multi_diff$overlap_DEG
rnf5_multi_diff$summary_DEG
# rnf5_multi_diff$full_result$NAME_OF_NORM_METHOD
# e.g.
head(rnf5_multi_diff$results_DEG$housekeeping_scaled$`KO - WT`)
For more information, see the topTable function from the limma R package.
NanoStringClustR supports differential gene expression with pairing, for example an experiment where samples have been taken from the same person before and after treatment. For this example, we will consider each WT and KO sample to be paired.
First, add pairing information to the normalised count_set.
colData(rnf5_count_set_norm)$pair <- as.factor(c("pair1", "pair2", "pair3", "pair4", "pair5", "pair1", "pair2", "pair3", "pair4", "pair5"))
Second, run multi_diff with pairing = "paired"
rnf5_multi_diff_paired <- multi_diff(count_set = rnf5_count_set_norm, adj_method = "fdr", p_cut_off = 0.05, logFC_cut_off = 0, pairing = "paired")
rnf5_multi_diff_paired$plot_DEG
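For intuition, a paired comparison in a linear-model framework such as limma is commonly expressed by adding the pairing factor to the design matrix as a blocking term, so that the group effect is estimated within pairs. The sketch below builds such a design with base R only; whether multi_diff() constructs its design in exactly this way is an assumption.

```r
# Group and pairing factors matching the example above.
group <- factor(c(rep("WT", 5), rep("KO", 5)), levels = c("WT", "KO"))
pair <- factor(rep(paste0("pair", 1:5), times = 2))

# Unpaired design: intercept plus the KO vs WT effect.
design_unpaired <- model.matrix(~ group)

# Paired design: one blocking term per pair plus the KO vs WT effect.
design_paired <- model.matrix(~ pair + group)

colnames(design_unpaired)
colnames(design_paired)
```

The paired design spends extra coefficients on the pairs, which leaves fewer residual degrees of freedom but removes between-pair variation from the group comparison.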
If technical replicates are present, multi_norm will perform RUV normalisation by RUV-III. NanoStringClustR defines technical replicates by the sample ID in the samp_id slot of the count_set object. Technical (or pseudo) replicates should have the same name. For this example, we will consider the first 2 WT samples to be technical replicates. Currently, only one factor of variation is determined (k = 1).
For more information on using RUV-III with NanoString data, see: Molania R, Gagnon-Bartsch JA, Dobrovic A, et al. A new normalization for Nanostring nCounter gene expression data. Nucleic Acids Res. 2019 May 22. doi: 10.1093/nar/gkz433. PubMed PMID: 31114909
First, define technical replicates in the un-normalised count_set.
rnf5_count_set$samp_id <- c("techrep_1", "techrep_1", "GSM3638133", "GSM3638134", "GSM3638135", "GSM3638136", "GSM3638137", "GSM3638138", "GSM3638139", "GSM3638140")
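RUV-III works from a replicate structure that records which samples are replicates of one another. As a rough illustration of how shared sample IDs translate into that structure, the base R sketch below builds a samples-by-replicate-set indicator matrix from the IDs above. This is an assumed, simplified view for illustration only; it is not taken from the NanoStringClustR source.

```r
# Sample IDs after marking the first two WT samples as technical replicates.
samp_id <- c("techrep_1", "techrep_1", "GSM3638133", "GSM3638134",
             "GSM3638135", "GSM3638136", "GSM3638137", "GSM3638138",
             "GSM3638139", "GSM3638140")

# One column per unique sample ID; samples sharing an ID share a column.
rep_factor <- factor(samp_id, levels = unique(samp_id))
rep_matrix <- model.matrix(~ 0 + rep_factor)
colnames(rep_matrix) <- levels(rep_factor)

dim(rep_matrix)       # samples x replicate sets
colSums(rep_matrix)   # number of samples in each replicate set

# Replicate sets with more than one member carry the replicate information.
names(which(colSums(rep_matrix) > 1))
```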
Run multi_norm with the count_set containing the technical replicate info in the $samp_id slot.
rnf5_ruv_count_set_norm <- multi_norm(count_set = rnf5_count_set, positive_control_scaling = TRUE, background_correct = "mean2sd")
Running norm_rank and multi_diff will now include ruvIII.
rnf5_ruv_eval <- norm_rank(count_set = rnf5_ruv_count_set_norm)
rnf5_ruv_eval
rnf5_ruv_multi_diff <- multi_diff(count_set = rnf5_ruv_count_set_norm, adj_method = "fdr", p_cut_off = 0.05, logFC_cut_off = 0, pairing = "unpaired")
When using this package, please cite NanoStringClustR as follows:
citation("NanoStringClustR")
Please also cite all methods used.
If you use multi_norm, cite:
citation("vsn")
citation("affy")
citation("ruv")
citation("preprocessCore")
If you use norm_rank, cite:
citation("clv")
If you use multi_diff, cite:
citation("limma")
citation("UpSetR")
Please also check reference suggestions for each package.