In Mengbo-Li/ruvms: Removing Unwanted Variations in Mass-Spectrometry Data.

options(width = 100, digits = 3)
knitr::opts_chunk$set(collapse = TRUE, 
                      comment = "#>", 
                      echo = TRUE, 
                      cache = FALSE, 
                      prompt = FALSE,
                      tidy = TRUE,
                      comment = NA, 
                      message = FALSE, 
                      warning = FALSE, 
                      tidy = TRUE, 
                      tidy.opts = list(width.cutoff = 60),
                      fig.width = 7, 
                      fig.height = 5, 
                      dev = "png")

Overview

Here we present an example of the ruvms workflow that includes data normalisation and a simple downstream differential expression analysis using the Sydney Heart Bank (SHB) proteomics data.

Install `ruvms`

# install.packages("devtools")
devtools::install_github("Mengbo-Li/ruvms")

Load packages

library(tidyverse)
library(RColorBrewer)
library(naniar)
library(ggpubr)
library(ruvms)
library(limma)
require(ruv)
require(Rfast)
require(Matrix)

Load data

The Sydney Heart Bank (SHB) proteomics data are loaded with corresponding sample information.

data("shb", package = "ruvms")
dim(raw)
raw[1:5, 1:5]
dim(smpinfo)
head(smpinfo)

Pre-processing and data normalisation

Here we demonstrate the workflow using the SHB proteomics data.

Log2 transformation

raw <- log2(raw)
raw[1:5, 1:5]

Missing data patterns

Total proportion of missingness:

sum(is.na(raw)) / length(raw)

Missingness in samples:

ggplot(data.frame(x = colSums(is.na(raw))), aes(x = x)) + 
  geom_histogram(bins = 20) + 
  labs(x = "Number of NA observations", y = "Count") + 
  ggtitle("Number of NAs in each protein (n = 57)")

Missingness by proteins:

ggplot(data.frame(x = rowSums(is.na(raw))/ncol(raw)), aes(x = x)) + 
  geom_histogram(bins = 10) + 
  labs(x = "Proportion of NA proteins", y = "Count") + 
  ggtitle("Proportion of NA proteins in each sample (p = 3,263)")

Choose negative controls

Here, we use the low sample variance proteins as the negative controls:

ctl <- colVars(raw) <= quantile(colVars(raw, na.rm = TRUE), probs = 0.25, na.rm = TRUE)
table(ctl)

Identify the replication structure

The SHB data set contains 70 samples, where 7 of them have one replicate.

sum(duplicated(smpinfo$locBank))
M <- replicate.matrix(smpinfo$locBank)
rownames(M) <- smpinfo$locBank
dim(M)

Perform `ruvms` normalisation

normed <- ruvms(raw, M, ctl)
normed[1:5, 1:5]

Diagnostics

PCA

Normalisation is performed on the data set with missingness, but PCA is produced by subsetting the raw and normalised data to complete subsets.

gg_additions_heart <- list(aes(color = smpinfo$condition, size = 4), 
                           labs(color = "Condition"), 
                           scale_size_identity(guide = "none"), 
                           guides(color = guide_legend(override.aes = list(size = 4))), 
                           scale_color_manual(values = brewer.pal(3, "Dark2")))
ggPCA <- function(data) {
  pcs <- data.frame(prcomp(data)$x)
  ggplot(pcs, aes(x = PC1, y = PC2)) + geom_point() + gg_additions_heart + theme_classic()
}

pca_raw <- ggPCA(raw[, colSums(is.na(raw)) == 0]) + ggtitle("Raw data")
pca_norm <- ggPCA(normed[, colSums(is.na(normed)) == 0]) + ggtitle("Normalised data")
ggarrange(pca_raw, pca_norm, ncol = 2)

It is shown that conditions in ruvms normalised data are better separated, especially the donor samples.

RLA

par(mfrow = c(1, 2))
RLA(raw, repCol = brewer.pal(3, "Dark2")[smpinfo$condition], repLabel = unique(smpinfo$condition), 
    title = "Raw data", ylim = c(-3, 3), guides = seq(-2, 2))
RLA(normed, repCol = brewer.pal(3, "Dark2")[smpinfo$condition], repLabel = unique(smpinfo$condition), 
    title = "Normalised data", ylim = c(-3, 3), guides = seq(-2, 2))

Ideally, given a properly normalised data matrix, RLA boxplots are expected to be aligned around zero, and the widths of the boxplots should be relatively narrow and concordant with each other. We can see in our example, not only are the RLA boxplots better aligned at zero, but the widths of the of the boxplots are shrinked and more concordant post normalisation.

TRA

TRA boxplots measure the similarity among replications of the same effective sample. The widths of TRA boxplots are proportional to variations among technical replicates. Provided effective removal of unwanted variations from data, technical replicates are expected to be identical after normalisation.

The ruvms normalisation makes technical replications the same, thereby the widths of the TRA boxplots are zero after normalisation.

par(mfrow = c(1, 2))
smp_w_rep <- which(rowSums(M[, colSums(M) == 2]) == 1)
smp_w_rep <- filter(smpinfo, locBank %in% names(smp_w_rep)) %>% arrange(locBank)
smp_w_rep

TRA(raw[smp_w_rep$ind, ], replicates = smp_w_rep$locBank, ylim = c(0, 2.2), title = "Raw data")
TRA(normed[smp_w_rep$ind, ], replicates = smp_w_rep$locBank, ylim = c(0, 2.2), title = "Normalised data")

To further demonstrate the use of TRA plots, we also visualise the variations among the same condition, that is, Donors, DCMs and ICMs.

par(mfrow = c(1, 2))
TRA(raw, replicates = smpinfo$condition, col = brewer.pal(3, "Dark2"), 
    title = "Raw data", ylim = c(0, 0.65), guides = c(0, 0.3))
TRA(normed, replicates = smpinfo$condition, col = brewer.pal(3, "Dark2"), 
    title = "Normalised data",ylim = c(0, 0.65), guides = c(0, 0.3))

We can see that the variations within the same condition are reduced after normalisation.

Something extra: In ruv::RUVIII, the k parameter determines the amount of adjustment performed on the raw data set. The TRA plots are also useful to show that the larger k is, the more similar technical replications are after normalisation. That is, the heights of the TRA boxplots become smaller as k increases to its maximum.

MDA plots

The MDA plots shows how the median of each measurement shifts before and after normalisation. MDA plots are useful for the selection of the lambda parameter when standardise is disabled:

par(mfrow = c(1, 2))
ruvms_lambda0 <- ruvms(raw, M, ctl, standardise = FALSE, lambda = 0)
MDA(raw, ruvms_lambda0, outline = TRUE, guides = 0, title = "lambda = 0")
ruvms_w_lambda <- ruvms(raw, M, ctl, standardise = FALSE)
MDA(raw, ruvms_w_lambda, outline = TRUE, guides = 0, title = "lambda = 1e-5")

We can see that there are huge shifts in the range of each protein after normalisation when there is no lambda-regularisation. A very small lambda-regularisation helps.

Differential expression analysis

With normalised data, we then perform a simple differential expression analysis on the disease conditions. The factor of interest is the disease conditions, including the healthy donors (Donor), DCM hearts and ICM hearts. We are interested in the differentially abundant proteins in DCM and ICM versus Donor hearts respectively.

normed_avg <- ruvms(raw, M, ctl, average = TRUE)
effective_smp <- filter(smpinfo, !duplicated(locBank)) %>% arrange(locBank)

design <- model.matrix(~ 0 + condition, data = effective_smp)
contrasts <- makeContrasts(DCM = conditionDCM - conditionDonor, 
                           ICM = conditionICM - conditionDonor,
                           levels = design)

fit <- lmFit(t(normed_avg), design)
cfit <- contrasts.fit(fit, contrasts = contrasts)
efit <- eBayes(cfit)

DCM vs Donor

signif(topTable(efit, coef = "DCM", number = 20), 2)

ICM vs Donor

signif(topTable(efit, coef = "ICM", number = 20), 2)

References

Li, M., Parker, B. L., Pearson, E., Hunter, B., Cao, J., Koay, Y. C., ... & O’Sullivan, J. F. (2020). Core functional nodes and sex-specific pathways in human ischaemic and dilated cardiomyopathy. Nature communications, 11(1), 1-12.

Ritchie, M. E., Phipson, B., Wu, D. I., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic acids research, 43(7), e47-e47.

Session information

sessionInfo()

Mengbo-Li/ruvms documentation built on Nov. 14, 2023, 2:11 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Mengbo-Li/ruvms
Removing Unwanted Variations in Mass-Spectrometry Data.

In Mengbo-Li/ruvms: Removing Unwanted Variations in Mass-Spectrometry Data.

Overview

Install `ruvms`

Load packages

Load data

Pre-processing and data normalisation

Log2 transformation

Missing data patterns

Choose negative controls

Identify the replication structure

Perform `ruvms` normalisation

Diagnostics

PCA

RLA

TRA

MDA plots

Differential expression analysis

DCM vs Donor

ICM vs Donor

References

Session information

R Package Documentation

Browse R Packages

We want your feedback!

Mengbo-Li/ruvms Removing Unwanted Variations in Mass-Spectrometry Data.

In Mengbo-Li/ruvms: Removing Unwanted Variations in Mass-Spectrometry Data.

Overview

Install ruvms

Load packages

Load data

Pre-processing and data normalisation

Log2 transformation

Missing data patterns

Choose negative controls

Identify the replication structure

Perform ruvms normalisation

Diagnostics

PCA

RLA

TRA

MDA plots

Differential expression analysis

DCM vs Donor

ICM vs Donor

References

Session information

R Package Documentation

Browse R Packages

We want your feedback!

Mengbo-Li/ruvms
Removing Unwanted Variations in Mass-Spectrometry Data.

Install `ruvms`

Perform `ruvms` normalisation