Introduction

The MM2S package is providing relevant functions for subtype prediction of Medulloblastoma primary samples, mouse models, and cell lines.

MM2S is single-sample classifier that generates Medulloblastoma (MB) subtype predictions for single-samples of human MB patients and model systems, including cell lines and mouse-models. The MM2S algorithm uses a systems-based methodology that faciliates application of the algorithm on samples irrespective of their platform or source of origin. MM2S demonstrates >96 percent accuracy for patients of well-characterized normal cerebellum, Wingless (WNT), or Sonic hedgehog(SHH) subtypes, and the less-characterized Group4 (86 percent) and Group3 (78.2 percent). MM2S also enables classification of Medulloblastoma (MB) cell lines and mouse models into their human counterparts.This package contains function for implementing the classifier onto human data and mouse data, as well as graphical rendering of the results as Principal Component Analysis (PCA) plots and heatmaps.

Please refer to the manuscript URL: http://www.sciencedirect.com/science/article/pii/S0888754315000774

Please also refer to the References section for additional information on downloading the MM2S package from Github, or running the MM2S server from the Lab website.

Loading package and case studies

First we load the MM2S and MM2Sdata packages into the workspace. Both packages are publicly available and can be installed from Bioconductor version 2.8 or higher in R version 2.13.0 or higher.

The MM2Sdata package contains companion datasets that will be used for the examples in the following case studies. The MM2Sdata package contains ExpressionSet objects of both Human and Mouse model Medulloblastoma, specifically: * GSE36594expr: Gene expression for 20 GTML Medulloblastoma mouse samples * GSE37418Expr: Gene expression for 10 primary Medulloblastoma human samples

Please consult the manual of the MM2Sdata package for more details.

if (!require("MM2S")) install.packages("MM2S")
if (!require("MM2Sdata")) install.packages("MM2Sdata")

suppressPackageStartupMessages(library(MM2S))
suppressPackageStartupMessages(library(MM2Sdata))

Case Study 1: Prediction Human Subtype Counterparts from Mouse Models

We first load the Mouse model dataset from GSE36594. We select all samples pertaining to the GTML mouse model. There are 20 sample replicates for this mouse model, all of which are labelled as GTML in the GEO series. We select for those samples and perform MM2S predictions on them.

data(GSE36594Expr)
ExprMat <- exprs(GSE36594Expr)
GTML <- ExprMat[, grep("GTML_MB", (colnames(exprs(GSE36594Expr))))]

#Change mouse sample names for clarity
for(sample in seq_len(ncol(GTML))) {
  newnames <- strsplit(x=(colnames(GTML)[sample]), split="_")[[1]][1]
  colnames(GTML)[sample] <- newnames
}

# Conduct Subtype Predictions for those particular replicates, save results in a XLS file
GTMLPreds <- MM2S.mouse(InputMatrix=GTML, parallelize=1, seed=12345, tempdir())

Now we can view the predictions for the GTML sample replicates in more detail. We first generate heatmap of MM2S confidence predictions for each sample replicate.

PredictionsHeatmap(InputMatrix=GTMLPreds$Predictions[1:20, ],
    pdf_output=TRUE, pdfheight=12, pdfwidth=10)

We can also represent the results as a stacked barplot.

PredictionsBarplot(InputMatrix=GTMLPreds$Predictions[1:20, ], pdf_output=TRUE,
    pdfheight=5, pdfwidth=12)

Using the heatmap or the stacked barplot, we observe that the majority of sample replicates strongly predict as Group3, suggesting the potential for a non-SHH mouse model.

We generate here an overview of the majority subtypes, across all sample replicates.

PredictionsDistributionPie(InputMatrix=GTMLPreds, pdf_output=TRUE, pdfheight=5,
    pdfwidth=5)

To assess further, we also plot the distribution of subtype calls, across all the samples.

PredictionsDistributionBoxplot(InputMatrix=GTMLPreds, pdf_output=FALSE)

Notably, some samples also predict as either Sonic hedgehog (SHH) or Normal. Further investigation would need to performed on these samples. To invesitage further, we can graphically visualize different sample replicates and their nearest human Medulloblastoma (MB) neighbors from the MM2S training set using Principal Component Analysis (PCA).

Three PDF files are generated which renders PC1 vs PC2, PC2 vs PC3, and a lattice plot of PC1-PC3.

PCARender(GSVAmatrixTesting=GTMLPreds$RankMatrixTesting,
          GSVAmatrixTraining=GTMLPreds$RankMatrixTraining)

Case Study 2: Predict Human Subtypes from Primary Patient samples

We first load the gene expression data of 10 primary human patient tumours from GSE37418, and conduct MM2S subtype predictions on them.

data(GSE37418Expr)
HumanExpr <- exprs(GSE37418Expr)
# Conduct Subtype Predictions for all samples, save results in a XLS file
# [This will take a few minutes to compute]
HumanPreds <- MM2S.human(InputMatrix=HumanExpr, parallelize=1, seed=12345, 
    tempdir())

We can compare MM2S predictions against known subtype predictions of the samples. These subtype predictions are obtained from the Gene Expression Omnibus (GEO).

# We first assess the distribution of the known subtypes for the 76 samples.
table(pData(GSE37418Expr)$characteristics_ch1)
# We now assess the distribtuion of MM2S predicted subtypes for the 76 samples. 
table(HumanPreds$MM2S_Subtype[, 2])
# Side-by-side comparison of MM2S predictions and pre-determined subtypes across all samples
# first check that all samples are matching in the pData and MM2S
all(HumanPreds$MM2S_Subtype[, 1] == rownames(pData(GSE37418Expr)))
# then generate comparisons
ComparisonTable <- cbind(Sample=rownames(pData(GSE37418Expr)),
    Original=as.character(pData(GSE37418Expr)$characteristics_ch1),
    MM2S=HumanPreds$MM2S_Subtype[, 2])
# We view the first 15 samples here
ComparisonTable[1:10, ]

We can easily generate a heatmap of all predictions, as well as Principal Component Analysis (PCA) plots for our given samples against the MM2S training set.

# Now generate a heatmap of the predictions and save the results in a PDF file.  
# This indicates MM2S confidence perdictions for each sample.
# We can view the first 10 samples.
PredictionsHeatmap(InputMatrix=HumanPreds$Predictions[1:10, ],
    pdf_output=TRUE, pdfheight=10, pdfwidth=5)

# NB: Output may appear on multiple pages

# We can graphically visualize different sample replicates and their nearest human Medulloblastoma (MB) neighbors 
# from the MM2S training set using Principal Component Analysis (PCA). 
PCARender(GSVAmatrixTesting=HumanPreds$RankMatrixTesting, 
          GSVAmatrixTraining=HumanPreds$RankMatrixTraining)

References and Extra Notes

Both MM2S and MM2Sdata are publicly available and can be installed in R version 2.13.0 or higher. Both packages are also availabe on Github. Companion datasets are also available on the Haibe-Kains (BHK) Lab website.

Please refer to the following data repositories and websites for additional information, as necessary: * MM2S and MM2Sdata on Github: https://github.com/bhklab * BHK Lab Website: http://www.pmgenomics.ca/bhklab/software/mm2s

The following code snippet is an example installation of the data repositories from Github.

remotes::install_github("bhklab/MM2S")
remotes::install_github("bhklab/MM2Sdata")

License

The MM2S package is released under the GPL-3.0 License.

The MM2S package is provided "AS-IS" and without any warranty of any kind. In no event shall the University Health Network (UHN) or the authors be liable for any consequential damage of any kind, or any damages resulting from the use of MM2S.

sessionInfo

sessionInfo()


bhklab/MM2S documentation built on Dec. 25, 2021, 10:48 a.m.