knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 5, fig.height = 4, dpi=200 )
The NanoStringGeoMxSet was inherited from Biobase's ExpressionSet class. The NanoStringGeoMxSet class was designed to encapsulate data and corresponding methods for NanoString DCC files generated from the NanoString GeoMx Digital Spatial Profiling (DSP) platform.
There are numerous functions that NanoStringGeoMxSet inherited from ExpressionSet class. You can find these in this link: https://www.bioconductor.org/packages/release/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf
Loading the NanoStringNCTools and GeoMxTools packages allow users access to the NanoStringGeoMxSet class and corresponding methods.
library(NanoStringNCTools) library(GeomxTools) library(EnvStats) library(ggiraph)
Use the readNanoStringGeoMxSet function to read in your DCC files.
The phenoDataFile variable takes in the annotation file, the phenoDataDccColName is to specify which column from your annotation contains the DCC file names. The protocolDataColNames are the columns in your annotation file that you want to put in the protocol data slot.
datadir <- system.file("extdata", "DSP_NGS_Example_Data", package="GeomxTools") DCCFiles <- dir(datadir, pattern=".dcc$", full.names=TRUE) PKCFiles <- unzip(zipfile = file.path(datadir, "/pkcs.zip")) SampleAnnotationFile <- file.path(datadir, "annotations.xlsx") demoData <- suppressWarnings(readNanoStringGeoMxSet(dccFiles = DCCFiles, pkcFiles = PKCFiles, phenoDataFile = SampleAnnotationFile, phenoDataSheet = "CW005", phenoDataDccColName = "Sample_ID", protocolDataColNames = c("aoi", "cell_line", "roi_rep", "pool_rep", "slide_rep"))) class(demoData) isS4(demoData) is(demoData, "ExpressionSet") demoData
# access the count matrix assayData(demoData)[["exprs"]][1:3, 1:3] # access pheno data pData(demoData)[1:3, ] # access the protocol data pData(protocolData(demoData))[1:3, ] # access the probe information fData(demoData)[1:3, ] # check feature type featureType(demoData) # access PKC information annotation(demoData)
Alongside the accessors associated with the ExpressionSet class, NanoStringGeoMxSet objects have unique additional assignment and accessor methods facilitating common ways to view DSP data and associated labels.
The package provide functions to get the annotations of the data
Access the available pheno and protocol data variables
svarLabels(demoData) head(sData(demoData), 2)
Design information can be assigned to the NanoStringGeoMxSet object, as well as feature and sample labels to use for NanoStringGeoMxSet plotting methods.
design(demoData) <- ~ `segments` design(demoData) dimLabels(demoData) dimLabels(demoData)[2] <- "Sample ID" dimLabels(demoData)
Easily summarize count results using the summary method. Data summaries can be generated across features or samples. Labels can be used to generate summaries based on feature or sample groupings.
head(summary(demoData, MARGIN = 1), 2) head(summary(demoData, MARGIN = 2), 2) unique(sData(demoData)$"cell_line") head(summary(demoData, MARGIN = 2, GROUP = "cell_line")$"HS578T", 2) head(summary(demoData, MARGIN = 2, GROUP = "cell_line")$"COLO201", 2) head(summary(demoData, MARGIN = 2, GROUP = "cell_line", log2 = FALSE)$"COLO201", 2)
NanoStringGeoMxSet provides subsetting methods including bracket subsetting and subset functions. Users can use the subset or select arguments to further subset by feature or sample, respectively.
dim(demoData)
use the bracket notation
dim(demoData[, demoData$`slide name` == "6panel-old-slide1 (PTL-10891)"])
Or use subset method to subset demoData object by selecting only certain slides
dim(subset(demoData, select = phenoData(demoData)[["slide name"]] == "6panel-old-slide1 (PTL-10891)"))
Subset by selecting specific targets and slide name
dim(subset(demoData, TargetName == "ACTA2", `slide name` == "6panel-old-slide1 (PTL-10891)")) dim(subset(demoData, CodeClass == "Control", `slide name` == "6panel-old-slide1 (PTL-10891)"))
use endogenousSubset and negativeControlSubset function to subset the demodata and include only features that belong to endogenous code class or negative code class.
dim(endogenousSubset(demoData)) dim(negativeControlSubset(demoData))
endogenousSubset function also takes select arguments to further subset by phenodata
dim(endogenousSubset(demoData, select = phenoData(demoData)[["slide name"]] == "6panel-old-slide1 (PTL-10891)")) # tally the number of samples according to their protocol or phenodata grouping with(endogenousSubset(demoData), table(`slide name`)) with(demoData [1:10, 1:10], table(cell_line)) with(negativeControlSubset(demoData), table(CodeClass))
Similar to the ExpressionSet's esApply function, an equivalent method is available with NanoStringGeoMxSet objects. Functions can be applied to assay data feature-wise or sample-wise.
Add the demoElem data which is computed as the logarithm of the count matrix (exprs) into the demoData by using assayDataApply function. The accessor function assayDataElement from eSet returns matrix element from assayData slot of object. Elt refers to the element in the assayData.
assayDataElement(demoData, "demoElem") <- assayDataApply(demoData, MARGIN=2, FUN=log, base=10, elt="exprs") assayDataElement(demoData, "demoElem")[1:3, 1:2] # loop over the features(1) or samples(2) of the assayData element and get the mean assayDataApply(demoData, MARGIN=1, FUN=mean, elt="demoElem")[1:5] # split the data by group column with feature, pheno or protocol data then get the mean head(esBy(demoData, GROUP = "cell_line", FUN = function(x) { assayDataApply(x, MARGIN = 1, FUN=mean, elt="demoElem") }))
Users can flag samples that fail QC thresholds or have borderline results based on expression. The setQC Flags will set the QC flags in the protocolData for the samples and probes that are low in count and saturation levels. It will also set flags for probe local outliers (low and high) and Global Outliers
demoData <- shiftCountsOne(demoData, useDALogic=TRUE) demoData <- setSegmentQCFlags(demoData) head(protocolData(demoData)[["QCFlags"]]) demoData <- setBioProbeQCFlags(demoData) featureData(demoData)[["QCFlags"]][1:5, 1:4]
Probes and Samples that were flagged can be removed from analysis by subsetting.
Subset object to exclude all that did not pass Sequencing and background QC.
QCResultsIndex <- which(apply(protocolData(demoData)[["QCFlags"]], 1L , function(x) sum(x) == 0L)) QCPassed <- demoData[, QCResultsIndex] dim(QCPassed)
After cleaning the object from low counts, the counts can be collapsed to Target using aggregateCounts function.
Save the new object as target_demoData when you call the aggregateCounts function. This will change the dimension of the features. After aggregating the counts, feature data will contain target counts and not probe counts. To check the feature type, you can use the featureType accessor function.
target_demoData <- aggregateCounts(demoData) dim(target_demoData)
Note that feature data changed to target.
featureType(target_demoData) exprs(target_demoData)[1:5, 1:5]
There is a preloaded GeoMx DSP-DA Normalization that comes with the NanoStringGeoMxSet class. This includes the options to normalize on quantile, housekeeping or negative normalization.
target_demoData <- normalize(target_demoData , norm_method="quant", desiredQuantile = .9, toElt = "q_norm") target_demoData <- normalize(target_demoData , norm_method="neg", fromElt="exprs", toElt="neg_norm") target_demoData <- normalize(target_demoData , norm_method="hk", fromElt="exprs", toElt="hk_norm") assayDataElement( target_demoData , elt = "q_norm" )[1:3, 1:2] assayDataElement( target_demoData , elt = "hk_norm" )[1:3, 1:2] assayDataElement( target_demoData , elt = "neg_norm" )[1:3, 1:2]
The NanoStringGeoMxSet munge function generates a data frame object for downstream modeling and visualization. This combines available features and samples into a long format.
neg_set <- negativeControlSubset(demoData) class(neg_set) neg_ctrls <- munge(neg_set, ~ exprs) head(neg_ctrls, 2) class(neg_ctrls) head(munge(demoData, ~ exprs), 2) munge(demoData, mapping = ~`cell_line` + GeneMatrix)
Subtract max count from each sample Create log1p transformation of adjusted counts
thresh <- assayDataApply(negativeControlSubset(demoData), 2, max) demoData <- transform(demoData, negCtrlZeroed = sweep(exprs, 2, thresh), log1p_negCtrlZeroed = log1p(pmax(negCtrlZeroed, 0))) assayDataElementNames(demoData)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.