MEM: Marker Enrichment Modeling

View source: R/MEM_function.R

MEM    R Documentation

Marker Enrichment Modeling

Description

The MEM function takes pre-clustered, single-cell data as input and calculates relative enrichment scores for each marker on each population.

Usage

MEM(exp_data,
    transform=FALSE,
    cofactor=1,
    choose.markers=FALSE,
    markers="all",
    choose.ref=FALSE,
    zero.ref=FALSE,
    rename.markers=FALSE,
    new.marker.names="none",
    file.is.clust=FALSE,
    add.fileID=FALSE,
    IQR.thresh=NULL,
    output.prescaled.MEM=FALSE,
    scale.matrix = "linear",
    scale.factor = 0)

Arguments

exp_data

list of file names or a matrix or data.frame object where the last column contains a numeric cluster ID for each cell (row). If exp_data is a list of files, either each file must be one cluster or each file must contain a cluster channel (column) that specifies a numeric cluster ID for each cell (row). See details for more information.

transform

TRUE or FALSE; whether to apply an arcsinh (asinh) transformation to the data. Default is FALSE.

cofactor

numeric; the cofactor to apply if transform is TRUE. Default is 1. Arcsinh transformed value = arcsinh(raw value / cofactor).
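
As a minimal illustration of the formula above, the transformation can be reproduced in base R with asinh(). The raw values and the cofactor of 15 are made-up inputs, and this sketch is not the package's internal code.

```r
# Hypothetical raw intensities for one marker (illustrative values only)
raw <- c(0, 15, 150, 1500)
cofactor <- 15

# Arcsinh transformed value = arcsinh(raw value / cofactor)
transformed <- asinh(raw / cofactor)
print(round(transformed, 3))
```

A larger cofactor compresses low-intensity values more strongly, which is why the appropriate value depends on the measurement platform.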

choose.markers

TRUE or FALSE; whether the user wants to choose the markers (columns) for analysis interactively in the console. Set to TRUE if the data contain markers that will not be used in the analysis (e.g. SSC or FSC channels in flow data). If FALSE, either all of the markers in the experiment data will be used in MEM, or the user can pass a character string of the markers to use via the markers argument below.
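
To make the marker string format concrete, the sketch below expands a string such as "1:2,7,11:12,25" into column indices. The helper parse_marker_string is hypothetical and written only for illustration; how MEM parses the string internally is not shown in this documentation.

```r
# Hypothetical helper (not part of the package): turn a marker string such as
# "1:2,7,11:12,25" into a vector of column indices.
parse_marker_string <- function(marker_string) {
  pieces <- strsplit(marker_string, ",", fixed = TRUE)[[1]]
  indices <- lapply(pieces, function(piece) {
    bounds <- as.integer(strsplit(piece, ":", fixed = TRUE)[[1]])
    # "a:b" expands to the full range; a single number is kept as-is
    if (length(bounds) == 2) bounds[1]:bounds[2] else bounds
  })
  unlist(indices)
}

marker_columns <- parse_marker_string("1:2,7,11:12,25")
print(marker_columns)
```

Here the string selects columns 1, 2, 7, 11, 12, and 25.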

markers

"all" or e.g. "1:2,7,11:12,25"; if the user wants to choose the markers to be used in the MEM analysis without being prompted in the console, enter a character string similar to the one shown in the example. The chosen markers should be separated by colons or commas, with no spaces in between. If "all", all of the markers will be used in MEM.

choose.ref

TRUE or FALSE; Default reference for each population is all other populations in the dataset. For example, in a dataset containing 7 clusters, reference for population 1 would include clusters 2-7. If set to TRUE, user will be prompted in the console to enter which cluster(s) should be used as reference instead of the default bulk non-population reference.
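
The default reference described above can be pictured with a small base R sketch. The toy data and variable names are made up, and the code only illustrates the idea of "all other populations", not the package's implementation.

```r
# Toy data: 6 cells, one marker, clusters 1-3 in the last column (made-up values)
cells <- data.frame(
  CD4 = c(5.0, 5.2, 0.1, 0.3, 2.0, 2.4),
  cluster = c(1, 1, 2, 2, 3, 3)
)

# Default reference for a population: every cell that is NOT in that cluster
target <- 1
population <- cells[cells$cluster == target, ]
reference  <- cells[cells$cluster != target, ]

print(sort(unique(reference$cluster)))
```

For population 1, the reference contains the cells from clusters 2 and 3.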

zero.ref

TRUE or FALSE; If set to TRUE, a zero, or synthetic negative, reference will be used for all populations. MAGref therefore is 0 and IQRref is the median IQR across all markers chosen. MEM scores will go from 0 to +10 instead of -10 to +10.

rename.markers

TRUE or FALSE; if TRUE, the user will be prompted to enter new column names in the console. Default is FALSE. If FALSE, either the column names will not be changed, or the user can pass a character string of new column names via the new.marker.names argument below.
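
The sketch below shows how a comma-separated name string such as "CD4,CD19,HLA-DR,CD8,CD14,CD16" could be applied as column names in base R. It assumes a simple split on commas, which may differ from what MEM does internally; under that assumption, a space after a comma would end up inside the name.

```r
# Hypothetical illustration: split the comma-separated string into column names
names_string <- "CD4,CD19,HLA-DR,CD8,CD14,CD16"
new_names <- strsplit(names_string, ",", fixed = TRUE)[[1]]

# Toy matrix with one placeholder column per name
toy <- matrix(0, nrow = 2, ncol = length(new_names))
colnames(toy) <- new_names
print(colnames(toy))

# Under a simple split, a space after a comma becomes part of the name
with_space <- strsplit("CD4, CD19", ",", fixed = TRUE)[[1]]
print(with_space)
```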

new.marker.names

"none" or ex."CD4,CD19,HLA-DR,CD8,CD14,CD16"; if user wants new column names for channels without having the console ask for a user input, enter a character string like the one shown in the example. Each new column name should be separated by a comma, without spaces between names. If "none", the column names will not be changed.

file.is.clust

TRUE or FALSE; should be set to TRUE if multiple files are entered as input and each file contains cells from only one cluster. This prompts the function to merge the data into one matrix for analysis and to add a file ID for each file that will stand in as the cluster ID. A text file indicating which file corresponds to which cluster number will be written to the output files folder (created by default as a subdirectory of your working directory).
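
The merge step described above can be sketched in base R as follows. The in-memory data frames stand in for files read from disk, and the file names, column names, and the lookup table layout are all hypothetical; this is not the package's own code.

```r
# Two toy "files", each holding cells from a single cluster (made-up values).
# In practice these would be read from disk; the names are hypothetical.
file_data <- list(
  "cluster_A.txt" = data.frame(CD3 = c(4.1, 3.9), CD19 = c(0.2, 0.1)),
  "cluster_B.txt" = data.frame(CD3 = c(0.3, 0.2), CD19 = c(5.0, 4.8))
)

# Append a numeric file ID as the last column, then stack everything
merged <- do.call(rbind, lapply(seq_along(file_data), function(i) {
  cbind(file_data[[i]], cluster = i)
}))
rownames(merged) <- NULL

# Lookup table recording which file became which cluster number
file_key <- data.frame(file = names(file_data), cluster = seq_along(file_data))
print(merged)
print(file_key)
```

The merged object has the layout MEM expects for matrix or data.frame input: cells in rows, markers in columns, and the cluster ID in the last column.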

add.fileID

TRUE or FALSE; if multiple files are entered but file.is.clust is FALSE, this indicates that there are multiple files but each contains cells from multiple clusters and that there is already a cluster channel included as the last column in each file. If add.fileID is TRUE, a file ID will be appended to the cluster ID so user can identify the file as well as cluster from which each population came.

IQR.thresh

Default NULL. Optionally can be set to a numeric value. See Details for more information.

output.prescaled.MEM

Default FALSE. If TRUE, creates folder in working directory called "output files" containing a TXT file with pre-scaled MEM values. The MEM matrix output by build_heatmaps contains post-scaled (-10 to +10 scale) MEM values.

scale.matrix

Default "linear". How to scale the MEM matrix: one of "linear", "log", or "arcsinh". The chosen scale is applied first, and the result is then transformed to the -10 to +10 (or 0 to +10) range.

scale.factor

Default 0. The factor used for MEM matrix scaling. For example, with scale.matrix = "log", a scale.factor of 2 applies a log2 scale; with scale.matrix = "arcsinh", a scale.factor of 2 applies an arcsinh scale with a cofactor of 2.

Details

For each population and its reference, MEM first calculates median marker levels and marker interquartile ranges (IQR), and then calculates MEM scores according to the equation

MEM = |Median_Pop - Median_Ref| + IQR_Ref/IQR_Pop - 1; if Median_Pop - Median_Ref < 0, MEM = -MEM
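
As a worked example of the equation above, the following base R sketch computes pre-scaled scores from made-up summary statistics. The helper mem_score is hypothetical and only transcribes the formula as written; it omits the IQR threshold and the final rescaling.

```r
# Hypothetical helper implementing the equation as written above
mem_score <- function(median_pop, median_ref, iqr_pop, iqr_ref) {
  magnitude <- abs(median_pop - median_ref) + iqr_ref / iqr_pop - 1
  # Negate the score when the population median is below the reference median
  ifelse(median_pop - median_ref < 0, -magnitude, magnitude)
}

# Made-up summary statistics for two markers on one population
enriched <- mem_score(median_pop = 5, median_ref = 1, iqr_pop = 1, iqr_ref = 2)
depleted <- mem_score(median_pop = 1, median_ref = 4, iqr_pop = 1, iqr_ref = 2)
print(c(enriched = enriched, depleted = depleted))
```

With these inputs the first marker scores |5 - 1| + 2/1 - 1 = 5, and the second scores -(|1 - 4| + 2/1 - 1) = -4.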

A dataset is provided as an example to be used with MEM and build_heatmaps. Please see dataset PBMC for more details.

Input files can be of type .txt, .fcs, or .csv. A matrix or data.frame object where the last column contains the cluster identity of each cell is also accepted. In all cases, the expected data structure is cells (data points) in rows and measured markers (i.e. features, parameters) in columns of the input data.
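
A minimal sketch of the expected object layout, using made-up marker names and values:

```r
# Toy input: 4 cells (rows), 2 markers (columns), cluster ID in the LAST column
toy_input <- data.frame(
  CD4 = c(5.1, 4.8, 0.2, 0.4),
  CD8 = c(0.3, 0.1, 4.9, 5.3),
  cluster = c(1, 1, 2, 2)
)

# Basic check on the expected layout: the last column holds numeric cluster IDs
stopifnot(is.numeric(toy_input[[ncol(toy_input)]]))
print(table(toy_input$cluster))
```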

IQR threshold: The MEM equation takes the ratio of reference to population IQRs and adds this value to the difference in medians. IQR values below 1, such as those resulting from background-noise-level measurements, can therefore artificially inflate the overall MEM score. To correct for this, a threshold of 0.5 is automatically applied. Alternatively, the function can calculate an IQR threshold from the input data: if IQR.thresh is set to "auto", the threshold will be calculated as the IQR associated with the 2nd quartile median value across all populations and corresponding reference populations. This should be used if the user anticipates that 0.5 will not be an adequate threshold for the particular dataset.
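
The effect of the threshold can be illustrated with a short base R sketch. It assumes that IQR values below the threshold are raised to the threshold; this is one plausible reading of the paragraph above, and the values are made up.

```r
# Made-up IQR values for three markers; the first is near background noise
iqr_pop <- c(0.05, 0.8, 1.6)
iqr_ref <- c(1.0, 1.0, 1.0)
iqr_threshold <- 0.5

# Assumption: values below the threshold are raised to the threshold
iqr_pop_floored <- pmax(iqr_pop, iqr_threshold)

ratio_raw     <- iqr_ref / iqr_pop          # very large for the noisy marker
ratio_floored <- iqr_ref / iqr_pop_floored  # bounded once the floor is applied
print(rbind(ratio_raw, ratio_floored))
```

Without the floor, the noisy marker's IQR ratio would dominate its score even though its median barely differs from the reference.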

Value

MAGpop

Matrix; Median expression level of markers on each population

MAGref

Matrix; Median expression on each population's corresponding reference population

IQRpop

Matrix; IQR of markers on each population

IQRref

Matrix; IQR on each population's corresponding reference population

Note

The object generated from MEM is meant to be passed to build_heatmaps which will generate MEM labels and heatmaps.

Author(s)

Kirsten Diggins, Sierra Lima, and Jonathan Irish

References

Diggins et al., Nature Methods, 2017

See Also

build_heatmaps

Examples

## For multiple file input, set working directory to folder containing files, then
## infiles <- dir()

## For single file or object input (e.g. PBMC), input data directly into MEM function

## User inputs
data(PBMC)
MEM_values <- MEM(
    PBMC,
    transform = TRUE,
    cofactor = 15,
    choose.markers = FALSE,
    markers = "all",
    choose.ref = FALSE,
    zero.ref = FALSE,
    rename.markers = FALSE,
    new.marker.names = "none",
    IQR.thresh = NULL,
    output.prescaled.MEM = FALSE,
    scale.matrix = "linear",
    scale.factor = 0)

cytolab/cytoMEM documentation built on Sept. 13, 2023, 7:28 a.m.