knitr::opts_chunk$set(eval = TRUE, message = FALSE, warning = FALSE, results = 'hold')

````{=html}

## Introduction

This guide walks through batch correction for cases where it is needed. SingleCellTK provides 11 published methods: `BBKNN`, `ComBatSeq`, `FastMNN`, `MNN`, `Harmony`, `LIGER`, `Limma`, `Scanorama`, `scMerge`, `Seurat` integration and `ZINBWaVE`. All of them are available both in the Shiny UI application and in the R console through wrapper functions.

These methods accept different types of input expression matrix (e.g. raw counts or log-normalized counts) and produce either a new corrected expression matrix or a low-dimensional representation, so users should choose a function according to their needs. Detailed instructions for the Shiny UI application (select "Interactive Analysis") and for the R console (select "Console Analysis") are given below:

## Workflow Guide

````{=html}
<div class="tab">
  <button class="tablinks" onclick="openTab(event, 'interactive')" id="ia-button">Interactive Analysis</button>
  <button class="tablinks" onclick="openTab(event, 'console')" id="console-button">Console Analysis</button>
</div>

<div id="interactive" class="tabcontent">

### Entry of the Panel

From anywhere in the UI, the panel for normalization and batch correction can be reached through the boxed tab in the top navigation bar. Then click the second boxed tab to open the batch correction sub-page.

[Screenshot: Entry]

### Perform the Correction

The UI uses a sidebar layout: the left sidebar is for selecting the method, setting its parameters and running the batch correction, and the main panel on the right is for checking the result visually.

[Screenshot: Parameters]

Every batch correction run needs three essential inputs that users should be sure of: the batch correction method, the assay to correct, and the batch annotation.

Once the batch correction method is chosen, the lower part of the sidebar switches dynamically to the method-specific settings. Different methods have different parameter requirements and different forms of output. For detailed help on a method, users can go to the reference page and click on the function for the chosen method, or click on the "(help for {method})" link in the UI.

### Visualization

[Screenshot: Visualization]

SCTK adopts two approaches to demonstrate the effect of batch correction:

````{=html}

<div class = "offset">
1. SCTK plots the variance explained by the annotations, calculated both for the assay to correct and for the corrected output matrix.
<details>
  <summary><b>Explanation: Explained Variance Plot</b></summary>

Each variance-explained plot has three columns. The third shows the variance explained by the batch annotation, the second shows the variance explained by an optional additional condition, and the first shows the variance explained by both annotations combined. The additional condition is usually an annotation for differences between cells that should not be removed, such as cell type or treatment. In the first plot, the combined variance should ideally be close to the sum of the other two, i.e. the height of the first column should be close to the sum of the heights of the second and third. In a sensible result, the second column changes little between the two plots, while the third column is largely eliminated after correction.

````{=html}
</details>
</div>
<br>
<div class = "offset">
2. SCTK plots the PCA projections of the assay selected for correction and of the corrected output matrix. In both cases, users should use the same batch annotation that was used when running the algorithm to get a meaningful result.  
<details>
  <summary><b>Explanation: PCA plots</b></summary>

In the PCA plots, cells (i.e. dots) are colored by batch and shaped by the additional condition described in the previous paragraph. In the plot for the corrected output, users can usually observe that dots of different colors are mixed together, while dots of different shapes still form separate clusters.

````{=html}


</details>
</div>
<br>
<details>
  <summary><b>Visualization Parameters</b></summary>

- The data matrix used for correction - selection input **"Original Assay"**. Users should choose the one that was used to generate the result they want to visualize. 
- The batch annotation used for correction - selection input **"Batch Annotation"**. Users should choose the one that was used to generate the result they want to visualize. 
- The additional condition - selection input **"Additional Condition (optional)"**. See the explanation in the previous paragraphs. A suitable condition is not always available, so this input is optional in SCTK. 
- The result to visualize - selection input **"Corrected Matrix"**. Only results generated in the current SCTK UI session are listed here. 

````{=html}
</details>

</div>

<div id="console" class="tabcontent">

### Before Correction

To run the pipeline, the most basic requirements are:

- An assay of expression values, usually pre-processed.
- The annotation of the batches.

SCTK uses the SingleCellExperiment (SCE) object throughout as the container for all matrices and metadata of the dataset of interest. The assay to be corrected, called "assayToCorr" here, therefore has to be stored at `assay(sce, "assayToCorr")`, and the batch annotation has to be stored in a column of `colData(sce)`.

Note that the batch annotation is best stored as a factor in `colData`, especially when the batches are represented by integers, because some downstream analyses are likely to treat values that are neither character nor logical as continuous rather than categorical.
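
A minimal base R sketch of why the factor conversion matters. The batch labels and expression values are made up for illustration; with an SCE object, the equivalent step would be along the lines of `sce$batch <- factor(sce$batch)`, where `batch` is an assumed column name.

```r
# Hypothetical batch labels stored as integers, with toy expression values.
batch_int <- c(1L, 1L, 2L, 2L, 3L, 3L)
expr <- c(5.1, 4.9, 7.2, 6.8, 5.0, 5.2)

# As integers, a model treats batch as one continuous covariate (a single slope).
fit_numeric <- lm(expr ~ batch_int)

# As a factor, each batch is its own category (two contrasts for three batches).
batch_fac <- factor(batch_int)
fit_factor <- lm(expr ~ batch_fac)

n_coef_numeric <- length(coef(fit_numeric))  # intercept + 1 slope
n_coef_factor <- length(coef(fit_factor))    # intercept + 2 batch contrasts
print(c(numeric = n_coef_numeric, factor = n_coef_factor))
```

With the integer coding, the model assumes batch 3 differs from batch 1 by twice as much as batch 2 does, which is rarely what a batch label means.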

### Choosing a Method

As mentioned in the introduction, the 11 methods differ in their input and output types. The input expression matrix can be raw counts, log-transformed counts or scaled counts, and users need to make sure the assay they supply is of the correct type. The output can be an expression matrix (assay), a low-dimensional representation (reducedDim), or an expression matrix restricted to a subset of features (altExp). Users need to choose a method that produces output suited to their analysis.
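
As a rough aid for this choice, the input and output types listed in the Method Reference table of this guide can be put into a small lookup table. The sketch below is base R only; `bcMethods` and `findMethods()` are hypothetical names introduced for illustration and are not part of singleCellTK.

```r
# Method, wrapper function, input type and output type, as listed in the
# Method Reference table of this guide.
bcMethods <- data.frame(
  method = c("BBKNN", "ComBatSeq", "FastMNN", "MNN", "Harmony", "LIGER",
             "Limma", "Scanorama", "scMerge", "Seurat Integration", "ZINBWaVE"),
  fun = c("runBBKNN", "runComBatSeq", "runFastMNN", "runMNNCorrect",
          "runHarmony", "runLIGER", "runLimmaBC", "runSCANORAMA",
          "runSCMerge", "runSeuratIntegration", "runZINBWaVE"),
  input = c("scaled", "raw counts", "log-normalized", "log-normalized",
            "PCA", "raw counts", "log-normalized", "log-normalized",
            "log-normalized", "raw counts", "raw counts"),
  output = c("reducedDim", "assay", "reducedDim", "assay", "reducedDim",
             "reducedDim", "assay", "assay", "assay", "altExp", "reducedDim"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: filter the table by input and/or output type.
findMethods <- function(input = NULL, output = NULL) {
  keep <- rep(TRUE, nrow(bcMethods))
  if (!is.null(input)) keep <- keep & bcMethods$input == input
  if (!is.null(output)) keep <- keep & bcMethods$output == output
  bcMethods[keep, c("method", "fun")]
}

# Methods that take a log-normalized assay and return a corrected assay.
hits <- findMethods(input = "log-normalized", output = "assay")
print(hits)
```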

````{=html}

<details>
  <summary><b>Example</b></summary>

**1. Prepare an SCE object with multiple batches**

Here we present an example dataset combined from "pbmc6k" and "pbmc8k", originally available from the package [TENxPBMCData](https://bioconductor.org/packages/release/data/experiment/html/TENxPBMCData.html), which can be imported with the function `importExampleData()`.
```r
library(singleCellTK)
pbmc6k <- importExampleData('pbmc6k')
pbmc8k <- importExampleData('pbmc8k')
print(paste(dim(pbmc6k), c('genes', 'cells')))
print(paste(dim(pbmc8k), c('genes', 'cells')))
```
SCTK has a function called `combineSCE()`, which accepts a list of SCE objects and returns a combined SCE object. It requires that every SCE object has the same number of genes and that the gene metadata (i.e. `rowData`) match wherever the same fields exist, so some pre-processing may be needed before combining. Users do not have to follow exactly the steps shown here; the right approach depends on how the raw datasets are provided.
```r
## Combine the two SCE objects
sce.combine <- combineSCE(sceList = list(pbmc6k = pbmc6k, pbmc8k = pbmc8k), by.r = names(rowData(pbmc6k)), by.c = names(colData(pbmc6k)), combined = TRUE)
table(sce.combine$sample)
```
This manual presents only a toy example rather than a best practice for real data, so full QC is skipped and only a simple gene filter is applied.  
Additionally, most of the batch correction methods provided **require a log-normalized assay** as input. Users can check the required assay type of each method by looking at its default setting: if `useAssay = "logcounts"` by default, the method requires a log-normalized assay; if `useAssay = "counts"` by default, it requires the raw count assay.  
```r
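## Simply filter out the genes that are expressed in less than 1% of all cells.
rowData(sce.combine)$expPerc <- rowSums(assay(sce.combine) > 0) / ncol(sce.combine)
sce.filter <- subsetSCERows(sce.combine, rowData = "expPerc >= 0.01", returnAsAltExp = FALSE)
print(sce.filter)
## Fix the random seed (arbitrary value) so that the subsample is reproducible.
set.seed(12345)
## Randomly keep 800 genes and 800 cells to keep the toy example small.
sce.small <- sce.filter[sample(nrow(sce.filter), 800), sample(ncol(sce.filter), 800)]
## Log-normalize the raw counts and store the result in the "logcounts" assay.
sce.small <- scaterlogNormCounts(inSCE = sce.small, useAssay = 'counts', assayName = 'logcounts')
print(sce.small)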
```
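
One way to check a wrapper's default `useAssay` without opening its help page is base R's `formals()`. The sketch below uses a stand-in function, `runToyBC()`, which is hypothetical and only mimics the argument layout of the SCTK wrappers; with singleCellTK loaded, the same `formals()` call could be applied to the real wrapper instead.

```r
# Hypothetical stand-in that mimics the argument layout of an SCTK wrapper.
runToyBC <- function(inSCE, useAssay = "logcounts", batch = "batch",
                     assayName = "TOY") {
  invisible(inSCE)
}

# formals() returns the arguments and their defaults without running the function.
defaultAssay <- formals(runToyBC)$useAssay
print(defaultAssay)

# Interpret the default following the rule described in the text.
expectedInput <- if (identical(defaultAssay, "logcounts")) {
  "log-normalized"
} else if (identical(defaultAssay, "counts")) {
  "raw counts"
} else {
  "see the function's help page"
}
print(expectedInput)
```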

**2. Run a batch correction method on the prepared SCE object**

The basic way to run a batch correction method from SingleCellTK is to select a function for the corresponding **method**, input the **SCE object**, specify the **assay** to correct, and the **batch annotation**.  

For example, here we will try the batch correction method provided by [Limma](https://rdrr.io/bioc/limma/man/removeBatchEffect.html), which fits a linear model to the data.
```r
sce.small <- runLimmaBC(inSCE = sce.small, useAssay = 'logcounts', batch = 'sample', assayName = 'LIMMA')
print(sce.small)
```
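
To give an intuition for a linear-model correction, here is a simplified base R sketch on simulated data. It is not limma's implementation: it fits a per-gene model with batch as the only covariate and subtracts the fitted batch offsets. Every object name in it is made up for illustration.

```r
set.seed(1)
# Simulated log-expression: 5 genes x 40 cells, two batches of 20 cells each.
batch <- factor(rep(c("A", "B"), each = 20))
mat <- matrix(rnorm(5 * 40), nrow = 5,
              dimnames = list(paste0("gene", 1:5), NULL))
mat[, batch == "B"] <- mat[, batch == "B"] + 2  # add an artificial batch shift

# Sum-to-zero contrasts: batch coefficients become offsets around the overall level.
design <- model.matrix(~ batch, contrasts.arg = list(batch = "contr.sum"))

# Fit all genes at once; the coefficient matrix has one column per gene.
fit <- lm.fit(x = design, y = t(mat))
batchCoef <- fit$coefficients[-1, , drop = FALSE]

# Subtract the fitted batch component and keep the intercept.
corrected <- mat - t(design[, -1, drop = FALSE] %*% batchCoef)

# Per-gene batch means before and after the correction.
batchMeans <- function(m) {
  sapply(levels(batch), function(b) rowMeans(m[, batch == b, drop = FALSE]))
}
meansBefore <- batchMeans(mat)
meansAfter <- batchMeans(corrected)
print(round(meansBefore, 2))
print(round(meansAfter, 2))
```

After the subtraction, each gene has the same mean in both batches, while the cell-to-cell variation within a batch is left untouched.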

**3. Visualization**

This documentation provides three ways to examine the removal of batch effects visually. 

1. Plot the variation explained by the batch annotation and another condition.  

This functionality is implemented in `plotBatchVariance()`. It draws a violin plot of the variation explained by the given batch annotation, by an additional condition, and by the two combined.  
This plot is useful for examining whether the batch effect is distinct from another condition (e.g. subtype) or confounded with it. However, additional condition labels (e.g. cell types) are not always available when batch effect removal is needed, so plotting only the variation explained by batch is also supported. 

```r
plotBatchVariance(inSCE = sce.small, useAssay = 'logcounts', batch = 'sample', title = 'Variation Before Correction')
```

```r
plotBatchVariance(inSCE = sce.small, useAssay = 'LIMMA', batch = 'sample', title = 'Variation After Correction')
```
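
As an intuition for what "variation explained" means here, the sketch below computes a per-gene R-squared from linear models on simulated data in base R. This is an illustrative assumption about the kind of quantity being plotted, not the code behind `plotBatchVariance()`; the data and the `varExplained()` helper are hypothetical.

```r
set.seed(42)
nCells <- 60
# Two batches and two conditions, balanced so that they are not confounded.
batch <- factor(rep(c("b1", "b2"), each = nCells / 2))
condition <- factor(rep(c("typeA", "typeB"), times = nCells / 2))

# 20 simulated genes with both a batch shift and a condition shift.
expr <- matrix(rnorm(20 * nCells), nrow = 20)
expr[, batch == "b2"] <- expr[, batch == "b2"] + 1.5
expr[, condition == "typeB"] <- expr[, condition == "typeB"] + 1

# Hypothetical helper: percent of each gene's variance explained by the covariates.
varExplained <- function(mat, covariates) {
  apply(mat, 1, function(y) {
    100 * summary(lm(y ~ ., data = data.frame(y = y, covariates)))$r.squared
  })
}

# Same column order as in the plots: combined, condition, batch.
res <- data.frame(
  combined = varExplained(expr, data.frame(batch, condition)),
  condition = varExplained(expr, data.frame(condition)),
  batch = varExplained(expr, data.frame(batch))
)
print(round(colMeans(res), 1))
```

Because batch and condition are balanced in this toy design, the combined value equals the sum of the other two. After a successful correction, one would expect the batch column to shrink toward zero while the condition column stays roughly the same.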

2. Plot the mean expression level of each gene separately for each batch.  

This functionality is implemented in `plotSCEBatchFeatureMean()`. The approach is straightforward: it draws one violin plot per batch, where each violin shows the distribution of the per-gene mean expression levels within that batch. The batch effect can thus be observed from the mean and standard deviation of each batch.
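
The quantity behind such a plot can be sketched in base R as follows. The matrix and batch labels are simulated, and the snippet only computes the per-batch gene means; it does not reproduce the plotting done by `plotSCEBatchFeatureMean()`.

```r
set.seed(7)
# Simulated log-expression: 100 genes x 50 cells, two batches of 25 cells each.
batch <- factor(rep(c("pbmcA", "pbmcB"), each = 25))
logExpr <- matrix(rnorm(100 * 50, mean = 1), nrow = 100)
logExpr[, batch == "pbmcB"] <- logExpr[, batch == "pbmcB"] + 0.8  # batch shift

# One row per gene, one column per batch: mean expression of each gene per batch.
featureMeans <- sapply(levels(batch), function(b) {
  rowMeans(logExpr[, batch == b, drop = FALSE])
})

# Summarize the per-batch distributions of gene means.
batchSummary <- data.frame(
  batch = colnames(featureMeans),
  mean = colMeans(featureMeans),
  sd = apply(featureMeans, 2, sd)
)
print(batchSummary)
```

A quick `boxplot(featureMeans)` gives a rough base R view of the same distributions; a visible offset between the batches points to a batch effect in the average expression level.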

3. Plot an embedding to see the grouping of cells  

There is no function dedicated to batch correction here, but the same check can be done with the [dimension reduction](dimensionality_reduction.html)/[2D embedding](2d_embedding.html) calculation functions (e.g. `scaterPCA()`, `runUMAP()` and `runTSNE()`) together with `plotSCEDimReduceColData()`.

```r
sce.small <- runUMAP(inSCE = sce.small, initialDims = 25, useAssay = "logcounts", useReducedDim = NULL, logNorm = FALSE)
plotSCEDimReduceColData(inSCE = sce.small, colorBy = 'sample', reducedDimName = 'UMAP', title = 'UMAP Before Correction')
```

```r
sce.small <- runUMAP(inSCE = sce.small, useAssay = 'LIMMA', useReducedDim = NULL, initialDims = 25, logNorm = FALSE, reducedDimName = 'UMAP_LIMMA')
plotSCEDimReduceColData(inSCE = sce.small, colorBy = 'sample', reducedDimName = 'UMAP_LIMMA', title = 'UMAP After Correction')
```

If cell type labels are already available, it is strongly recommended to specify `shape = {colData colname for cell typing}` in `plotSCEDimReduceColData()` so that the grouping by cell type is visible at the same time.  

````{=html}
</details> 

<details>
  <summary><b>Method Reference</b></summary>

| Method | Function | Input Type | Output Type |
| :----: | :----: | :----: | :----: |
| BBKNN [@Polanski2020] | runBBKNN() | scaled | reducedDim |
| ComBatSeq [@ZhangY2020] | runComBatSeq() | raw counts | assay |
| FastMNN [@Haghverdi2018] | runFastMNN() | log-normalized | reducedDim |
| MNN [@Haghverdi2018] | runMNNCorrect() | log-normalized | assay |
| Harmony [@Korsunsky2019] | runHarmony() | PCA | reducedDim |
| LIGER [@LiuL2020] | runLIGER() | raw counts | reducedDim |
| Limma [@Ritchie2015] | runLimmaBC() | log-normalized | assay |
| Scanorama [@Hie2019] | runSCANORAMA() | log-normalized | assay |
| scMerge [@LinY2019] | runSCMerge() | log-normalized | assay |
| Seurat Integration [@Butler2018a] | runSeuratIntegration() | raw counts | altExp |
| ZINBWaVE [@Risso2018] | runZINBWaVE() | raw counts | reducedDim |

````{=html}
</details>
</div>
<script>
document.getElementById("ia-button").click();
</script>
</body>
````

## References



compbiomed/singleCellTK documentation built on Oct. 27, 2024, 3:26 a.m.