fpca: Runs a principal components analysis, the facile way.
In facilebio/FacileAnalysis: Modularized and interactive analyses over a FacileDataStore

View source: R/fpca-base.R

fpca	R Documentation

Runs a principal components analysis, the facile way.

Description

Performs a principal components analysis over a specified assay from the (subset of) samples in a FacileDataStore.

Usage

fpca(
  x,
  assay_name = NULL,
  dims = 5,
  features = NULL,
  filter = "variance",
  ntop = 1000,
  row_covariates = NULL,
  col_covariates = NULL,
  batch = NULL,
  main = NULL,
  ...,
  metadata = list()
)

## S3 method for class 'facile_frame'
fpca(
  x,
  assay_name = NULL,
  dims = min(5, nrow(collect(x, n = Inf)) - 1L),
  features = NULL,
  filter = "variance",
  ntop = 1000,
  row_covariates = NULL,
  col_covariates = NULL,
  batch = NULL,
  main = NULL,
  custom_key = Sys.getenv("USER"),
  ...,
  metadata = list()
)

## S3 method for class 'FacilePcaAnalysisResult'
compare(x, y, run_all = TRUE, rerun = TRUE, ...)

Arguments

`x`	a facile data container (FacileDataSet), or a `facile_frame` (refer to the FacileDataStore (facile_frame) section).
`assay_name`	the name of the assay to extract data from to perform the PCA. If not specified, the default assay is taken for each type of assay container (i.e. `default_assay(facile container)`, `"counts"` for a `DGEList`, `assayNames(SummarizedExperiment)[1L]`, etc.)
`dims`	the number of PCs to calculate (minimum is 3).
`features`	A feature descriptor of the features to use for the analysis. If `NULL` (default), then the specified `filter` strategy is used.
`filter`	A strategy used to identify which features to use for the dimensionality reduction. The default is `"variance"`, which takes the top `ntop` features, sorted by decreasing variance. The only other choice is `"none"` (see the Features Used for PCA section).
`ntop`	the number of features (genes) to include in the PCA. Genes are ranked by decreasing variance across the samples in `x`.
`row_covariates`, `col_covariates`	data.frames that provide meta information for the features (rows) and samples (columns). The default is to get these values from "the obvious places" given `x` (`$genes` and `$samples` for a DGEList, or the sample and feature-level covariate database tables from a FacileDataSet, for example).
`batch`, `main`	specify the covariates to use for batch effect removal. Refer to the `FacileData::remove_batch_effect()` help for more information.
`rerun`	when `rerun = TRUE` (default), `fpca(x)` and `fpca(y)` will be rerun over the union of the features in `x` and `y`.

Details

The FacilePcaAnalysisResult produced here can be used in "the usual" ways, i.e. it can be visualized with viz(). shine() is only about one-quarter implemented, and report() has not been worked on yet.

Importantly, you can pass this result to ffsea() to perform gene set enrichment analysis over a specified dimension, which identifies functional categories loaded onto different PCs.

Value

An fpca result (a FacilePcaAnalysisResult).

Batch Correction

Because we assume that PCA is performed on normalized data, we leverage the batch correction facilities provided by the batch and main parameters in the FacileData::fetch_assay_data() pipeline. If your samples have a "sex" covariate defined, for example, you can perform a PCA with sex-corrected expression values like so: fpca(samples, batch = "sex")
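To make the idea of "regressing out" a covariate concrete, here is a minimal base R sketch on simulated data. It is not the code FacileData uses internally, and it ignores the main parameter: it simply fits a per-gene linear model on a hypothetical sex factor and keeps the residuals, re-centered on each gene's mean. The objects expr, sex, and corrected are made up for illustration.

```r
set.seed(1)

# Simulated normalized expression: 100 genes (rows) x 12 samples (columns)
sex <- factor(rep(c("f", "m"), each = 6))
expr <- matrix(rnorm(100 * 12), nrow = 100)

# Give the first 20 genes an artificial sex effect
expr[1:20, sex == "m"] <- expr[1:20, sex == "m"] + 3

# Fit gene ~ sex for all genes at once (lm.fit accepts a matrix response,
# here samples x genes), then keep the residuals
design <- model.matrix(~ sex)
fit <- lm.fit(design, t(expr))

# Add each gene's overall mean back so values stay on the original scale
corrected <- t(fit$residuals) + rowMeans(expr)

# PCA on the corrected values; samples are the observations
pca <- prcomp(t(corrected), center = TRUE, scale. = FALSE)
```

After this correction the per-gene difference between the male and female group means is zero, so sex should no longer drive the leading components.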

Features Used for PCA

By default, fpca() will assess the variance of all the features (genes) to perform PCA over, and will keep the top ntop ones. This behavior is determined by the following three parameters:

filter determines the method by which features are selected for analysis. Currently you can only choose "variance" (the default) or "none".
features determines the universe of features that are available for the analysis. When NULL (default), all features for the given assay will be loaded and filtered using the specification of the filter parameter. If a feature descriptor is provided and filter is not specified, then we assume that these are the exact features to run the analysis on, and filter defaults to "none". You may, however, intend for features to define the universe of features to use prior to filtering, perhaps to perform a PCA on only a certain set of genes (protein coding), but then filter those further by variance. In this case, you will need to pass in the feature descriptor for the universe of features you want to consider, then explicitly set filter = "variance".
ntop determines the number of top features to keep when filtering by variance (default 1000).
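The variance filter described above can be sketched in base R as follows. This is an illustration on simulated data, not the package's implementation; top_variance_features() is a hypothetical helper, and stats::prcomp() stands in for whatever fpca() runs internally.

```r
set.seed(42)

# Simulated expression matrix: 200 features (rows) x 12 samples (columns)
expr <- matrix(
  rnorm(200 * 12), nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("sample", 1:12))
)

# Hypothetical helper: keep the `ntop` rows with the highest variance
top_variance_features <- function(x, ntop = 1000) {
  v <- apply(x, 1, var)
  keep <- order(v, decreasing = TRUE)[seq_len(min(ntop, nrow(x)))]
  x[keep, , drop = FALSE]
}

filtered <- top_variance_features(expr, ntop = 50)

# Samples are the observations, so transpose before calling prcomp()
pca <- prcomp(t(filtered), center = TRUE, scale. = FALSE)

# Keep the first `dims` components, mirroring the `dims` argument
dims <- 5
scores <- pca$x[, seq_len(dims), drop = FALSE]
```

Restricting expr to a subset of rows before calling the helper corresponds to passing a features universe and then setting filter = "variance".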

Development Notes

Follow progress on implementation of shine() and report() below:

Implement report()

Note that there are methods defined for other assay containers, like an edgeR::DGEList, limma::EList, and SummarizedExperiment. If these are called directly, their downstream use within the facile ecosystem isn't yet fully supported. Development of the FacileBioc package will address this.

Random Things to elaborate on

The code here is largely inspired by DESeq2's plotPCA.

You should look at FactoMineR:

http://factominer.free.fr/factomethods/index.html
http://factominer.free.fr/graphs/factoshiny.html

Teaching and Tutorials

This looks like a useful tutorial for explaining the utility of PCA: http://alexhwilliams.info/itsneuronalblog/2016/03/27/pca/

High-Dimensional Data Analysis course by Rafa Irizarry and Michael Love https://online-learning.harvard.edu/course/data-analysis-life-sciences-4-high-dimensional-data-analysis?category[]=84&sort_by=date_added&cost[]=free

FacileDataStore (facile_frame)

We enable the user to supply extra sample covariates that are not found in the FacileDataStore associated with these samples x by adding them as extra columns to x.

If manually provided col_covariates have the same name as internal sample covariates, then the manually provided ones will supersede the internal ones.

Comparing PCA Results

We can compare two PCA results. Currently this just means we compare the loadings of the features along each PC from fpca result x and y.
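One simple way to compare loadings is to correlate them PC by PC over the features both analyses have in common. The base R sketch below does this for two simulated prcomp() fits; it is an assumption about what a loading comparison could look like, not a description of what compare() returns. The absolute value is taken because the sign of a principal component is arbitrary.

```r
set.seed(7)

genes <- paste0("gene", 1:100)

# Hypothetical helper: simulate a genes x samples expression matrix
make_expr <- function(n_samples) {
  matrix(rnorm(length(genes) * n_samples), nrow = length(genes),
         dimnames = list(genes, NULL))
}

# Two independent PCA fits (samples as observations)
pca_x <- prcomp(t(make_expr(10)))
pca_y <- prcomp(t(make_expr(8)))

# Restrict the comparison to features present in both results
shared <- intersect(rownames(pca_x$rotation), rownames(pca_y$rotation))

# Absolute correlation of feature loadings for PC1..PC3
dims <- 1:3
load_cor <- sapply(dims, function(i) {
  abs(cor(pca_x$rotation[shared, i], pca_y$rotation[shared, i]))
})
names(load_cor) <- paste0("PC", dims)
```

Values near 1 indicate that the same features drive the corresponding PC in both analyses; values near 0 indicate unrelated components.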

Examples

efds <- FacileData::exampleFacileDataSet()

# A subset of samples ------------------------------------------------------
pca.crc <- efds |>
  FacileData::filter_samples(indication == "CRC") |>
  fpca()
if (interactive()) {
  # report(pca.crc, color_aes = "sample_type")
  shine(pca.crc)
  viz(pca.crc, color_aes = "sex")
}

# Regress "sex" out from expression data
pca.crcs <- FacileData::samples(pca.crc) |>
  fpca(batch = "sex")
if (interactive()) {
  viz(pca.crcs, color_aes = "sex")
}

# Perform PCA on only the protein coding genes
genes.pc <- features(efds) |> subset(meta == "protein_coding")
pca.crc.pc <- samples(pca.crc) |>
  fpca(features = genes.pc, filter = "variance")

pca.gdb <- pca.crc |>
  signature(dims = 1:3) |>
  result() |>
  sparrow::GeneSetDb()

# All samples --------------------------------------------------------------
pca.all <- fpca(efds)
if (interactive()) {
  viz(pca.all, color_aes = "indication", shape_aes = "sample_type")
  # report(pca.all, color_aes = "indication", shape_aes = "sample_type")
}

# Comparing two PCA results ------------------------------------------------
efds <- FacileData::exampleFacileDataSet()
p1 <- efds |>
  FacileData::filter_samples(indication == "CRC") |>
  fpca()
p2 <- efds |>
  FacileData::filter_samples(indication == "BLCA") |>
  fpca()
pcmp <- compare(p1, p2)

facilebio/FacileAnalysis documentation built on April 5, 2025, 2:42 p.m.
