filter_data | R Documentation |
This function filters numeric data supplied as a data.frame or SummarizedExperiment. The filtering builds on two functions, pOverA and cv, from the package genefilter (Gentleman et al. 2018).
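The two criteria can be illustrated in base R without loading genefilter. This is only a minimal sketch: the helpers `p_over_a` and `cv_between` are hypothetical names written for this illustration, not the genefilter source, and the strict comparison (`> A`) is an assumption that should be checked against the genefilter documentation.

```r
# Hypothetical helpers that mimic the two criteria (not the genefilter code).

# TRUE when at least a proportion `p` of the non-missing values exceed `A`.
# Assumption: the comparison with `A` is strict.
p_over_a <- function(x, p, A) {
  x <- x[!is.na(x)]
  mean(x > A) >= p
}

# TRUE when the coefficient of variation, sd / |mean|, lies within [a, b].
cv_between <- function(x, a, b) {
  s <- sd(x, na.rm = TRUE)
  if (is.na(s) || s == 0) return(FALSE)
  cv <- s / abs(mean(x, na.rm = TRUE))
  cv >= a && cv <= b
}

# Toy matrix: rows are genes, columns are samples.
mat <- rbind(
  gene1 = c(10, 12, 11, 9),  # expressed, but nearly constant
  gene2 = c(0, 1, 0, 2),     # never above the threshold of 5
  gene3 = c(2, 30, 8, 50)    # expressed and variable
)

# A gene is kept only if it passes both criteria.
keep <- apply(mat, 1, function(x) {
  p_over_a(x, p = 0.5, A = 5) && cv_between(x, a = 0.2, b = 100)
})
filtered <- mat[keep, , drop = FALSE]
filtered
```

In this toy case only gene3 passes both criteria: gene1 fails the CV range and gene2 fails the expression threshold.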
filter_data(
data,
assay.na = NULL,
pOA = c(0, 0),
CV = c(-Inf, Inf),
top.CV = 1,
desc = NULL,
sam.factor = NULL,
con.factor = NULL,
file = NULL,
verbose = TRUE
)
data |
|
assay.na |
The name of target assay to use when |
pOA |
Parameters of the filter function |
CV |
Parameters of the filter function |
top.CV |
The proportion (0-1) of top coefficient of variations (CVs). Only genes with CVs in this proportion are kept. E.g. if |
desc |
A column name in the |
sam.factor |
The column name corresponding to spatial features in |
con.factor |
The column name corresponding to experimental variables in |
file |
The output file name for saving the filtered data matrix in TSV format, which is ready to upload in the Shiny App (see |
verbose |
Logical. If TRUE (default), the summary of statistics is printed. |
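As a rough illustration of the top.CV argument, the base R sketch below keeps the genes whose CVs fall in the top proportion. The quantile-based cutoff and the variable names (`top_cv`, `cutoff`) are assumptions made for this example; the package may derive its cutoff differently.

```r
# Toy matrix: rows are genes, columns are samples.
mat <- rbind(
  g1 = c(10, 10, 10, 11),
  g2 = c(1, 5, 9, 13),
  g3 = c(4, 6, 4, 6),
  g4 = c(0, 0, 20, 40)
)

# Coefficient of variation per gene: sd / |mean|.
cvs <- apply(mat, 1, function(x) sd(x) / abs(mean(x)))

# Hypothetical setting: keep the 50% of genes with the largest CVs.
top_cv <- 0.5

# Assumption: the cutoff is the (1 - top_cv) quantile of all CVs.
cutoff <- quantile(cvs, probs = 1 - top_cv)
top <- mat[cvs >= cutoff, , drop = FALSE]
top
```

Here g2 and g4 have the two largest CVs, so they are the rows retained.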
A data.frame or SummarizedExperiment object, depending on the input data.
Jianhai Zhang jzhan067@ucr.edu
Dr. Thomas Girke thomas.girke@ucr.edu
Gentleman, R, V Carey, W Huber, and F Hahne. 2018. "Genefilter: Methods for Filtering Genes from High-Throughput Experiments." http://bioconductor.uib.no/2.7/bioc/html/genefilter.html
Matt Dowle and Arun Srinivasan (2017). data.table: Extension of 'data.frame'. R package version 1.10.4. https://CRAN.R-project.org/package=data.table
Martin Morgan, Valerie Obenchain, Jim Hester and Hervé Pagès (2018). SummarizedExperiment: SummarizedExperiment container. R package version 1.10.1
R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/
Keays, Maria. 2019. ExpressionAtlas: Download Datasets from EMBL-EBI Expression Atlas
Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. "Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2." Genome Biology 15 (12): 550. doi:10.1186/s13059-014-0550-8
Cardoso-Moreira, Margarida, Jean Halbert, Delphine Valloton, Britta Velten, Chunyan Chen, Yi Shao, Angélica Liechti, et al. 2019. “Gene Expression Across Mammalian Organ Development.” Nature 571 (7766): 505–9
Amezquita R, Lun A, Becht E, Carey V, Carpp L, Geistlinger L, Marini F, Rue-Albrecht K, Risso D, Soneson C, Waldron L, Pages H, Smith M, Huber W, Morgan M, Gottardo R, Hicks S (2020). “Orchestrating single-cell analysis with Bioconductor.” Nature Methods, 17, 137–145. https://www.nature.com/articles/s41592-019-0654-x
## Two example data sets are showcased for the data formats of "data.frame" and
## "SummarizedExperiment" respectively. Both come from an RNA-seq analysis on
## For convenience, they are included in this package. The complete raw count data are
## downloaded using the R package ExpressionAtlas (Keays 2019) with the accession
## number "E-MTAB-6769".
# Access example data 1.
df.chk <- read.table(system.file('extdata/shinyApp/data/count_chicken_simple.txt',
package='spatialHeatmap'), header=TRUE, row.names=1, sep='\t', check.names=FALSE)
# Column names follow the naming scheme
# "spatialFeature__variable".
df.chk[1:3, ]
# A column of gene description can be optionally appended.
ann <- paste0('ann', seq_len(nrow(df.chk))); ann[1:3]
df.chk <- cbind(df.chk, ann=ann)
df.chk[1:3, ]
# Access example data 2.
count.chk <- read.table(system.file('extdata/shinyApp/data/count_chicken.txt',
package='spatialHeatmap'), header=TRUE, row.names=1, sep='\t')
count.chk[1:3, 1:5]
# A targets file describing spatial features and variables is required for example
# data 2, which should be made based on the experiment design.
# Access the targets file.
target.chk <- read.table(system.file('extdata/shinyApp/data/target_chicken.txt',
package='spatialHeatmap'), header=TRUE, row.names=1, sep='\t')
# Every column in example data 2 corresponds with a row in the targets file.
target.chk[1:5, ]
# Store example data 2 in "SummarizedExperiment".
library(SummarizedExperiment)
se.chk <- SummarizedExperiment(assay=count.chk, colData=target.chk)
# The "rowData" slot can optionally store a data frame of gene annotation.
rowData(se.chk) <- DataFrame(ann=ann)
# Normalize data.
df.chk.nor <- norm_data(data=df.chk, norm.fun='CNF', log2.trans=TRUE)
se.chk.nor <- norm_data(data=se.chk, norm.fun='CNF', log2.trans=TRUE)
# Aggregate replicates of "spatialFeature__variable", where spatial features are organs
# and variables are ages.
df.chk.aggr <- aggr_rep(data=df.chk.nor, aggr='mean')
df.chk.aggr[1:3, ]
se.chk.aggr <- aggr_rep(data=se.chk.nor, sam.factor='organism_part', con.factor='age',
aggr='mean')
assay(se.chk.aggr)[1:3, 1:3]
# Genes with expression values >= 5 in at least 1% of all samples (pOA), and coefficient
# of variation (CV) between 0.2 and 100 are retained.
df.chk.fil <- filter_data(data=df.chk.aggr, pOA=c(0.01, 5), CV=c(0.2, 100))
se.chk.fil <- filter_data(data=se.chk.aggr, sam.factor='organism_part', con.factor='age',
pOA=c(0.01, 5), CV=c(0.2, 100), file=NULL)