knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL ## Related to
    ## https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
)
## Track time spent on making the vignette
startTime <- Sys.time()
## Bib setup
library("knitcitations")
## Load knitcitations with a clean bibliography
cleanbib()
cite_options(hyperlink = "to.doc", citation_format = "text", style = "html")
## Write bibliography information
bib <- c(
    R = citation(),
    BiocStyle = citation("BiocStyle")[1],
    knitcitations = citation("knitcitations")[1],
    knitr = citation("knitr")[1],
    rmarkdown = citation("rmarkdown")[1],
    sessioninfo = citation("sessioninfo")[1],
    testthat = citation("testthat")[1],
    ISAnalytics = citation("ISAnalytics")[1]
)
write.bibtex(bib, file = "aggregate_function_usage.bib")
In this vignette we're going to explain in detail how to use functions of the aggregate family, namely:
aggregate_metadata
aggregate_values_by_key
To install the package run the following code:
## For release version
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("ISAnalytics")
## For devel version
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
# The following initializes usage of Bioc devel
BiocManager::install(version = "devel")
BiocManager::install("ISAnalytics")
To install from GitHub:
# For release version
if (!require(devtools)) {
    install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
    ref = "RELEASE_3_12",
    dependencies = TRUE,
    build_vignettes = TRUE
)
## Safer option for vignette building issue
devtools::install_github("calabrialab/ISAnalytics",
    ref = "RELEASE_3_12"
)
# For devel version
if (!require(devtools)) {
    install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
    ref = "master",
    dependencies = TRUE,
    build_vignettes = TRUE
)
## Safer option for vignette building issue
devtools::install_github("calabrialab/ISAnalytics",
    ref = "master"
)
library(ISAnalytics)
ISAnalytics has a verbose option that allows some functions to print additional information to the console while they're executing.
To disable this feature do:
# DISABLE
options("ISAnalytics.verbose" = FALSE)
# ENABLE
options("ISAnalytics.verbose" = TRUE)
Some functions also produce reports in a user-friendly HTML format. To enable or disable this feature:
# DISABLE HTML REPORTS
options("ISAnalytics.widgets" = FALSE)
# ENABLE HTML REPORTS
options("ISAnalytics.widgets" = TRUE)
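Both settings are ordinary R options, so they can be inspected and restored with base R. The following is a minimal sketch that assumes only the two option names shown above; it uses plain options() and getOption() and nothing specific to ISAnalytics.

```r
# Switch both features off and keep the previous values.
# options() invisibly returns the old values of the options it changes.
old_opts <- options(
    ISAnalytics.verbose = FALSE,
    ISAnalytics.widgets = FALSE
)

# Inspect the current settings
verbose_now <- getOption("ISAnalytics.verbose")
widgets_now <- getOption("ISAnalytics.widgets")

# ... code that should run quietly goes here ...

# Restore whatever was set before (unset options become unset again)
options(old_opts)
```

Saving and restoring the old values avoids leaving a session in a different state than it started in, which is the same idea behind the withr::with_options calls used later in this vignette.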
We refer to the information contained in the association file as "metadata". Sometimes it's useful to obtain collective information based on a certain group of variables we're interested in. The function aggregate_metadata does just that: given the grouping variables (the names of the columns in the association file to perform a group_by operation with), it creates a summary which includes:
FusionPrimerPCRDate_min - The minimum date in the group for this variable
LinearPCRDate_min - The minimum date in the group for this variable
VCN_avg - The mean of the "VCN" column for each group
DNAngUsed_avg - The mean of the "DNAngUsed" column for each group
Kapa_avg - The mean of the "Kapa" column for each group
DNAngUsed_sum - The sum of the "DNAngUsed" column for each group
ulForPool_sum - The sum of the "ulForPool" column for each group
AggregateMeta_sum - A string obtained by concatenation of all of the grouping variables separated by "_"
Import the association file via import_association_file. If you need more information on the import functions, please view the vignette "How to use import functions".
withr::with_options(list(ISAnalytics.widgets = FALSE), {
    path_AF <- system.file("extdata", "ex_association_file.tsv",
        package = "ISAnalytics"
    )
    root_correct <- system.file("extdata", "fs.zip",
        package = "ISAnalytics"
    )
    root_correct <- unzip_file_system(root_correct, "fs")
    association_file <- import_association_file(path_AF, root_correct)
})
Perform aggregation:
aggregated_meta <- aggregate_metadata(association_file,
    grouping_keys = c(
        "SubjectID", "CellMarker",
        "Tissue", "TimePoint"
    ),
    import_stats = FALSE
)
knitr::kable(aggregated_meta)
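To make the summary columns more concrete, here is a small base R sketch that rebuilds a few of them on toy data. It is not how aggregate_metadata is implemented; the data values are made up, only two grouping variables are used, and the group_label column name is a hypothetical stand-in for the concatenated grouping string.

```r
# Toy association file: column names follow the summary list above,
# values are invented for illustration only.
toy_af <- data.frame(
    SubjectID = c("S1", "S1", "S2"),
    Tissue = c("BM", "BM", "PB"),
    VCN = c(0.5, 1.5, 2.0),
    DNAngUsed = c(100, 200, 50),
    stringsAsFactors = FALSE
)

# Mean of VCN and DNAngUsed for each (SubjectID, Tissue) group
avg <- aggregate(cbind(VCN, DNAngUsed) ~ SubjectID + Tissue,
    data = toy_af, FUN = mean
)
names(avg)[3:4] <- c("VCN_avg", "DNAngUsed_avg")

# Sum of DNAngUsed for each group
tot <- aggregate(DNAngUsed ~ SubjectID + Tissue, data = toy_af, FUN = sum)
names(tot)[3] <- "DNAngUsed_sum"

# Combine the per-group summaries into one table
toy_summary <- merge(avg, tot, by = c("SubjectID", "Tissue"))

# Concatenate the grouping variables with "_" (hypothetical column name)
toy_summary$group_label <- paste(toy_summary$SubjectID,
    toy_summary$Tissue,
    sep = "_"
)
toy_summary
```

Each row of toy_summary corresponds to one combination of the grouping variables, which mirrors the shape of the table printed above.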
As you can see, there is an additional parameter you can set, import_stats: if set to TRUE, the function will automatically look into the file system you provided as root when you imported the association file and will try to locate first the iss folder for each project and then all Vispa2 "stats" files.
Vispa2 stats contain useful information that is not included in the association file and is linked to the single Vispa2 run. In some cases it's useful to perform aggregation on that information too. If you set the parameter to TRUE, besides the columns mentioned before you will also have:
BARCODE_MUX_sum - The sum of the "BARCODE_MUX" column for the group
TRIMMING_FINAL_LTRLC_sum - The sum of the "TRIMMING_FINAL_LTRLC" column for the group
LV_MAPPED_sum - The sum of the "LV_MAPPED" column for the group
BWA_MAPPED_OVERALL_sum - The sum of the "BWA_MAPPED_OVERALL" column for the group
ISS_MAPPED_PP_sum - The sum of the "ISS_MAPPED_PP" column for the group
withr::with_options(list(ISAnalytics.widgets = FALSE), {
    aggregated_meta <- aggregate_metadata(association_file,
        grouping_keys = c(
            "SubjectID", "CellMarker",
            "Tissue", "TimePoint"
        ),
        import_stats = TRUE
    )
})
knitr::kable(aggregated_meta)
If you have the option ISAnalytics.widgets set to TRUE, this will produce a report in HTML format that tells you which stats files were imported.
To avoid this, you can set the option to FALSE.
ISAnalytics contains useful functions to aggregate the values contained in your imported matrices based on a key, that is, a single column or a combination of columns contained in the association file that are related to the samples.
Import your association file (see previous section) and then import your matrices:
withr::with_options(list(ISAnalytics.widgets = FALSE), {
    matrices <- import_parallel_Vispa2Matrices_auto(
        association_file = association_file,
        root = NULL,
        quantification_type = c("fragmentEstimate", "seqCount"),
        matrix_type = "annotated",
        workers = 2,
        patterns = NULL,
        matching_opt = "ANY",
        multi_quant_matrix = FALSE
    )
})
The function aggregate_values_by_key can perform the aggregation both on the list of matrices and on a single matrix.
# Takes the whole list and produces a list in output
aggregated_matrices <- aggregate_values_by_key(matrices, association_file)
# Takes a single matrix and produces a single matrix as output
aggregated_matrices_single <- aggregate_values_by_key(
    matrices$seqCount,
    association_file
)
knitr::kable(head(aggregated_matrices_single))
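Conceptually, aggregating by key means attaching the sample metadata to each matrix row and then collapsing rows that share the same key and genomic coordinates. The base R sketch below illustrates that idea on toy data. It is not the package's implementation, and the column names CompleteAmplificationID and Value are assumptions made for this example.

```r
# Toy integration matrix: one row per integration site and sample.
# Column names and values are assumptions for illustration.
toy_matrix <- data.frame(
    chr = c("1", "1", "2"),
    integration_locus = c(1000, 1000, 5000),
    strand = c("+", "+", "-"),
    CompleteAmplificationID = c("A1", "A2", "A1"),
    Value = c(10, 5, 7),
    stringsAsFactors = FALSE
)

# Toy association file linking each sample to a subject
toy_af <- data.frame(
    CompleteAmplificationID = c("A1", "A2"),
    SubjectID = c("S1", "S1"),
    stringsAsFactors = FALSE
)

# Attach the metadata to every matrix row
joined <- merge(toy_matrix, toy_af, by = "CompleteAmplificationID")

# Collapse samples that share the same site and the same key (SubjectID)
toy_agg <- aggregate(Value ~ chr + integration_locus + strand + SubjectID,
    data = joined, FUN = sum
)
names(toy_agg)[names(toy_agg) == "Value"] <- "Value_sum"
toy_agg
```

The two samples A1 and A2 belong to the same subject, so their values at the shared site are added together, while the site seen in only one sample is kept unchanged.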
The function has several parameters with default values that can be changed according to user preference.
key value
The default value is c("SubjectID", "CellMarker", "Tissue", "TimePoint") (the same default key as the aggregate_metadata function).
agg1 <- aggregate_values_by_key(
    x = matrices$seqCount,
    association_file = association_file,
    key = c("SubjectID", "ProjectID")
)
knitr::kable(head(agg1))
lambda value
The lambda parameter indicates the function(s) to be applied to the values for aggregation. lambda must be a named list of either functions or purrr-style lambdas: if you would like to specify additional parameters to the function, the second option is recommended.
The only important note on functions is that they should perform some kind of aggregation on numeric values: in practical terms, they need to accept a vector of numeric/integer values as input and produce a SINGLE value as output. Valid options for this purpose might be: sum, mean, median, min, max and so on.
agg2 <- aggregate_values_by_key(
    x = matrices$seqCount,
    association_file = association_file,
    key = "SubjectID",
    lambda = list(mean = ~ mean(.x, na.rm = TRUE))
)
knitr::kable(head(agg2))
Note that, when specifying purrr-style lambdas (formulas), the first parameter needs to be set to .x; other parameters can be set as usual.
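As a rough illustration of the "vector in, single value out" requirement, the base R sketch below applies a named list of functions to one numeric vector. The formula ~ mean(.x, na.rm = TRUE) is written here as the roughly equivalent anonymous function so the example needs no extra packages; the variable names are hypothetical.

```r
# A named list of aggregation functions: each one accepts a numeric
# vector and returns exactly one value.
lambdas <- list(
    sum = sum,
    # Roughly what the formula ~ mean(.x, na.rm = TRUE) stands for
    mean = function(.x) mean(.x, na.rm = TRUE)
)

values <- c(4, NA, 8)

# vapply() with numeric(1) enforces the "single value" contract:
# it fails if a function returns anything other than one number.
results <- vapply(lambdas, function(f) f(values), numeric(1))
results
```

The plain sum returns NA because of the missing value, while the wrapped mean ignores it. This is the kind of case where passing extra arguments through a lambda is convenient.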
You can also use functions that produce data frames or lists in lambda. In this case, all variables from the produced data frame will be included in the final data frame. For example:
agg3 <- aggregate_values_by_key(
    x = matrices$seqCount,
    association_file = association_file,
    key = "SubjectID",
    lambda = list(describe = psych::describe)
)
agg3
value_cols value
The value_cols parameter tells the function on which numeric columns of x the functions should be applied.
Note that every function contained in lambda will be applied to every column in value_cols: resulting columns will be named as "original name_function applied".
## Obtaining multi-quantification matrix
comp <- comparison_matrix(matrices)
agg4 <- aggregate_values_by_key(
    x = comp,
    association_file = association_file,
    key = "SubjectID",
    lambda = list(sum = sum, mean = mean),
    value_cols = c("seqCount", "fragmentEstimate")
)
knitr::kable(head(agg4))
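The naming scheme can be reproduced on toy data with a short base R sketch. This is only an illustration of how every function is paired with every value column; it is not the package's code and the data values are invented.

```r
value_cols <- c("seqCount", "fragmentEstimate")
lambdas <- list(sum = sum, mean = mean)

# Toy multi-quantification table (values are made up)
toy <- data.frame(
    SubjectID = c("S1", "S1", "S2"),
    seqCount = c(10, 20, 5),
    fragmentEstimate = c(1.5, 2.5, 4.0),
    stringsAsFactors = FALSE
)

# Apply every function to every value column within each group and
# name the result "<original name>_<function applied>"
pieces <- list()
for (col in value_cols) {
    for (fn in names(lambdas)) {
        out_name <- paste(col, fn, sep = "_")
        by_subject <- split(toy[[col]], toy$SubjectID)
        pieces[[out_name]] <- vapply(by_subject, lambdas[[fn]], numeric(1))
    }
}

toy_agg <- data.frame(
    SubjectID = names(pieces[[1]]),
    as.data.frame(pieces),
    stringsAsFactors = FALSE
)
rownames(toy_agg) <- NULL
toy_agg
```

With two value columns and two functions, the result has four aggregated columns: seqCount_sum, seqCount_mean, fragmentEstimate_sum and fragmentEstimate_mean.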
group value
The group parameter should contain all other variables to include in the grouping besides key. By default this contains c("chr", "integration_locus", "strand", "GeneName", "GeneStrand"). You can change this grouping as you see fit; if you don't want to add any other variable to the key, just set it to NULL.
agg5 <- aggregate_values_by_key(
    x = matrices$seqCount,
    association_file = association_file,
    key = "SubjectID",
    lambda = list(sum = sum, mean = mean),
    group = c(mandatory_IS_vars())
)
knitr::kable(head(agg5))
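The effect of the extra grouping variables can be seen on toy data with base R. The sketch below compares grouping by key plus genomic coordinates against grouping by key alone (the analogue of setting group to NULL). It is an illustration under made-up data, not the package's implementation, and the Value column name is an assumption.

```r
# Toy matrix already joined with its metadata (values are made up)
toy <- data.frame(
    SubjectID = c("S1", "S1", "S1"),
    chr = c("1", "1", "2"),
    integration_locus = c(1000, 1000, 5000),
    Value = c(10, 5, 7),
    stringsAsFactors = FALSE
)

key <- "SubjectID"
group <- c("chr", "integration_locus")

# Grouping on key + group: one row per subject and integration site
by_key_and_group <- aggregate(toy["Value"],
    by = toy[c(key, group)], FUN = sum
)

# Grouping on key only: one row per subject
by_key_only <- aggregate(toy["Value"], by = toy[key], FUN = sum)

by_key_and_group
by_key_only
```

Keeping the genomic coordinates in the grouping preserves one value per integration site, while dropping them collapses everything for a subject into a single total.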
The r Biocpkg("ISAnalytics") package r citep(bib[["ISAnalytics"]]) was made possible thanks to:
r citep(bib[["R"]])
r Biocpkg("BiocStyle") r citep(bib[["BiocStyle"]])
r CRANpkg("knitcitations") r citep(bib[["knitcitations"]])
r CRANpkg("knitr") r citep(bib[["knitr"]])
r CRANpkg("rmarkdown") r citep(bib[["rmarkdown"]])
r CRANpkg("sessioninfo") r citep(bib[["sessioninfo"]])
r CRANpkg("testthat") r citep(bib[["testthat"]])
This package was developed using r BiocStyle::Githubpkg("lcolladotor/biocthis").
R session information.
## Session info
library("sessioninfo")
options(width = 120)
session_info()
This vignette was generated using r Biocpkg("BiocStyle") r citep(bib[["BiocStyle"]]) with r CRANpkg("knitr") r citep(bib[["knitr"]]) and r CRANpkg("rmarkdown") r citep(bib[["rmarkdown"]]) running behind the scenes.
Citations made with r CRANpkg("knitcitations") r citep(bib[["knitcitations"]]).
## Print bibliography
bibliography()