knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
)
## Track time spent on making the vignette
startTime <- Sys.time()
## Bib setup
library("knitcitations")
## Load knitcitations with a clean bibliography
cleanbib()
cite_options(
    hyperlink = "to.doc",
    citation_format = "text",
    style = "html"
)
## Write bibliography information
bib <- c(
    R = citation(),
    BiocStyle = citation("BiocStyle")[1],
    knitcitations = citation("knitcitations")[1],
    knitr = citation("knitr")[1],
    rmarkdown = citation("rmarkdown")[1],
    sessioninfo = citation("sessioninfo")[1],
    testthat = citation("testthat")[1],
    ISAnalytics = citation("ISAnalytics")[1]
)
write.bibtex(bib, file = "collision_removal.bib")
To install the package run the following code:
## For release version
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("ISAnalytics")
## For devel version
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
# The following initializes usage of Bioc devel
BiocManager::install(version = "devel")
BiocManager::install("ISAnalytics")
To install from GitHub:
# For release version
if (!require(devtools)) {
    install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
    ref = "RELEASE_3_12",
    dependencies = TRUE,
    build_vignettes = TRUE
)
## Safer option if building the vignettes fails
devtools::install_github("calabrialab/ISAnalytics",
    ref = "RELEASE_3_12"
)
# For devel version
if (!require(devtools)) {
    install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
    ref = "master",
    dependencies = TRUE,
    build_vignettes = TRUE
)
## Safer option if building the vignettes fails
devtools::install_github("calabrialab/ISAnalytics",
    ref = "master"
)
library(ISAnalytics)
ISAnalytics has a verbose option that allows some functions to print additional information to the console while they are executing. To disable or re-enable this feature:
# DISABLE
options("ISAnalytics.verbose" = FALSE)
# ENABLE
options("ISAnalytics.verbose" = TRUE)
Some functions also produce reports in a user-friendly HTML format. To control this feature:
# DISABLE HTML REPORTS
options("ISAnalytics.widgets" = FALSE)
# ENABLE HTML REPORTS
options("ISAnalytics.widgets" = TRUE)
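If you only want to change these options temporarily, base R's `options()` invisibly returns the previous values of the options it sets, so you can restore them afterwards. A minimal sketch (the variable name `old` is just illustrative); the code later in this vignette achieves the same effect with `withr::with_options()`:

```r
# options() returns the previous values of the options being set
old <- options(ISAnalytics.verbose = FALSE, ISAnalytics.widgets = FALSE)
# Check the value currently in effect
getOption("ISAnalytics.verbose")
# ... call ISAnalytics functions quietly here ...
# Restore whatever values were set before (unset options go back to NULL)
options(old)
```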
We won't go into too much detail here, but we will briefly explain what a "collision" is and how the functions in this package deal with collisions.
We say that an integration (i.e. a unique combination of chromosome, integration locus and strand) is a collision if this combination is shared between different independent samples, where an independent sample is a unique combination of ProjectID and SubjectID (subjects usually represent patients). Since it is highly improbable to observe the very same integration in two different subjects, this phenomenon may indicate some kind of contamination during the sequencing or PCR phase; for this reason we might want to exclude such integrations from our analysis.
ISAnalytics provides a function, remove_collisions, that processes the imported data to remove or reassign these "problematic" integrations.
The processing is done on the sequence count matrix (after import) and matrices of other quantification types are re-aligned accordingly.
The remove_collisions function follows several logical steps to decide whether an integration is a collision and, if it is, whether to reassign it or remove it entirely based on different criteria.
As we said before, a collision is a triplet made of chr, integration locus and strand which is shared between different independent samples, i.e. pairs made of ProjectID and SubjectID. The function uses the information stored in the association file to determine which independent samples are present and counts the number of independent samples for each integration: those with a count > 1 are considered collisions.
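The counting step can be illustrated with a small base R sketch. This is not the package's implementation: the toy data, the helper columns `independent_sample` and `integration`, and the matrix column names (`chr`, `integration_locus`, `strand`, `CompleteAmplificationID`, `Value`) are assumptions chosen for illustration.

```r
# Toy integration matrix: one row per integration per sample (hypothetical data)
int_matrix <- data.frame(
    chr = c("1", "1", "2", "2"),
    integration_locus = c(100, 100, 500, 500),
    strand = c("+", "+", "-", "-"),
    CompleteAmplificationID = c("S1", "S2", "S1", "S3"),
    Value = c(50, 3, 20, 40)
)
# Toy association file: maps each sample to its project and subject
af <- data.frame(
    CompleteAmplificationID = c("S1", "S2", "S3"),
    ProjectID = c("P1", "P1", "P1"),
    SubjectID = c("PT01", "PT02", "PT01")
)
joined <- merge(int_matrix, af, by = "CompleteAmplificationID")
# An independent sample is a ProjectID + SubjectID pair
joined$independent_sample <- paste(joined$ProjectID, joined$SubjectID, sep = "_")
# An integration is a chr + locus + strand triplet
joined$integration <- paste(joined$chr, joined$integration_locus,
    joined$strand, sep = "_")
# Count distinct independent samples per integration
n_indep <- tapply(joined$independent_sample, joined$integration,
    function(s) length(unique(s)))
# Integrations seen in more than one independent sample are collisions
collisions <- names(n_indep)[n_indep > 1]
collisions
```

Here `1_100_+` is flagged because it appears in two different subjects, while `2_500_-` is not: samples S1 and S3 both belong to subject PT01, so they form a single independent sample.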
Once the collisions are identified, the function follows 3 steps in which it tries to reassign the combination to a single independent sample. The criteria are:
reads_ratio), the default value is 10.
If none of the criteria is sufficient to make a decision, the integration is simply removed from the matrix.
To learn more about the import functions, take a look at the vignette "How to use import functions".
Import your association file:
withr::with_options(list(ISAnalytics.widgets = FALSE), {
    path_AF <- system.file("extdata", "ex_association_file.tsv",
        package = "ISAnalytics"
    )
    root_correct <- system.file("extdata", "fs.zip",
        package = "ISAnalytics"
    )
    root_correct <- unzip_file_system(root_correct, "fs")
    association_file <- import_association_file(path_AF, root_correct,
        dates_format = "dmy"
    )
})
Important notes on the association file:
# This imports both sequence count and fragment estimate matrices
withr::with_options(list(ISAnalytics.widgets = FALSE), {
    matrices <- import_parallel_Vispa2Matrices_auto(
        association_file = association_file,
        root = NULL,
        quantification_type = c("fragmentEstimate", "seqCount"),
        matrix_type = "annotated",
        workers = 2,
        patterns = NULL,
        matching_opt = "ANY",
        multi_quant_matrix = FALSE
    )
})
As stated in the introduction, it is fundamental that the sequence count matrix is present for the collision removal process to take place.
You can process the collisions in 3 different ways.
# Pass the whole named list
withr::with_options(list(ISAnalytics.widgets = FALSE), {
    matrices_processed <- remove_collisions(
        x = matrices,
        association_file = association_file,
        date_col = "SequencingDate",
        reads_ratio = 10
    )
})
If you have the "widgets" option active, a report file is produced at the end
that shows the before and after for each subject (and some other details).
This report is an HTML widget, so you can save it or export it for future
reference if you need it.
In this case, collision removal is done on the sequence
count matrix and other matrices are re-aligned automatically.
# Pass the sequence count matrix only
withr::with_options(list(ISAnalytics.widgets = FALSE), {
    matrices_processed_single <- remove_collisions(
        x = matrices$seqCount,
        association_file = association_file,
        date_col = "SequencingDate",
        reads_ratio = 10
    )
})
If you have the "verbose" option active, a console message will remind you to realign the other matrices, if you have any, at a later time.
If you'd like to avoid the re-alignment phase, you can call collision removal on a multi-quantification matrix obtained via the function comparison_matrix:
# Obtain multi-quantification matrix
multi <- comparison_matrix(matrices)
multi
withr::with_options(list(ISAnalytics.widgets = FALSE), {
    matrices_processed_multi <- remove_collisions(
        x = multi,
        association_file = association_file,
        date_col = "SequencingDate",
        reads_ratio = 10,
        seq_count_col = "seqCount"
    )
})
As you can see, comparison_matrix produces a single integration matrix from the named list of single-quantification matrices. This is the recommended approach if you don't have specific needs, as it removes the need to realign matrices altogether.
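Conceptually, combining single-quantification matrices resembles a full join on the integration and sample columns, with each value column renamed after its quantification type. The sketch below uses base R and made-up data; it approximates the idea and is not how comparison_matrix is implemented. The column names (`chr`, `integration_locus`, `strand`, `CompleteAmplificationID`, `Value`) and the objects `quant_list` and `multi_sketch` are assumptions for illustration.

```r
# Hypothetical single-quantification matrices sharing the same key columns
seq_count <- data.frame(
    chr = c("1", "2"), integration_locus = c(100, 500), strand = c("+", "-"),
    CompleteAmplificationID = c("S1", "S1"), Value = c(50, 20)
)
frag_est <- data.frame(
    chr = c("1", "2"), integration_locus = c(100, 500), strand = c("+", "-"),
    CompleteAmplificationID = c("S1", "S1"), Value = c(4.2, 1.7)
)
# Named list: names are quantification types
quant_list <- list(seqCount = seq_count, fragmentEstimate = frag_est)
key_cols <- c("chr", "integration_locus", "strand", "CompleteAmplificationID")
# Rename each "Value" column after its quantification type
renamed <- Map(function(df, q) {
    names(df)[names(df) == "Value"] <- q
    df
}, quant_list, names(quant_list))
# Full-join all matrices on the key columns
multi_sketch <- Reduce(function(a, b) merge(a, b, by = key_cols, all = TRUE),
    renamed)
multi_sketch
```

The result has one row per integration and sample, with one column per quantification type; this mirrors why the multi-quantification call above needs `seq_count_col = "seqCount"` to locate the sequence counts.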
If you have opted for the second way, to realign the other matrices you have to call the function realign_after_collisions, passing as input the processed sequence count matrix and the named list of other matrices to realign.
NOTE: the names in the list must be quantification types.
seq_count_proc <- matrices_processed_single
# Select only matrices that are not relative to sequence count
other_matrices <- matrices[!names(matrices) %in% "seqCount"]
other_realigned <- realign_after_collisions(seq_count_proc, other_matrices)
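As a rough mental model (an assumption, not the package's code), realigning can be thought of as keeping, in each other matrix, only the rows whose integration and sample also survive in the processed sequence count matrix, similar to a semi join. A base R sketch with toy data; the column names and the helper `make_key` are hypothetical:

```r
# Columns identifying an integration in a given sample (assumed names)
key_cols <- c("chr", "integration_locus", "strand", "CompleteAmplificationID")
# Processed sequence count matrix: only S1 kept the integration
seq_proc <- data.frame(
    chr = "1", integration_locus = 100, strand = "+",
    CompleteAmplificationID = "S1", Value = 50
)
# Another quantification that still contains the removed S2 row
frag_est <- data.frame(
    chr = c("1", "1"), integration_locus = c(100, 100), strand = c("+", "+"),
    CompleteAmplificationID = c("S1", "S2"), Value = c(4.2, 0.3)
)
# Hypothetical helper: collapse the key columns into one string per row
make_key <- function(df) {
    do.call(paste, c(unname(as.list(df[key_cols])), sep = "|"))
}
# Keep only rows whose key is present in the processed seqCount matrix
frag_realigned <- frag_est[make_key(frag_est) %in% make_key(seq_proc), ]
frag_realigned
```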
The `r Biocpkg("ISAnalytics")` package `r citep(bib[["ISAnalytics"]])` was made possible thanks to:
R `r citep(bib[["R"]])`
`r Biocpkg("BiocStyle")` `r citep(bib[["BiocStyle"]])`
`r CRANpkg("knitcitations")` `r citep(bib[["knitcitations"]])`
`r CRANpkg("knitr")` `r citep(bib[["knitr"]])`
`r CRANpkg("rmarkdown")` `r citep(bib[["rmarkdown"]])`
`r CRANpkg("sessioninfo")` `r citep(bib[["sessioninfo"]])`
`r CRANpkg("testthat")` `r citep(bib[["testthat"]])`
This package was developed using `r BiocStyle::Githubpkg("lcolladotor/biocthis")`.
R session information.
## Session info
library("sessioninfo")
options(width = 120)
session_info()
This vignette was generated using `r Biocpkg("BiocStyle")` `r citep(bib[["BiocStyle"]])` with `r CRANpkg("knitr")` `r citep(bib[["knitr"]])` and `r CRANpkg("rmarkdown")` `r citep(bib[["rmarkdown"]])` running behind the scenes.
Citations made with `r CRANpkg("knitcitations")` `r citep(bib[["knitcitations"]])`.
## Print bibliography
bibliography()