ISAnalytics is an R package developed to analyze gene therapy vector insertion sites data identified from genomics next generation sequencing reads for clonal tracking studies.
In gene therapy, stem cells are modified using viral vectors to deliver the therapeutic transgene and replace functional properties; the genetic modification is stable and inherited by all cell progeny. Retrieving and mapping the sequences flanking the virus-host DNA junctions allows the identification of insertion sites (IS), which is essential for monitoring the evolution of genetically modified cells in vivo. A comprehensive toolkit for IS analysis is needed to foster clonal tracking studies and to support the assessment of safety and long-term efficacy in vivo. This package aims at (1) supporting the automation of the IS workflow, (2) performing basic and advanced analyses for IS tracking (clonal abundance, clonal expansions, statistics for insertional mutagenesis, etc.), and (3) providing basic biology insights into transduced stem cells in vivo.
The paper is available here: https://academic.oup.com/bib/article/24/1/bbac551/6955274?login=false
You can visit the package website to view documentation, vignettes and more.
ISAnalytics can be installed quickly in different ways: from Bioconductor with BiocManager, or from GitHub with devtools.
There are always 2 versions of the package active:
- RELEASE is the latest stable version
- DEVEL is the development version: it is the most up-to-date version, where all new features are introduced
From Bioconductor, RELEASE version:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("ISAnalytics")
DEVEL version:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
# The following initializes usage of Bioc devel
BiocManager::install(version='devel')
BiocManager::install("ISAnalytics")
From GitHub with devtools, RELEASE version:
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
                         ref = "RELEASE_3_17",
                         dependencies = TRUE,
                         build_vignettes = TRUE)
From GitHub with devtools, DEVEL version:
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
                         ref = "devel",
                         dependencies = TRUE,
                         build_vignettes = TRUE)
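To check which version actually ended up installed, base R's utils::packageVersion() is enough. The sketch below is not part of ISAnalytics: installed_isanalytics_version() and is_bioc_devel_version() are hypothetical helper names. The devel check relies on Bioconductor's numbering convention, in which the middle number of the version is even for RELEASE and odd for DEVEL.

```r
# Hypothetical helpers (not exported by ISAnalytics) to inspect the installed version.

installed_isanalytics_version <- function() {
  # system.file() returns "" when the package is not installed,
  # so this check does not load the package namespace
  if (!nzchar(system.file(package = "ISAnalytics"))) {
    message("ISAnalytics is not installed")
    return(invisible(NULL))
  }
  utils::packageVersion("ISAnalytics")
}

# Assumes Bioconductor's convention: x.y.z with odd y for DEVEL, even y for RELEASE
is_bioc_devel_version <- function(v) {
  parts <- as.integer(strsplit(as.character(v), ".", fixed = TRUE)[[1]])
  parts[2] %% 2 == 1
}

v <- installed_isanalytics_version()
if (!is.null(v)) {
  cat("ISAnalytics", as.character(v),
      if (is_bioc_devel_version(v)) "(DEVEL)" else "(RELEASE)", "\n")
}
```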
ISAnalytics has a verbose option that allows some functions to print additional information to the console while they're executing. To disable or re-enable this feature:
# DISABLE
options("ISAnalytics.verbose" = FALSE)
# ENABLE
options("ISAnalytics.verbose" = TRUE)
Some functions also produce reports in a user-friendly HTML format. To disable or re-enable this feature:
# DISABLE HTML REPORTS
options("ISAnalytics.reports" = FALSE)
# ENABLE HTML REPORTS
options("ISAnalytics.reports" = TRUE)
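When only a single call should run quietly, base R's options() returns the previous values, so they can be restored afterwards. This is a sketch that assumes only the two option names shown above. with_quiet_isanalytics() is a hypothetical helper, not a package function, and getOption() stands in for a real ISAnalytics call.

```r
# Hypothetical helper (not part of ISAnalytics): evaluate `expr` with verbose
# console output and HTML reports switched off, then restore the old settings.
with_quiet_isanalytics <- function(expr) {
  old <- options(ISAnalytics.verbose = FALSE, ISAnalytics.reports = FALSE)
  on.exit(options(old), add = TRUE)  # restored even if `expr` throws an error
  expr  # lazy evaluation: `expr` is only evaluated here, after the switch
}

options(ISAnalytics.verbose = TRUE, ISAnalytics.reports = TRUE)
res <- with_quiet_isanalytics(getOption("ISAnalytics.verbose"))
res                                # FALSE: seen from inside the call
getOption("ISAnalytics.verbose")   # TRUE again afterwards
```

Because R evaluates function arguments lazily, the expression runs only after the options have been switched off. on.exit() guarantees that the caller's settings come back.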
NEWS (most recent changes first):
- default_af_transform(): transformation failed if NAs were present in the columns
- magrittr
- data.table is now completely optional and is used internally only if the package is available
- Some dependencies moved from Imports to Suggests: functions will notify when additional packages are required for a specific functionality
- available_tags()
- HSC_population_size_estimate() now better supports computing estimates for different groups of cell types and tissues at the same time. The tabular output now contains an additional column "Timepoints_included" that specifies how many time points the estimate includes
- is_sharing() now handles limit cases better and can be parallelised, provided the appropriate packages are available (better performance)
- import_parallel_Vispa2Matrices_auto() and import_parallel_Vispa2Matrices_interactive() are officially defunct and will no longer be exported starting from the next release cycle
- The mode argument of import_parallel_Vispa2Matrices() no longer accepts INTERACTIVE as a valid option: the interactive mode is now considered defunct, since its usage is very limiting
- The association_file argument of import_parallel_Vispa2Matrices() no longer accepts a string representing a path. From now on, association file import is delegated solely to its dedicated function
- threshold_filter() is deprecated, since it is more complicated to use than standard filtering with dplyr or similar tools
- default_af_transform() now pads time points based on the maximum number of characters in the column + 1
- top_abund_tableGrob() has a new argument, transform_by, which is useful for controlling the ordering of columns
- DT has been moved (likely temporarily) to Imports, linked to issue https://github.com/calabrialab/ISAnalytics/issues/2
- tidyselect warnings (internal use of .data$ in selection context)
- gene_frequency_fisher
- progressr: added a wrapper function, enable_progress_bars(), for quickly enabling progress bars
- HSC_population_size_estimate() now signals possible problems in computing estimates and their causes
- CIS_grubbs() is now faster (removed dependency on psych::describe)
- CIS_grubbs_overtime() and the associated plotting function top_cis_overtime_heatmap() to compute the CIS_grubbs test over time
- import_association_file(): the function had minor issues when importing *.xlsx files, and missing optional columns threw errors
- as_sparse_matrix(): the function failed when trying to process an aggregated matrix
- export_ISA_settings() and import_ISA_settings(), which allow a faster workflow setup
- compute_near_integrations(): the function errored when the report_path argument was set to NULL
- integration_alluvial_plot() internals
- remove_collisions() uses dplyr again internally for joining and grouping operations, because of performance issues with data.table
- fisher_scatterplot() has 2 new arguments that allow disabling the highlighting of some genes even if their p-value is under the threshold
- vignette("workflow_start", package = "ISAnalytics")
- gene_frequency_fisher() is a new function of the analysis family that computes Fisher's exact test p-values on gene frequency; fisher_scatterplot() is the associated plotting function
- top_targeted_genes() is a new function of the analysis family that returns the top n targeted genes based on the number of IS
- NGSdataExplorer() is a newly implemented Shiny interface for exploring and plotting data
- generate_default_folder_structure() generates the standard folder structure with package-included data on demand
- transform_columns() is a new utility function, also used internally by other exported functions, that allows arbitrary transformations on data frame columns
- remove_collisions() now has a dedicated parameter to specify how independent samples are identified
- compute_near_integration_sites() now has a parameter called additional_agg_lambda to allow aggregation of additional columns
- CIS_grubbs() now signals whether genes are missing from the refgenes table and, if so, returns them as a data frame
- outlier_filter() can now take multiple tests as input and combine them with a given logic. It now also produces an HTML report
- integration_alluvial_plot()
- import_Vispa2_stats(): the function failed when passing report_path = NULL
- circos_genomic_density(): problem when trying to use a pdf device
- unzip_file_system() was made defunct in favor of generate_default_folder_structure()
- cumulative_count_union() was deprecated and its functionality was moved to cumulative_is()
- fragmentEstimate_column and fragmentEstimate_threshold arguments in HSC_population_size_estimate(). Slightly revised filtering logic
- max_workers argument in remove_collisions()
- aggregate_metadata()
- import_Vispa2_stats() from import_association_file()
- remove_collisions(): if the process fails, the function doesn't stop
- iss_source()
- refGenes_mm9 and compute_near_integrations()
- purity_filter()
- is_sharing() function, detailed usage in vignette("sharing_analyses", package = "ISAnalytics")
- cumulative_is()
- sharing_venn()
- vignette("report_system", package = "ISAnalytics")
- The ISAnalytics.widgets option has been replaced by ISAnalytics.reports
- remove_collisions(): removed arguments seq_count_col, max_rows_reports and save_widget_path; added arguments quant_cols and report_path (see documentation for details)
- import_single_Vispa2Matrix() now allows keeping additional non-standard columns
- compute_near_integrations() is now faster on bigger data sets
- columns and key arguments in compute_abundance()
- compute_near_integrations() now produces only the re-calibration map, in *.tsv format
- CIS_grubbs() now supports calculations for each group specified in the by argument
- sample_statistics(): there is now the option to include the calculation of distinct integration sites for each group (if mandatory vars are present)
- circos_genomic_density()
- import_parallel_Vispa2Matrices_interactive() and import_parallel_Vispa2Matrices_auto() are officially deprecated in favor of import_parallel_Vispa2Matrices()
- is_sharing computes the sharing of IS between groups
- sharing_heatmap allows visualization of sharing data through heatmaps
- integration_alluvial_plot allows visualization of the distribution of integration sites in groups over time
- top_abund_tableGrob can be used in combination with integration_alluvial_plot or by itself to obtain a summary of the top abundant integrations as an R graphic (tableGrob) object that can be combined with plots
- default_stats
- generate_Vispa2_launch_AF
- HSC_population_size_estimate and HSC_population_plot allow estimates of hematopoietic stem cell population size
- import_Vispa2_stats
- outlier_filter and outliers_by_pool_fragments offer a means to filter poorly represented samples based on custom outlier tests
- The import_stats argument of aggregate_metadata is officially deprecated in favor of import_Vispa2_stats
- aggregate_metadata is now a lot more flexible on what operations can be performed on columns, via the new argument aggregating_functions
- import_association_file now allows the direct import of Vispa2 stats and converts time points to months and years where not already present
- import_association_file now produces 3 separate columns for paths
- separate_quant_matrices and comparison_matrix no longer require mandatory columns other than the quantifications: this allows separating or joining aggregated matrices as well
- CIS_volcano_plot: some labels were duplicated if highlighted genes were provided in input
- compute_near_integrations: when the recalibration map export path is a folder, the function now works correctly and produces an automatically generated file name
- aggregate_metadata: paths to the folder that contains Vispa2 stats are now looked up correctly. Also, VISPA2 stats columns are aggregated if found in the input data frame, independently of the import_stats parameter
- compute_abundance can now take aggregated matrices as input and has additional parameters to offer more flexibility to the user. Major updates and improvements also to documentation and reproducible examples
- import_single_Vispa2Matrix: import is now preferentially carried out using data.table::fread, greatly speeding up the process; where that is not possible, readr::read_delim is used instead
- import_association_file: greatly improved parsing precision (each column has a dedicated type). The import report now signals parsing problems and their location, as well as problems in parsing dates. The report also includes potential problems in column names and signals missing data in important columns. Various file formats are also accepted as input, including *.xls(x) formats
- top_integrations can now take additional parameters to compute the top n genes for each specified group
- CIS_volcano_plot: faceting removed due to poor precision (it is easier to add faceting manually); added parameters to return the data frame that generated the plot as an additional result. It is also now possible to specify a vector of gene names to highlight even if they're not above the annotation threshold
- remove_collisions
- CIS_grubbs and cumulative_count_union
- CIS_volcano_plot
- sample_statistics
- aggregate_values_by_key has a simplified interface and supports multi-quantification matrices
- import_parallel_Vispa2Matrices_interactive and import_parallel_Vispa2Matrices_auto now have an option to return a multi-quantification matrix directly after import instead of a list
- threshold_filter, top_integrations
- compute_abundance
- comparison_matrix: custom column names were ignored
- ISAnalytics is officially on Bioconductor!
- comparison_matrix and separate_quant_matrices
- as_sparse_matrix
- compute_near_integrations
- remove_collisions
- import_single_Vispa2Matrix: removing non-significant 0 values
- ISADataFrame: the package now only uses standard tibbles

For help, please contact the maintainer of the package or open an issue on GitHub.
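The deprecation note for threshold_filter() recommends standard filtering with dplyr or similar tools instead. Below is a minimal base R sketch. The toy matrix and its column names (chr, integration_locus, strand, CompleteAmplificationID, Value) are assumptions modelled on a typical integration matrix, and the threshold value is arbitrary.

```r
# Toy integration matrix: hypothetical values, assumed column names
is_matrix <- data.frame(
  chr = c("1", "1", "2", "X"),
  integration_locus = c(1000L, 2500L, 300L, 4200L),
  strand = c("+", "-", "+", "+"),
  CompleteAmplificationID = c("S1", "S1", "S2", "S2"),
  Value = c(2, 15, 7, 1)
)

# Keep only integrations whose quantification reaches the threshold;
# subset() also drops rows where the condition evaluates to NA
min_value <- 3
filtered <- subset(is_matrix, Value >= min_value)

# Equivalent with dplyr, if installed:
# dplyr::filter(is_matrix, Value >= min_value)

filtered
```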
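The padding rule described for default_af_transform() (maximum number of characters in the column + 1) can be worked through on a few values. This sketch assumes numeric time points padded on the left with zeros and leaves NAs untouched. pad_timepoints() is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper illustrating the padding rule: left-pad numeric time
# points with zeros up to (max number of characters in the column + 1).
pad_timepoints <- function(tp) {
  tp <- as.character(tp)
  out <- rep(NA_character_, length(tp))  # missing time points stay NA
  ok <- !is.na(tp)
  if (!any(ok)) return(out)
  width <- max(nchar(tp[ok])) + 1
  out[ok] <- formatC(as.integer(tp[ok]), width = width, flag = "0")
  out
}

pad_timepoints(c("1", "30", "180", NA))
# returns c("0001", "0030", "0180", NA): the longest value has 3 characters, so width is 4
```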