An R package that determines the differentiation stage of pancreatic neuroendocrine neoplasia (PANnen) based on bulk RNA-seq or mRNA array experiments on cancer tissue.
Start an R session, e.g. using RStudio, and install the package:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("artdeco")
57 representative PANnen bulk RNA-seq samples are shipped with the R package. In the following, they are analyzed with the artdeco algorithm to demonstrate how the package is used.
library("artdeco")

# obtain path to testdata
path_transcriptome_file = system.file(
    "/Data/Expression_data/PANnen_Test_Data.tsv",
    package = "artdeco"
)

# testrun
deconvolution_results_absolute = Determine_differentiation_stage(
    transcriptome_file_path = path_transcriptome_file,
    baseline = "absolute"
)

# show results
deconvolution_results_absolute[1:2,]
The call returns a data frame that contains the differentiation stage similarity predictions.
Percent-wise similarities of the test data samples to all scRNA-derived training cell types are shown, both as a written interpretation, e.g. "significant similarity (to this cell type)" or "no significant similarity (to this cell type)", and as percent scalars.
The percent value quantifies to what extent a given transcriptome can be reconstructed from a given reference cell type, e.g. with the insulin-producing 'beta' cell type as basis. The nu-SVM regression based similarity determination has the limitation that the returned unit-less scalar is hard to interpret: it is not clear when a scalar indicates a high or a low similarity. Thus, artdeco interprets the returned scalar. Measurements may be calculated based on a trained model (absolute) or relative to all samples analyzed within a given run (relative). The default 'absolute' mode interprets the returned scalar for a given sample relative to the maximal similarity measured for each model subtype during training, i.e. the returned scalar is divided by the maximal similarity observed for the training set samples. For example, when a scRNA beta cell received the similarity 5.7 to the beta set, then all subsequent beta similarities are divided by 5.7 to show how similar they are relative to the self-identification of the cell subtype. The 'relative' mode divides by the maximal similarity measured within the given set of query data, i.e. the largest similarity measured in the run is taken as divisor.
# testrun in relative mode
deconvolution_results_relative = Determine_differentiation_stage(
    transcriptome_file_path = path_transcriptome_file,
    baseline = "relative"
)

# show results
deconvolution_results_relative[1:2,]
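The difference between the two baseline modes can be illustrated with a few lines of plain R. This is only a sketch of the arithmetic described above, not artdeco's internal code: the raw scores and the sample names are invented for illustration, and 5.7 is the example training maximum from the text.

```r
# Hypothetical raw similarity scores of three query samples to the
# 'beta' reference (made-up numbers, not artdeco output).
raw_beta <- c(sample_1 = 2.85, sample_2 = 5.13, sample_3 = 1.14)

# Assumed maximal beta similarity observed during training
# (the 5.7 used as example in the text).
training_max_beta <- 5.7

# 'absolute' mode: divide by the maximum observed during training.
absolute_pct <- round(raw_beta / training_max_beta * 100, 1)

# 'relative' mode: divide by the largest score within this run.
relative_pct <- round(raw_beta / max(raw_beta) * 100, 1)

print(absolute_pct)
print(relative_pct)
```

In the relative mode the best-matching sample of a run always reaches 100 percent, so relative values are only comparable within one run, whereas absolute values refer to the fixed training maximum.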
It is important to interpret the similarity predictions in light of the clustering of the data. To detect clusters, a correlation heatmap can be used along with an overlay of the similarity predictions. The rationale is that samples with a comparable differentiation state should cluster together, which is why a heatmap creator is included in the artdeco package. It is used as follows: first the analyzed transcriptome data is loaded, then the similarity predictions from step one are passed on to the visualization function.
visualization_data_path = system.file(
    "/Data/Expression_data/Visualization_PANnen.tsv",
    package = "artdeco"
)

create_heatmap_differentiation_stages(
    visualization_data_path,
    deconvolution_results_relative
)

create_heatmap_differentiation_stages(
    visualization_data_path,
    deconvolution_results_absolute
)
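The idea behind the correlation heatmap can be sketched with base R alone. The example below does not reproduce create_heatmap_differentiation_stages; it uses simulated expression values, hypothetical sample names (A1 to B3) and a plain hierarchical clustering on 1 minus the Pearson correlation, all of which are assumptions made for illustration.

```r
set.seed(42)
n_genes <- 100

# Two simulated expression profiles standing in for two differentiation states.
profile_a <- rnorm(n_genes)
profile_b <- rnorm(n_genes)

# Three noisy samples per state (hypothetical sample names).
expr <- cbind(
    A1 = profile_a + rnorm(n_genes, sd = 0.1),
    A2 = profile_a + rnorm(n_genes, sd = 0.1),
    A3 = profile_a + rnorm(n_genes, sd = 0.1),
    B1 = profile_b + rnorm(n_genes, sd = 0.1),
    B2 = profile_b + rnorm(n_genes, sd = 0.1),
    B3 = profile_b + rnorm(n_genes, sd = 0.1)
)
rownames(expr) <- paste0("GENE", seq_len(n_genes))

# Sample-by-sample Pearson correlation matrix.
cor_mat <- cor(expr)

# Cluster samples on correlation distance and cut the tree into two groups.
clusters <- cutree(hclust(as.dist(1 - cor_mat), method = "average"), k = 2)
print(clusters)

# Draw the correlation heatmap only in an interactive session.
if (interactive()) heatmap(cor_mat, symm = TRUE)
```

Samples that share a profile end up in the same cluster, which is the pattern against which the overlaid similarity predictions are meant to be checked.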
Adding a new model requires 1) training data and 2) a labeling vector that specifies the cell type of each training sample:
meta_data_path = system.file(
    "Data/Meta_information/Meta_information.tsv",
    package = "artdeco"
)
meta_data = read.table(
    meta_data_path,
    sep = "\t",
    header = TRUE,
    stringsAsFactors = FALSE
)

# extract the training sample subtype labels
subtype_vector = meta_data$Test_Subtype

# show subtype definition
subtype_vector[1:6]

training_data_path = system.file(
    "Data/Expression_data/PANnen_Test_Data.tsv",
    package = "artdeco"
)

add_deconvolution_training_model(
    transcriptome_data_path = training_data_path,
    model_name = "My_model",
    subtype_vector = subtype_vector,
    marker_gene_list = list(),
    training_nr_marker_genes = 5,
    training_p_value_threshold = 0.05,
    training_nr_permutations = 100
)
Note that the training data in the current version (1.0.0) of artdeco must have HGNC symbols as row names.
Removing models is fairly straightforward:
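Since the labeling vector assigns one cell type to each training sample, it can help to build it explicitly from a meta table and check it before training. The following is a minimal sketch with toy data. The column names Sample and Subtype and the cell names are hypothetical, and the assumption that the labels have to follow the column order of the expression data is ours, not something stated by the package.

```r
# Toy training matrix: HGNC symbols as row names, one column per cell.
expr <- matrix(
    seq_len(12),
    nrow = 3,
    dimnames = list(c("INS", "GCG", "SST"), paste0("cell_", 1:4))
)

# Toy meta table, deliberately in a different order than the columns.
meta <- data.frame(
    Sample = c("cell_3", "cell_1", "cell_4", "cell_2"),
    Subtype = c("alpha", "beta", "alpha", "beta"),
    stringsAsFactors = FALSE
)

# Reorder the labels so that entry i describes column i of expr.
subtype_vector <- meta$Subtype[match(colnames(expr), meta$Sample)]

# Sanity checks: one label per training sample, no unmatched samples.
stopifnot(length(subtype_vector) == ncol(expr))
stopifnot(!anyNA(subtype_vector))

print(subtype_vector)
```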
models = list.files(system.file("Models/", package = "artdeco"))
print(models)

model_to_be_removed = models[length(models)]
remove_model(
    model_name = model_to_be_removed
)
Please note that the expression data always has to have HGNC symbols as row names, and that the first entry of the header line labels the HGNC symbol column; it is not the first sample name.
training_data_path = system.file(
    "Data/Expression_data/PANnen_Test_Data.tsv",
    package = "artdeco"
)
expression_matrix = read.table(
    training_data_path,
    sep = "\t",
    header = TRUE,
    row.names = 1
)
expression_matrix[1:5,1:5]
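The expected file layout can be made concrete with a small self-written file. The sketch below creates a toy tab-separated file in a temporary location and reads it the same way as above. The header label hgnc_symbol, the sample names and the values are invented for illustration; only the layout (first header entry labels the gene column) follows the description above.

```r
# Toy expression file: the first header field labels the gene column,
# the remaining header fields are sample names.
tsv_lines <- c(
    "hgnc_symbol\tSample_1\tSample_2",
    "INS\t10.5\t0.2",
    "GCG\t0.1\t8.7",
    "SST\t1.3\t2.4"
)
toy_path <- tempfile(fileext = ".tsv")
writeLines(tsv_lines, toy_path)

# Read it with the first column as row names.
toy_matrix <- read.table(
    toy_path,
    sep = "\t",
    header = TRUE,
    row.names = 1
)

print(rownames(toy_matrix))  # gene symbols
print(colnames(toy_matrix))  # sample names only
```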
Contact: raik.otto@hu-berlin.de