knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width=7, fig.height=7 )
First, load the rbims package.
library(rbims)
Next, I will read the InterProScan output in a wide format and extract the PFAM abundance information.
If you want to follow this example, you can download the rbims test file.
interpro_pfam_profile<-read_interpro(data_interpro = "../inst/extdata/Interpro_test.tsv", database="Pfam", profile = T)
You can use the subsetting functions to create subsets of the InterPro profile table. Here, we will extract the most important PFAMs; the input must be the profile (wide) output from read_interpro.
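The difference between a long table and a profile (wide) table can be sketched in base R. This is only an illustration with made-up data: the column names Bin_name and Pfam and the domain IDs are assumptions, and the real read_interpro output may carry additional columns.

```r
# Hypothetical long table: one row per domain hit in a bin.
long <- data.frame(
  Bin_name = c("Bin_1", "Bin_1", "Bin_2", "Bin_2", "Bin_2"),
  Pfam     = c("PF00001", "PF00002", "PF00001", "PF00001", "PF00003"),
  stringsAsFactors = FALSE
)

# Wide (profile) table: one row per domain, one column per bin,
# each cell holding the number of hits of that domain in that bin.
wide <- as.data.frame.matrix(table(long$Pfam, long$Bin_name))
wide
```

A table shaped like wide, with features in rows and bins in columns, is the kind of input that a PCA or a heatmap typically expects.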
The function get_subset_pca calculates a PCA over the data to find the PFAM that explains the variation within the data.
important_PFAMs<-get_subset_pca(tibble_rbims=interpro_pfam_profile, cos2_val=0.95, analysis="PFAM")
head(important_PFAMs)
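To give an intuition for a cos2 cutoff such as cos2_val=0.95, here is a minimal base R sketch of selecting features by how well the first two principal components represent them. It is not the rbims implementation; the toy matrix, the choice of two components, and the way cos2 is computed here are all assumptions made for illustration.

```r
# Toy profile: rows are domains, columns are bins (all names are made up).
profile <- matrix(
  c(5, 4, 0, 1,
    4, 5, 1, 0,
    0, 1, 5, 4,
    1, 0, 4, 5,
    2, 3, 2, 3,
    3, 2, 3, 2),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("PF0000", 1:6), paste0("Bin_", 1:4))
)

# PCA with domains as observations and bins as variables.
pca <- prcomp(profile, center = TRUE, scale. = TRUE)

# cos2 of each domain on the first two components: squared scores on
# PC1 and PC2 divided by the squared distance to the origin over all PCs.
cos2 <- rowSums(pca$x[, 1:2]^2) / rowSums(pca$x^2)

# Keep only the domains that the first two components represent well.
keep <- names(cos2)[cos2 >= 0.95]
important <- profile[keep, , drop = FALSE]
important
```

A higher cutoff keeps fewer features, namely those whose variation is almost entirely captured by the retained components.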
Let's plot the results.
plot_heatmap can help explore the results. It supports two types of analysis. If we set the distance option to TRUE, the heatmap shows how the samples cluster based on their protein domains.
plot_heatmap(important_PFAMs, y_axis=PFAM, analysis = "INTERPRO", distance = T)
If we set it to FALSE, the heatmap shows the presence and absence of the domains across the genome samples.
plot_heatmap(important_PFAMs, y_axis=PFAM, analysis = "INTERPRO", distance = F)
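The two views described above can be approximated in base R to see what each one conveys. This is a sketch under assumptions: the abundance matrix is invented, and the binary (Jaccard) distance with hclust is one common choice, not necessarily what plot_heatmap uses internally.

```r
# Made-up abundance table: rows are domains, columns are bins.
abundance <- matrix(
  c(3, 0, 1,
    0, 2, 2,
    5, 0, 0,
    1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("PF0000", 1:4), c("Bin_1", "Bin_2", "Bin_3"))
)

# Presence/absence view: 1 if the domain occurs in the bin, 0 otherwise.
presence <- (abundance > 0) * 1L

# Distance view: pairwise binary distance between bins,
# followed by hierarchical clustering of the bins.
d  <- dist(t(presence), method = "binary")
hc <- hclust(d)

presence
as.matrix(d)
```

Calling plot(hc) draws the dendrogram, which plays the role of the sample clustering shown when distance is TRUE.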
We can also visualize using a bubble plot.
plot_bubble(important_PFAMs, y_axis=PFAM, x_axis=Bin_name, calc = "Binary", analysis = "INTERPRO", data_experiment = metadata, color_character = Clades)
First, I will read the InterProScan output in a wide format and extract the INTERPRO abundance information.
interpro_INTERPRO_profile<-read_interpro(data_interpro = "Interpro_test.tsv", database="INTERPRO", profile = T)
head(interpro_INTERPRO_profile)
We are going to look for the InterPro IDs that make up DNA topoisomerase 1. To do this, we will create a vector of the IDs associated with that enzyme.
DNA_topoisomerase_1<-c("IPR013497", "IPR023406", "IPR013824")
With the function get_subset_pathway we can create a subset of the INTERPRO table.
DNA_tipo_INTERPRO<-get_subset_pathway(interpro_INTERPRO_profile, type_of_interest_feature=INTERPRO, interest_feature=DNA_topoisomerase_1)
head(DNA_tipo_INTERPRO)
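Conceptually, subsetting by a vector of IDs is a row filter on the feature column. The base R sketch below shows the idea on an invented table; the bin columns and counts are hypothetical, and get_subset_pathway may do more than this simple filter.

```r
# Invented profile table: one row per InterPro ID, one column per bin.
toy_profile <- data.frame(
  INTERPRO = c("IPR013497", "IPR000001", "IPR023406", "IPR013824"),
  Bin_1    = c(1, 0, 2, 0),
  Bin_2    = c(0, 3, 1, 1),
  stringsAsFactors = FALSE
)

# IDs of interest, as defined in the vignette.
ids_of_interest <- c("IPR013497", "IPR023406", "IPR013824")

# Keep only the rows whose ID is in the vector of interest.
toy_subset <- toy_profile[toy_profile$INTERPRO %in% ids_of_interest, ]
toy_subset
```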
We can create a bubble plot to visualize the distribution of these enzymes across the bins.
plot_bubble(DNA_tipo_INTERPRO, y_axis=INTERPRO, x_axis=Bin_name, calc = "Binary", analysis = "INTERPRO", data_experiment = metadata, color_character = Sample_site)
First, I will read the InterProScan output in a long format and extract the KEGG information. When you use the KEGG option, the profile option is disabled.
interpro_KEGG_long<-read_interpro(data_interpro = "Interpro_test.tsv", database="KEGG")
head(interpro_KEGG_long)
Here, we can use the mapping_ko function to get the extended KEGG table.
interpro_map<-mapping_ko(tibble_interpro = interpro_KEGG_long)
head(interpro_map)
We can plot all the KOs and the Modules to which they belong. Note that we set analysis = "KEGG" here, even though this workflow started from the InterProScan output.
plot_heatmap(tibble_ko=interpro_map, data_experiment = metadata, y_axis=KO, order_y = Module, order_x = Sample_site, split_y = TRUE, analysis = "KEGG", calc="Percentage")