knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Welcome to the second vignette of our software for predicting the activity of intergenic fragments. We suggest starting with the first vignette, which introduces the software. This vignette instead presents a real application of Esearch3D and requires at least 32 GB of RAM to complete properly. It goes through all the main operations of the software with real data. Specifically, it replicates the results presented in our publication for the case where genes are connected to the chromatin fragment that their transcription start site (TSS) maps to. This creates the chromatin interaction network (CIN) called the TSS-based mESC DNaseI-capture CIN. The only part that this vignette does not cover is the classification of enhancers with machine learning; that topic is left for last because it is aimed at advanced users. We hope that this vignette helps you understand the data, the operations and the results of our software.
Let us clean the R environment, set the working directory, load the software package, set the random seed so that the results are always reproducible, and define the number of cores available to run the operations. Be careful: set the number of cores according to your resources; if you are unsure how many to use, set it to 2.
#Clean workspace and memory ----
rm(list=ls())
gc()

#Set working directory ----
gps0=getwd()
gps0=paste(gps0,"/%s",sep="")
rootDir=gps0
setwd(gsub("%s","",rootDir))

#Load libraries ----
suppressWarnings(suppressMessages(
  library("Esearch3D", quietly = TRUE)
))

#Set variables ----
#Set seed to get always the same results out of this vignette
set.seed(8)
#Set number of cores to parallelize the tasks
n_cores=5
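If you are unsure how many cores your machine offers, the following minimal sketch shows one way to choose a conservative value with the `parallel` package that ships with R. It is only a suggestion and not part of Esearch3D; the variable name `available` and the choice to leave one core free are assumptions made for illustration.

```r
# Sketch (not part of Esearch3D): pick a conservative number of cores.
# parallel::detectCores() may return NA on some platforms, so fall back to 2.
available <- parallel::detectCores()
if (is.na(available)) available <- 2L

# Leave one core free for the operating system, but never go below 1
n_cores <- max(1L, available - 1L)
```

Keep in mind that each parallel worker may need its own share of memory, so a lower value can be the safer choice on machines with limited RAM.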
For this vignette, we created a DNaseI-capture HiC-derived CIN in which the captured regions harbour DNaseI-sensitive regions in mouse embryonic stem cells (mESC), enriching for interactions between chromatin-accessible regions. A chromatin fragment representing a genomic locus is represented as a node, and a fragment-fragment interaction as an edge. We then integrated genes as nodes within the CIN. In this case, each gene is connected by an edge to the node that its transcription start site maps to, which gives the TSS-based mESC DNaseI-capture CIN.
#Load and set up the example data ----
data("tss_data_l")
#gene - fragment interaction network generated from DNase_Prop1_mESC_TSS interactions data
gf_net=tss_data_l$gf_net
#gene-fragment-fragment interaction network generated from mESC_DNase_Net interactions data
ff_net=tss_data_l$ff_net
#sample profile with starting values for genes and fragments generated from mESC_bin_matrix_Prop1
input_m=tss_data_l$input_m
#length of chromosomes
chr_len=tss_data_l$chr_len
#gene annotation
ann_net_b=tss_data_l$ann_net_b
#genes of interests
gene_in=tss_data_l$gene_in
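To illustrate how a TSS-based gene-fragment network of the kind described above could be assembled, here is a minimal sketch in base R. All coordinates, gene names and fragment names are hypothetical toy values, and the two-column edge list is an assumption for illustration; the actual layout of `gf_net` and `ff_net` in the package may differ.

```r
# Toy sketch (hypothetical data): connect each gene to the fragment
# that contains its transcription start site (TSS).

# Hypothetical fragments on a single chromosome, sorted by start position
frags <- data.frame(
  name  = c("frag1", "frag2", "frag3"),
  start = c(1, 1001, 2001),
  end   = c(1000, 2000, 3000),
  stringsAsFactors = FALSE
)

# Hypothetical genes with their TSS coordinates
genes <- data.frame(
  name = c("geneA", "geneB"),
  tss  = c(250, 2500),
  stringsAsFactors = FALSE
)

# For each TSS, find the last fragment whose start is <= the TSS
idx <- findInterval(genes$tss, frags$start)

# Keep only genes whose TSS really falls inside that fragment
inside <- idx > 0 & genes$tss <= frags$end[pmax(idx, 1)]

# Gene-fragment edges: one edge per gene, pointing to its TSS fragment
gene_frag_edges <- data.frame(
  gene     = genes$name[inside],
  fragment = frags$name[idx[inside]],
  stringsAsFactors = FALSE
)

# Fragment-fragment edges would come from the capture HiC interactions
frag_frag_edges <- data.frame(
  from = c("frag1", "frag2"),
  to   = c("frag3", "frag3"),
  stringsAsFactors = FALSE
)
```

In this toy example each gene gets exactly one edge, to the fragment that contains its TSS, while the fragment-fragment edges are kept in a separate table.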
In the first propagation, the expression of the genes is propagated from their corresponding nodes into the genic fragments only. In the second propagation, the gene expression is then propagated to the rest of the CIN. The genic and intergenic fragments receive an imputed activity score (IAS) reflecting the likelihood of enhancer activity. Be careful: r is the isolation parameter; use a low value for the first step and a high value for the second step.
#Two step propagation -----
#Propagated for the network gene-fragment
gf_prop=rwr_OVprop(g=gf_net,input_m = input_m, no_cores=n_cores, r=0.1)
#Propagated for the network fragment-fragment
ff_prop=rwr_OVprop(g=ff_net,input_m = gf_prop, no_cores=n_cores, r=0.8)
#Create igraph object with all the information included
net=create_net2plot(gf_net,input_m,gf_prop,ann_net_b,frag_pattern="frag",ff_net,ff_prop)
#Start GUI
start_GUI(net, ann_net_b, chr_len, example=FALSE)
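To build intuition for the isolation parameter r, the sketch below runs a generic textbook random walk with restart on a toy four-node path (gene, fragA, fragB, fragC). It is not taken from the Esearch3D source code, and `rwr_OVprop` may differ in normalisation and other details; the function `rwr_toy` and the toy network are hypothetical and serve only as illustration.

```r
# Generic random walk with restart (RWR) on a toy network.
# Assumption: textbook update p <- (1 - r) * W %*% p + r * p0,
# which is NOT necessarily the exact formula used by rwr_OVprop.
rwr_toy <- function(adj, p0, r, tol = 1e-10, max_iter = 1000) {
  # Column-normalise the adjacency matrix (assumes no isolated nodes)
  w <- sweep(adj, 2, colSums(adj), "/")
  p <- p0
  for (i in seq_len(max_iter)) {
    p_next <- (1 - r) * as.vector(w %*% p) + r * p0
    # Stop once the scores no longer change
    if (sum(abs(p_next - p)) < tol) break
    p <- p_next
  }
  names(p_next) <- names(p0)
  p_next
}

# Toy path: gene - fragA - fragB - fragC
nodes <- c("gene", "fragA", "fragB", "fragC")
adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
adj["gene", "fragA"]  <- adj["fragA", "gene"]  <- 1
adj["fragA", "fragB"] <- adj["fragB", "fragA"] <- 1
adj["fragB", "fragC"] <- adj["fragC", "fragB"] <- 1

# All starting information sits on the gene node
p0 <- c(gene = 1, fragA = 0, fragB = 0, fragC = 0)

low_r  <- rwr_toy(adj, p0, r = 0.1)  # information spreads far from the gene
high_r <- rwr_toy(adj, p0, r = 0.8)  # information stays close to the gene
```

Under this formulation, a high r keeps most of the score on the node where it started, while a low r lets it reach distant nodes such as fragC.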
The software can also propagate individual genes of interest belonging to a cell's expression profile. It returns how much information each gene of interest contributed to the fragments, and how much information each fragment received from the genes of interest. This function helps to understand the contribution of individual genes to the standard two-step propagation. It requires the names of the genes of interest, the string pattern that the fragment names share (for example, "F" if the fragments are named F1, F2, F3 and so on), and the distance between the genes of interest and the fragments to investigate.
#Single gene propagation -----
output_path="sgPropagation_results.rda"
contrXgene_l=rwr_SGprop(gf_net, ff_net, gene_in[2:3], frag_pattern="frag", out_rda=output_path, degree = 4, r1 = 0.1, r2 = 0.8, no_cores = n_cores)