sepiraInfNet() is one of the two main functions in the SEPIRA package. It estimates tissue-specific regulatory networks for any tissue type of interest.
data |
A matrix, the normalized gene expression data matrix, with rows referring to unique genes and columns to samples from different tissue types. |
tissue |
A phenotype vector, indicating the tissue types of samples. It should have the same order as the columns of the matrix. |
toi |
A character string giving the tissue type of interest, i.e. the tissue type for which the network is to be estimated. |
cft |
A vector of tissue types to be used to adjust for confounding by immune or stromal cells infiltration in |
TFs |
A vector of TFs. Note that one should use the same annotation in different data sets throughout the analysis. |
sdth |
A numeric, the standard deviation threshold used to remove genes whose expression levels have little or zero standard deviation. |
sigth |
A numeric, the unadjusted p-value threshold used to call significant interactions after calculating the correlation coefficients between TFs and target genes. This threshold is used to binarize the correlation coefficient matrix. If this value is not specified by the user, the function applies a Bonferroni correction and then uses 0.05 as the threshold. |
pcorth |
A numeric, the partial correlation threshold, in the range between 0 and 1, used to remove indirect interactions between TFs and their target genes. |
degth |
A vector of length three, thresholds of adjusted p-value to call significant TFs in 1) comparison between |
lfcth |
A vector of length three, thresholds of log2(fold-change) to call significant TFs in 1) comparison between |
minNtgts |
An integer used to filter out TFs with few targets. Only TFs with more than 'minNtgts' target genes are kept in the network. |
ncores |
An integer, the number of cores to use when computing partial correlation. See |
sepiraInfNet generates tissue-specific TF regulatory networks from gene expression data across multiple tissue types.
The gene expression data set data should be normalized by the user before it is passed to sepiraInfNet, with rows referring to genes and columns to samples from different tissue types. Duplicated gene names/IDs should be averaged before normalization.
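The averaging of duplicated gene IDs mentioned above can be done in several ways; the following is a minimal base-R sketch, not the package's own preprocessing. The matrix `expr` and its values are made up for illustration.

```r
# Toy expression matrix with a duplicated gene ID ("G1"); values are made up.
expr <- matrix(c(1, 3, 5,
                 2, 4, 6,
                 3, 5, 7),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("G1", "G2", "G1"), c("s1", "s2", "s3")))

# Sum the rows that share an ID, then divide by the number of rows per ID.
sums <- rowsum(expr, group = rownames(expr))
counts <- table(rownames(expr))[rownames(sums)]
expr_avg <- sums / as.vector(counts)

print(expr_avg)
```

After this step every row name is unique, which matches the "rows referring to unique genes" requirement for data.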
The user needs to supply the tissue type of each sample in the data set (tissue) as well as the tissue type of interest (toi). Please make sure that toi occurs in tissue and is spelled correctly.
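A quick sanity check before calling the function can catch the mismatches described above. This is only a sketch; the objects `expr`, `tissue` and `toi` are illustrative stand-ins for the user's own inputs.

```r
# Illustrative inputs: four samples, one tissue label per column.
expr <- matrix(0, nrow = 2, ncol = 4)
tissue <- c("Lung", "Lung", "Blood", "Liver")
toi <- "Lung"

# One label per sample, in the same order as the columns of the matrix.
ok_length <- length(tissue) == ncol(expr)

# The tissue of interest must match a label exactly (matching is case-sensitive).
ok_toi <- toi %in% tissue

stopifnot(ok_length, ok_toi)
```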
Using differential gene expression analysis, we detect TFs that are highly active in toi and less active in other tissue types. Such analyses can be confounded by cell-type heterogeneity. sepiraInfNet provides a way to adjust for immune/stromal cell contamination by performing additional comparisons between toi and 1) blood; 2) spleen, as long as expression data for either or both of these tissue types are available in data.
TFs is a vector containing the identifiers of all TFs (regulators). In our paper we used the 1313 TFs annotated as "transcription factors" in MSigDB. You can also supply your own list of TFs to sepiraInfNet.
sdth is a standard deviation threshold used to remove genes in the user-provided data set whose standard deviation is small or close to zero. By default the threshold is 0.25.
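The effect of sdth can be illustrated with a small base-R sketch. It is not the package's internal code; in particular, whether the comparison is strict (`>`) or not is an assumption here, and the toy matrix is made up.

```r
# Toy matrix: one constant gene, one nearly constant gene, one variable gene.
expr <- rbind(
  flat     = rep(5, 6),
  nearflat = c(5.0, 5.1, 4.9, 5.0, 5.1, 4.9),
  variable = c(1, 4, 2, 8, 5, 9)
)

sdth <- 0.25

# Per-gene standard deviation across samples.
gene_sd <- apply(expr, 1, sd)

# Keep only genes whose standard deviation exceeds the threshold (assumed strict).
expr_kept <- expr[gene_sd > sdth, , drop = FALSE]

print(rownames(expr_kept))
```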
From the gene expression data matrix, sepiraInfNet estimates the Pearson correlation coefficient between every TF-gene pair, together with the corresponding p-value. The p-value threshold sigth binarizes the network into "regulation" (1) / "no regulation" (0). This binarized network is used to determine the covariates when estimating the partial correlation between target genes and their regulators (TFs).
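The correlation and binarization step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the data are random, the gene and TF names are hypothetical, the p-values come from the standard t-test for a Pearson correlation, and the Bonferroni-style threshold divides 0.05 by the number of TF-gene pairs tested.

```r
set.seed(1)

# Random toy data: 6 genes x 30 samples; "g1" and "g2" play the role of TFs.
expr <- matrix(rnorm(6 * 30), nrow = 6,
               dimnames = list(paste0("g", 1:6), NULL))
tfs <- c("g1", "g2")
tgts <- setdiff(rownames(expr), tfs)

# Pearson correlation between every target (rows) and every TF (columns).
r <- cor(t(expr[tgts, ]), t(expr[tfs, ]))

# Two-sided p-values from the t statistic with n - 2 degrees of freedom.
n <- ncol(expr)
tstat <- r * sqrt((n - 2) / (1 - r^2))
pval <- 2 * pt(-abs(tstat), df = n - 2)

# Bonferroni-style threshold over all tested pairs (an assumption of this sketch).
sigth <- 0.05 / length(pval)

# Binarize: 1 = "regulation", 0 = "no regulation".
binnet <- (pval < sigth) * 1
```

The p-values computed this way agree with those returned by `cor.test()` for the same pair of vectors.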
pcorth is the partial correlation coefficient threshold for calling significant direct TF-gene interactions. By default pcorth equals 0.2.
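To see why a partial correlation threshold helps remove indirect interactions, consider the following sketch. It derives partial correlations from the inverse of the correlation matrix, which is a standard identity and not necessarily how sepiraInfNet computes them. The variables `tf1`, `tf2` and `gene` are simulated and hypothetical.

```r
set.seed(42)
n <- 200

# Simulated chain: tf1 drives tf2, and only tf2 drives the gene directly.
tf1 <- rnorm(n)
tf2 <- tf1 + rnorm(n, sd = 0.5)
gene <- tf2 + rnorm(n, sd = 0.5)
x <- cbind(gene, tf1, tf2)

# Partial correlations from the precision (inverse correlation) matrix:
# pcor_ij = -P_ij / sqrt(P_ii * P_jj)
prec <- solve(cor(x))
pcor <- -prec / sqrt(outer(diag(prec), diag(prec)))
diag(pcor) <- 1

# Compare marginal and partial correlations of the gene with each TF.
print(cor(x)["gene", c("tf1", "tf2")])
print(pcor["gene", c("tf1", "tf2")])

# Apply the threshold to call direct interactions.
pcorth <- 0.2
direct <- abs(pcor["gene", c("tf1", "tf2")]) > pcorth
```

In this simulation the gene is marginally correlated with both TFs, but conditioning on tf2 is expected to shrink its partial correlation with tf1 towards zero, since tf1 acts only through tf2.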
degth and lfcth are vectors that each contain the three thresholds for adjusted p-value and log2 fold-change, respectively, used to call significant TFs in comparisons between toi and 1) all other tissue types; 2) the 1st tissue type (blood) in cft; 3) the 2nd tissue type (spleen) in cft. These differential expression analyses are done to find tissue-specific TFs that are highly activated only in the tissue type of interest.
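How the three pairs of thresholds combine can be sketched as follows. Everything here is illustrative: the result tables, the column names `logFC` and `adj.P.Val`, the threshold values, and the rule that a TF must pass all three comparisons with a positive fold-change are assumptions of this sketch, not confirmed details of sepiraInfNet.

```r
# Hypothetical differential expression results for three TFs, one table per comparison.
tf_ids <- c("TF1", "TF2", "TF3")
top <- list(
  all_others = data.frame(logFC = c(2.1, 1.5, 0.2),
                          adj.P.Val = c(0.001, 0.01, 0.5), row.names = tf_ids),
  blood      = data.frame(logFC = c(1.8, 0.1, 0.0),
                          adj.P.Val = c(0.002, 0.7, 0.9), row.names = tf_ids),
  spleen     = data.frame(logFC = c(1.2, 0.9, 0.1),
                          adj.P.Val = c(0.01, 0.03, 0.8), row.names = tf_ids)
)

# One threshold per comparison (illustrative values only).
degth <- c(0.05, 0.05, 0.05)
lfcth <- c(1, log2(1.5), log2(1.5))

# For each comparison, flag TFs that are significant and up-regulated in toi.
passes <- mapply(function(tab, p, l) tab$adj.P.Val < p & tab$logFC > l,
                 top, degth, lfcth)
rownames(passes) <- tf_ids

# Keep TFs that pass every comparison.
specific <- rownames(passes)[rowSums(passes) == ncol(passes)]
print(specific)
```

Here TF2 passes the comparison against all other tissues but fails the comparison against blood, which is the kind of immune-driven signal the additional comparisons are meant to filter out.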
Once tissue-specific TFs have been detected, we could build a network from only these TFs and their target genes. However, sepiraInfNet further removes TFs with fewer than minNtgts target genes. By default the minimal number of TF targets in the final network is 10.
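The target-count filter can be sketched on a toy network matrix. The matrix layout (targets in rows, TFs in columns, entries 1/-1/0) and the use of `>=` at the boundary are assumptions of this sketch; the names are hypothetical.

```r
# Toy network: rows are target genes, columns are TFs; 0 means no interaction.
net <- matrix(0, nrow = 6, ncol = 2,
              dimnames = list(paste0("g", 1:6), c("TFa", "TFb")))
net[1:5, "TFa"] <- c(1, 1, -1, 1, -1)
net[1:2, "TFb"] <- c(1, -1)

minNtgts <- 3

# Count non-zero entries per TF and drop TFs with too few targets.
n_tgts <- colSums(net != 0)
net_kept <- net[, n_tgts >= minNtgts, drop = FALSE]

print(colnames(net_kept))
```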
The partial correlation coefficients are calculated in parallel; by default sepiraInfNet splits the work into 4 sub-processes. The user can use more cores by specifying the parameter ncores.
A list with three entries:
$netTOI
the tissue specific network, rows refer to TF target genes, while columns refer to TFs.
$sumnet
a matrix summarizing the number of TF target genes and the number of positively/negatively regulated target genes for each TF in the inferred network.
$top
a list whose entries are tables summarizing the results of the differential expression analyses. The first table is from the comparison between toi and 1) all other tissues; the remaining tables result from the comparisons to 2) blood and/or 3) spleen.
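The kind of per-TF summary described for $sumnet can be reproduced from a network matrix with a few column sums. This is a sketch only: it assumes that the network encodes positive and negative regulation as positive and negative entries, and the column names of the summary are hypothetical rather than those used by the package.

```r
# Toy network in the layout described for $netTOI: targets in rows, TFs in columns.
net <- matrix(0, nrow = 6, ncol = 2,
              dimnames = list(paste0("g", 1:6), c("TFa", "TFb")))
net[1:5, "TFa"] <- c(1, 1, -1, 1, -1)
net[1:2, "TFb"] <- c(1, -1)

# Per-TF counts of all, positively regulated and negatively regulated targets.
sumnet <- cbind(
  nTargets = colSums(net != 0),
  nPos     = colSums(net > 0),
  nNeg     = colSums(net < 0)
)

print(sumnet)
```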
# gene expression data set (a subset of GTEx data set)
data("GeneExp")
# TFs
data("TFeid")
# run the function
cf <- "Blood"
coln <- colnames(GeneExp)
degth <- c(0.3,0.3) # 'degth = c(0.05, 0.05)' is recommended
# The resulting network is small due to the limited size of the 'GeneExp' data set
net.o <- sepiraInfNet(GeneExp, coln, "Lung", cf, TFeid, sigth = 0.05, degth = degth, minNtgts = 5, ncores = 1)