sepiraInfNet: Infer tissue-specific regulatory network from gene expression...
In SEPIRA: Systems EPigenomics Inference of Regulatory Activity


sepiraInfNet() is one of the two main functions in the SEPIRA package. It estimates a tissue-specific regulatory network for any tissue type of interest.

sepiraInfNet(data, tissue, toi, cft = NULL, TFs, sdth = 0.25,
  sigth = NULL, pcorth = 0.2, degth = c(0.05, 0.05), lfcth = c(1,
  log2(1.5)), minNtgts = 10, ncores = 4)

`data`	A matrix, the normalized gene expression data matrix, with rows referring to unique genes and columns to samples from different tissue types.
`tissue`	A phenotype vector giving the tissue type of each sample, in the same order as the columns of `data`.
`toi`	A character string naming the tissue type of interest, i.e. the tissue type for which the network is to be estimated.
`cft`	A vector of tissue types used to adjust for confounding by immune or stromal cell infiltration in `toi`. It can be blood and/or spleen, which we found, using the `ESTIMATE` package, to contain an extremely high proportion of immune and stromal cells.
`TFs`	A vector of TFs. Note that one should use the same annotation in different data sets throughout the analysis.
`sdth`	A numeric, the standard deviation threshold used to remove genes whose expression levels have little or no variation.
`sigth`	A numeric, the unadjusted p-value threshold used to call significant interactions after calculating the correlation coefficients between TFs and target genes. This threshold is used to binarize the correlation coefficient matrix. If it is not specified by the user, the function applies a Bonferroni correction and uses 0.05 as the threshold.
`pcorth`	A numeric, the partial correlation threshold, in the range between 0 and 1, used to remove indirect interactions between TFs and their target genes.
`degth`	A vector of length three, the adjusted p-value thresholds used to call significant TFs in 1) the comparison between `toi` and all other tissue types; 2) and 3) the comparisons between `toi` and blood/spleen in `cft`.
`lfcth`	A vector of length three, the log2(fold-change) thresholds used to call significant TFs in 1) the comparison between `toi` and all other tissue types; 2) and 3) the comparisons between `toi` and blood/spleen in `cft`.
`minNtgts`	An integer used to filter out TFs with few targets. Only TFs with more than `minNtgts` target genes are kept in the network.
`ncores`	An integer, the number of cores to use when computing partial correlations. See `mclapply`.

sepiraInfNet generates tissue-specific TF regulatory networks from gene expression data spanning multiple tissue types.

The gene expression data set data should be normalized by the user before it is passed to sepiraInfNet, with rows referring to genes and columns to samples from different tissue types. Expression values of duplicated gene names/IDs should be averaged before normalization.
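As an illustration of the averaging step, here is a minimal base R sketch on a toy matrix. The object names (expr, sums, avg) are made up for this example, and the approach shown is only one possible way to collapse duplicates, not necessarily the one used by the package authors.

```r
# Toy matrix in which the identifier "g1" appears twice.
expr <- matrix(c(1, 3, 5,
                 2, 4, 6,
                 7, 8, 9),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("g1", "g1", "g2"), c("s1", "s2", "s3")))

# rowsum() adds up rows sharing the same name (groups are returned sorted).
sums <- rowsum(expr, group = rownames(expr))

# Number of rows per identifier, aligned with the rows of 'sums'.
n <- as.vector(table(rownames(expr))[rownames(sums)])

# Dividing by the group sizes turns the sums into per-gene means.
avg <- sums / n
```

After this step every row name is unique, which is what data is expected to look like.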

The user needs to supply the tissue type of every sample in the data set (tissue) as well as the tissue type of interest (toi). Make sure that toi occurs in tissue and is spelled exactly as it appears there.
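These requirements can be verified before calling the function. The helper below, check_inputs(), is hypothetical and not part of SEPIRA; it is a small sketch of the checks implied by the text, run here on toy data.

```r
# Hypothetical helper, not part of SEPIRA: validates the inputs described above.
check_inputs <- function(data, tissue, toi) {
  if (!is.matrix(data)) stop("data must be a matrix")
  if (length(tissue) != ncol(data)) {
    stop("tissue must have one entry per column of data")
  }
  if (length(toi) != 1 || !toi %in% tissue) {
    stop("toi not found in tissue; available types: ",
         paste(unique(tissue), collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data: 3 genes, 4 samples.
expr <- matrix(rnorm(12), nrow = 3, dimnames = list(paste0("g", 1:3), NULL))
tissue <- c("Lung", "Lung", "Blood", "Liver")

check_inputs(expr, tissue, "Lung")    # passes silently
# check_inputs(expr, tissue, "lung")  # would stop: the match is case-sensitive
```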

Using differential gene expression analysis, we detect TFs that are highly active in toi and less active in other tissue types. The results of such analyses can be confounded by cell-type heterogeneity. sepiraInfNet provides a way to adjust for immune/stromal cell contamination by doing additional comparisons between toi and 1) blood; 2) spleen, as long as expression data for either or both of these tissue types are available in data.

TFs is a vector containing the identifiers of all TFs (regulators). In our paper we used the 1313 TFs annotated as "transcription factors" in MSigDB. You can also supply your own list of TFs to sepiraInfNet.

sdth is the standard deviation threshold used to remove genes from the user-provided data set whose standard deviation is small or close to zero. By default the threshold is 0.25.
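The effect of such a filter can be sketched in base R as follows. The toy matrix and the strict ">" comparison are assumptions made for illustration; the section does not say how the package treats values exactly equal to the threshold.

```r
# Toy expression matrix: one constant gene, one nearly constant, one variable.
expr <- rbind(flat     = rep(5, 6),
              low      = c(5, 5.1, 5, 4.9, 5, 5.1),
              variable = c(1, 4, 2, 6, 3, 8))

sdth <- 0.25

# Per-gene standard deviation across samples.
gene_sd <- apply(expr, 1, sd)

# Keep only genes whose standard deviation exceeds the threshold
# (assumption: strict inequality).
kept <- expr[gene_sd > sdth, , drop = FALSE]
rownames(kept)
```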

From the gene expression data matrix, sepiraInfNet estimates the Pearson correlation coefficient between every TF-gene pair as well as the corresponding p-value. The p-value threshold sigth binarizes the network into "regulation" (1) / "no regulation" (0). This binarized network is used to determine the covariates when estimating the partial correlation between target genes and their regulators (TFs).
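The correlation and binarization steps can be sketched on simulated data as below. All object names are illustrative, the p-values come from the standard t-test for a Pearson correlation, and the number of tests used for the Bonferroni correction (here, all TF-gene pairs) is an assumption rather than a statement about the package internals.

```r
set.seed(42)
n <- 50                                   # number of samples
tf <- rnorm(n)
expr <- rbind(TF1   = tf,
              geneA = tf + rnorm(n, sd = 0.3),   # driven by TF1
              geneB = rnorm(n))                  # unrelated to TF1
tfs <- "TF1"
targets <- c("geneA", "geneB")

# Pearson correlations, target genes in rows and TFs in columns.
r <- cor(t(expr[targets, , drop = FALSE]), t(expr[tfs, , drop = FALSE]))

# Two-sided p-values from the t statistic with n - 2 degrees of freedom.
tstat <- r * sqrt((n - 2) / (1 - r^2))
p <- 2 * pt(-abs(tstat), df = n - 2)

# Bonferroni-style threshold (assumption: one test per TF-gene pair).
sigth <- 0.05 / length(p)

# 1 = "regulation", 0 = "no regulation".
binnet <- (p < sigth) * 1
```

Supplying sigth explicitly, as in the package example, would replace the Bonferroni-derived value in this sketch.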

pcorth is the partial correlation coefficient threshold for calling significant direct TF-gene interactions. By default pcorth equals 0.2.
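To see why a partial correlation threshold removes indirect interactions, consider the simulated example below. It is a sketch only: partial_cor() is a hypothetical helper that computes the partial correlation from regression residuals, and applying the threshold to the absolute value is an assumption.

```r
set.seed(7)
n <- 1000
tf1  <- rnorm(n)
tf2  <- tf1 + rnorm(n, sd = 0.5)   # TF2 is co-expressed with TF1
gene <- tf1 + rnorm(n, sd = 0.5)   # the gene depends on TF1 only

# Hypothetical helper: partial correlation of x and y given z,
# obtained by correlating the residuals of x ~ z and y ~ z.
partial_cor <- function(x, y, z) {
  cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
}

marginal <- cor(gene, tf2)               # high, but only because of TF1
pc_tf2   <- partial_cor(gene, tf2, tf1)  # close to zero once TF1 is accounted for
pc_tf1   <- partial_cor(gene, tf1, tf2)  # stays clearly above the threshold

pcorth <- 0.2
direct <- c(TF1 = abs(pc_tf1) > pcorth, TF2 = abs(pc_tf2) > pcorth)
```

In this toy setting only the gene-TF1 edge survives, although both TFs are strongly correlated with the gene.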

degth and lfcth are vectors, each containing the 3 thresholds for adjusted p-value/log2 fold-change used to call significant TFs in the comparisons between toi and 1) all other tissue types; 2) the 1st tissue type (blood) in cft; 3) the 2nd tissue type (spleen) in cft. These differential expression analyses are done to find tissue-specific TFs that are highly activated only in the tissue type of interest.
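The logic of combining several comparisons can be sketched in base R as follows. This is a stand-in and not the package's procedure: the statistical test used by sepiraInfNet is not stated in this section, so a Welch t-test with Benjamini-Hochberg adjustment is assumed here, the data are simulated on a log2 scale, and de_table() is a hypothetical helper.

```r
set.seed(3)
tissue <- rep(c("Lung", "Blood", "Liver"), each = 10)
expr <- matrix(rnorm(4 * 30), nrow = 4,
               dimnames = list(paste0("TF", 1:4), NULL))
# Make TF1 strongly over-expressed in lung.
expr["TF1", tissue == "Lung"] <- expr["TF1", tissue == "Lung"] + 3

# Hypothetical helper: log2 fold-change and adjusted p-value per TF.
de_table <- function(expr, in_toi, in_ref) {
  lfc <- rowMeans(expr[, in_toi, drop = FALSE]) -
         rowMeans(expr[, in_ref, drop = FALSE])
  p <- apply(expr, 1, function(x) t.test(x[in_toi], x[in_ref])$p.value)
  data.frame(logFC = lfc, adj.P = p.adjust(p, method = "BH"))
}

toi <- "Lung"
vs_all   <- de_table(expr, tissue == toi, tissue != toi)      # comparison 1
vs_blood <- de_table(expr, tissue == toi, tissue == "Blood")  # comparison 2

degth <- c(0.05, 0.05)
lfcth <- c(1, log2(1.5))

# A TF is called tissue-specific only if it passes every comparison.
pass_all   <- vs_all$adj.P   < degth[1] & vs_all$logFC   > lfcth[1]
pass_blood <- vs_blood$adj.P < degth[2] & vs_blood$logFC > lfcth[2]
specific <- rownames(expr)[pass_all & pass_blood]
```

A third comparison against spleen would add one more table and one more entry in each threshold vector.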

Once the tissue-specific TFs have been detected, we could build a network from only these TFs and their target genes. However, sepiraInfNet additionally removes TFs with fewer than minNtgts target genes. By default the minimal number of TF targets in the final network is 10.
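The filtering step can be illustrated with a toy network matrix. The encoding of edges as 1 (positive), -1 (negative) and 0 (none), and the ">=" comparison at the boundary, are assumptions made for this sketch.

```r
# Toy network: target genes in rows, TFs in columns.
net <- matrix(0, nrow = 12, ncol = 2,
              dimnames = list(paste0("g", 1:12), c("TFa", "TFb")))
net[1:11, "TFa"] <- c(rep(1, 8), rep(-1, 3))  # 11 targets
net[1:4,  "TFb"] <- 1                         # only 4 targets

minNtgts <- 10

# Number of target genes per TF (any non-zero entry counts as a target).
ntgts <- colSums(net != 0)

# Drop TFs with too few targets (assumption: ">=" at the boundary).
kept <- net[, ntgts >= minNtgts, drop = FALSE]
colnames(kept)
```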

The partial correlation coefficients are calculated in parallel; by default sepiraInfNet splits the work into 4 sub-processes. The user can use more cores by setting the parameter ncores.
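The general pattern of distributing per-gene work with mclapply() from the parallel package looks roughly like this. The sketch uses a plain correlation as a stand-in for the partial correlation computation, and the simulated data and object names are illustrative only.

```r
library(parallel)

set.seed(11)
n <- 100
tf <- rnorm(n)
# Eight toy target genes with decreasing dependence on the TF.
genes <- sapply(1:8, function(i) tf / i + rnorm(n))
colnames(genes) <- paste0("g", 1:8)

# mclapply() relies on forking, which is not available on Windows,
# so fall back to a single core there.
ncores <- if (.Platform$OS.type == "windows") 1L else 2L

# One independent task per gene; results come back in input order.
res <- mclapply(colnames(genes),
                function(g) cor(genes[, g], tf),
                mc.cores = ncores)
par_cor <- setNames(unlist(res), colnames(genes))
```

Setting ncores = 1, as in the package example, runs the same computation sequentially.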

A list with three entries:

$netTOI the tissue-specific network; rows refer to TF target genes, while columns refer to TFs.

$sumnet a matrix summarizing the number of TF target genes and the number of positively/negatively regulated target genes for each TF in the inferred network.

$top a list whose entries are tables summarizing the results of the differential expression analyses. The first table comes from the comparison between toi and 1) all other tissue types; the remaining tables come from the comparisons with 2) blood and/or 3) spleen.

# Gene expression data set (a subset of the GTEx data set)
data("GeneExp")
# TFs
data("TFeid")
# Run the function
cf <- "Blood"
coln <- colnames(GeneExp)  # the column names are passed as the tissue labels
degth <- c(0.3, 0.3)  # 'degth = c(0.05, 0.05)' is recommended
# The resulting network is small because of the limited size of the 'GeneExp' data set
net.o <- sepiraInfNet(GeneExp, coln, "Lung", cf, TFeid, sigth = 0.05,
                      degth = degth, minNtgts = 5, ncores = 1)