View source: R/anchor_pair_enrich.R
anchor_pair_enrich | R Documentation |
Determine whether motifs between paired bed regions have a statistically significant relationship. Options for significance are motif score correlation, motif count correlation, or hypergeometric motif co-occurrence.
anchor_pair_enrich(interaction_data, method = c("count", "score", "match"))
interaction_data |
an interactionData object of paired genomic regions | ||||||
method |
method for co-occurrence, valid options include:
|
an interactionData object where obj$pair_motif_enrich
contains
the p-values for significance of seeing a higher co-occurrence than
what we get by chance.
We assume motif scores follow a normal distribution and are independent between enhancers and promoters. We can therefore compute how correlated scores of any two transcription factor motifs are between enhancer and promoter regions using Pearson's product-moment correlation coefficient:
r = \frac{\sum (x^{\prime}_i - \bar{x}^{\prime})(y^{\prime}_i -
\bar{y}^{\prime})}{\sqrt{\sum(x^{\prime}_i -
\bar{x}^{\prime})^2\sum(y^{\prime}_i - \bar{y}^{\prime})^2}}
,
where the input vectors \boldsymbol{x}
and \boldsymbol{y}
from
above are transformed to vectors \boldsymbol{x^{\prime}}
and
\boldsymbol{y^{\prime}}
by replacing the set of scores with the
maximum score for each region:
x^{\prime}_i = \max x_i
x^{\prime}_i
is then the maximum motif score of motif a
in the
promoter region of interaction i
, y^{\prime}_i
is the maximum
motif score of motif b
in the enhancer region of interaction i
,
and \bar{x}^{\prime}
and \bar{y}^{\prime}
are the sample means.
Significance is then computed by transforming the correlation coefficient
r
to test statistic t
, which is Student t
-distributed
with n - 2
degrees of freedom.
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
All p-values are calculated as one-tailed p-values of the probability
that scores are greater than or equal to r
.
Instead of calculating the correlation of motif scores directly, the
count-based correlation metric first tallies the number of instances of a
given motif within an enhancer or a promoter region, which are defined as
all positions in those regions with motif score p-values of less than
5 * 10^{-5}
. Formally, the input vectors \boldsymbol{x}
and
\boldsymbol{y}
are transformed to vectors
\boldsymbol{x^{\prime\prime}}
and \boldsymbol{y^{\prime\prime}}
by replacing the set of scores with the cardinality of the set:
x^{\prime\prime}_i = |x_i|
And analogous for y^{\prime\prime}_i
. Finally, the correlation
coefficient r
between \boldsymbol{x^{\prime\prime}}
and
\boldsymbol{y^{\prime\prime}}
and its associated significance are
calculated as described above.
Instance co-occurrence uses the presence or absence of a motif within an
enhancer or promoter to determine a statistically significant association,
thus \boldsymbol{x^{\prime\prime\prime}}
and
\boldsymbol{y^{\prime\prime\prime}}
are defined by:
x^{\prime\prime\prime}_i = \boldsymbol{1}_{x^{\prime\prime}_i > 0}
Instance co-occurrence is computed using the hypergeometric test:
p = \sum_{k=I_{ab}}^{P_a} \frac{binom(P_a, k)
binom(n - P_a, E_b - k)}{binom(n, E_b)},
where I_{ab}
is the number of interactions that contain a match for
motif a
in the promoter and motif b
in the enhancer, P_a
is the number of promoters that contain motif a
(P_a = \sum^n_i x^{\prime\prime\prime}_i
), E_b
is the number of
enhancers that contain motif b
(E_b = \sum^n_i y^{\prime\prime\prime}_i
), and n
is the total
number of interactions, which is equal to the number of promoters and to
the number of enhancers.
Jennifer Hammelman
Konstantin Krismer
## Not run:
genome_id <- "BSgenome.Mmusculus.UCSC.mm9"
if (!(genome_id %in% rownames(utils::installed.packages()))) {
BiocManager::install(genome_id, update = FALSE, ask = FALSE)
}
genome <- BSgenome::getBSgenome(genome_id)
motifs_file <- system.file("extdata/motifs_subset.txt.gz",
package = "spatzie")
motifs <- TFBSTools::readJASPARMatrix(motifs_file, matrixClass = "PFM")
yy1_pd_interaction <- scan_motifs(spatzie::interactions_yy1, motifs, genome)
yy1_pd_interaction <- filter_motifs(yy1_pd_interaction, 0.4)
yy1_pd_count_corr <- anchor_pair_enrich(yy1_pd_interaction, method = "count")
## End(Not run)
res <- anchor_pair_enrich(spatzie::scan_interactions_example_filtered,
method = "score")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.