Converts a raw position frequency matrix (PFMatrix) to a position weight matrix (PWMatrix). It takes the PWM type, the background frequencies of the bases, and the pseudocounts as parameters.
x: For
type: The type of PWM to generate; should be one of "log2probratio" or "prob". "log2probratio" gives the PWM on a log2 scale, while "prob" gives it on the probability scale, from 0 to 1.
pseudocounts: A non-negative numeric vector, so a different pseudocount can be specified for each site (column of the matrix). Values are recycled if the vector is shorter than the number of sites. A value of 0.8 is recommended; see Nishida et al. (2009) for details. The TFBS Perl module instead uses the square root of the column sum of the matrix, i.e., the square root of the number of binding sites used to construct the PFM.
bg: A numeric vector of background frequencies for the four bases, named A, C, G and T.
When toPWM is applied to a
The raw position frequency matrix (PFM) is usually converted into
a position weight matrix (PWM),
also known as a position-specific scoring matrix (PSSM).
The PWM gives the probability of each base at each position and
is used to scan genomic sequences.
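To illustrate what scanning means here, the following is a minimal base-R sketch, not the TFBSTools scanning API: it slides a log2-scale PWM along a DNA string and, for each window, sums the scores of the observed bases. The helper `scan_pwm` and the toy two-column matrix are hypothetical and chosen only to keep the arithmetic easy to follow.

```r
# Hypothetical helper (not part of TFBSTools): slide a log2-scale PWM along
# a DNA string and return the summed score of every window.
scan_pwm <- function(pwm, dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  width <- ncol(pwm)
  n_windows <- length(bases) - width + 1
  if (n_windows < 1) return(numeric(0))
  scores <- numeric(n_windows)
  for (start in seq_len(n_windows)) {
    window <- bases[start:(start + width - 1)]
    # For position j, look up the row of the observed base in column j;
    # a base outside A/C/G/T gives NA and hence an NA score
    scores[start] <- sum(pwm[cbind(match(window, rownames(pwm)), seq_len(width))])
  }
  scores
}

# Hypothetical toy 2-column log2 PWM that favours "AC"
toy_pwm <- matrix(c( 1, -2,
                    -2,  1,
                    -2, -2,
                    -2, -2),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(c("A", "C", "G", "T"), NULL))

scan_pwm(toy_pwm, "AACG")   # windows AA, AC, CG score -1, 2, -4
```

The highest-scoring window marks the best match. A real scanner would also consider the reverse strand and apply a score threshold, which this sketch omits.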
The implementation here differs slightly from PWM
in the
Biostrings
package in how the pseudocounts are chosen.
Pseudocounts are necessary to correct for small counts
and to eliminate zero values before the log transformation. The conversion is computed as follows:
postProbs = (PFM + bg * pseudocounts) / (colSums(PFM) + sum(bg) * pseudocounts)
priorProbs = bg / sum(bg)
PWM_log2probratio = log2(postProbs / priorProbs)
PWM_prob = postProbs
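The formulas above can be sketched in base R as follows. This is a minimal illustration, not the TFBSTools toPWM method: `pfm_to_pwm` is a hypothetical name, and the sketch assumes that a per-site pseudocount vector is applied column by column (the jth pseudocount is spread over the bases of column j in proportion to bg). The Arnt counts are copied from the Examples section.

```r
# Hypothetical base-R sketch of the PFM -> PWM formulas (not TFBSTools::toPWM)
pfm_to_pwm <- function(pfm, type = c("log2probratio", "prob"),
                       pseudocounts = 0.8,
                       bg = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)) {
  type <- match.arg(type)
  bg <- bg[rownames(pfm)]                  # align background with matrix rows
  pc <- rep_len(pseudocounts, ncol(pfm))   # one pseudocount per site, recycled
  # PFM + bg * pseudocounts: column j gets bg * pc[j] added
  numer <- pfm + outer(bg, pc)
  # colSums(PFM) + sum(bg) * pseudocounts, one value per column
  denom <- colSums(pfm) + sum(bg) * pc
  post_probs <- sweep(numer, 2, denom, "/")
  if (type == "prob") return(post_probs)
  prior_probs <- bg / sum(bg)
  # prior_probs has one entry per row, so it recycles down each column
  log2(post_probs / prior_probs)
}

# Arnt (MA0004.1) counts from the Examples section
arnt <- matrix(c(4L, 19L, 0L, 0L, 0L, 0L,
                 16L, 0L, 20L, 0L, 0L, 0L,
                 0L, 1L, 0L, 20L, 0L, 20L,
                 0L, 0L, 0L, 0L, 20L, 0L),
               byrow = TRUE, nrow = 4,
               dimnames = list(c("A", "C", "G", "T"), NULL))

pwm_prob <- pfm_to_pwm(arnt, type = "prob")
colSums(pwm_prob)                               # every column sums to 1

pwm_log2 <- pfm_to_pwm(arnt, type = "log2probratio", pseudocounts = 0.8)
pwm_log2["G", 1]                                # log2(1/26), about -4.70

# Without a pseudocount, a zero count becomes -Inf after the log transform
pfm_to_pwm(arnt, pseudocounts = 0)["G", 1]      # -Inf

# TFBS Perl module style: square root of each column sum, one value per site
pwm_tfbs <- pfm_to_pwm(arnt, pseudocounts = sqrt(colSums(arnt)))
```

Because the numerator of postProbs summed over the four bases equals its denominator, each column of the "prob" matrix sums to 1 regardless of the pseudocount chosen, and dividing bg by sum(bg) means the background frequencies need not be normalized beforehand.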
A PWMatrix
object that also contains the background frequencies and
pseudocounts used.
Ge Tan
Wasserman, W. W., & Sandelin, A. (2004). Applied bioinformatics for the identification of regulatory elements. Nature Reviews Genetics, 5(4), 276-287. doi:10.1038/nrg1315
Nishida, K., Frith, M. C., & Nakai, K. (2009). Pseudocounts for transcription factor binding sites. Nucleic Acids Research, 37(3), 939-944. doi:10.1093/nar/gkn1019
library(TFBSTools)
## Construct a PFMatrix
pfm <- PFMatrix(ID="MA0004.1", name="Arnt", matrixClass="Zipper-Type",
strand="+", bg=c(A=0.25, C=0.25, G=0.25, T=0.25),
tags=list(family="Helix-Loop-Helix", species="10090",
tax_group="vertebrates",
medline="7592839", type="SELEX", ACC="P53762",
pazar_tf_id="TF0000003",
TFBSshape_ID="11", TFencyclopedia_ID="580"),
profileMatrix=matrix(c(4L, 19L, 0L, 0L, 0L, 0L,
16L, 0L, 20L, 0L, 0L, 0L,
0L, 1L, 0L, 20L, 0L, 20L,
0L, 0L, 0L, 0L, 20L, 0L),
byrow=TRUE, nrow=4,
dimnames=list(c("A", "C", "G", "T")))
)
## Convert it into a PWMatrix
pwm <- toPWM(pfm, type="log2probratio", pseudocounts=0.8)
## Convert a PFMatrixList into a PWMatrixList
data(MA0003.2)
data(MA0004.1)
pfmList <- PFMatrixList(pfm1=MA0003.2, pfm2=MA0004.1, use.names=TRUE)
pwmList <- toPWM(pfmList, pseudocounts=0.8)