combinatorialDist: Combinatorial model approximation of the number of motif hits
In motifcounter: R package for analysing TFBSs in DNA sequences


This function approximates the distribution of the number of motif hits. To this end, it sums over all combinations of obtaining k hits in a random sequence of a given length, using an efficient dynamic programming algorithm.
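The dynamic programming idea can be illustrated with a minimal base-R sketch. It is not the algorithm implemented in motifcounter: it assumes, purely for illustration, that every scanned position produces a hit independently with the same probability, and it ignores the overlapping-hit structure that the package models through the `Overlap` object. The function name `hit_count_dist` and the values of `n` and `alpha` are hypothetical.

```r
# Illustrative sketch only: NOT motifcounter's implementation.
# Assumption: each of the n scanned positions yields a hit independently
# with probability alpha; overlapping hits are ignored.
hit_count_dist <- function(n, alpha, maxhits = n) {
    # dist[k + 1] holds P(k hits) among the positions processed so far.
    # If maxhits < n, probability mass for more than maxhits hits is dropped.
    dist <- c(1, rep(0, maxhits))
    for (pos in seq_len(n)) {
        # Either no hit at this position (k stays the same)
        # or a hit at this position (k - 1 becomes k).
        dist <- (1 - alpha) * dist + alpha * c(0, dist[-length(dist)])
    }
    dist
}

# Hypothetical inputs: 100 positions, hit probability 0.01 per position
d <- hit_count_dist(n = 100, alpha = 0.01)
```

Under these simplifying assumptions the recursion reproduces the binomial distribution, so `d` can be checked against `dbinom(0:100, 100, 0.01)`. The update processes one position at a time instead of enumerating every placement of k hits explicitly, which is what makes a dynamic programming approach efficient.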

combinatorialDist(seqlen, overlap)

`seqlen`	Integer-valued vector that defines the lengths of the individual sequences. For a given DNAStringSet, this information can be retrieved using `numMotifHits`.
`overlap`	An Overlap object.

This function is an alternative to compoundPoissonDist. It requires fixed-length sequences and currently only supports computing the distribution of the number of hits when both DNA strands are scanned for motif hits.

List containing

dist: Distribution of the number of hits

compoundPoissonDist

numMotifHits

probOverlapHit

# Load sequences
seqfile = system.file("extdata", "seq.fasta", package = "motifcounter")
seqs = Biostrings::readDNAStringSet(seqfile)

# Load motif
motiffile = system.file("extdata", "x31.tab", package = "motifcounter")
motif = t(as.matrix(read.table(motiffile)))

# Load background model
bg = readBackground(seqs, 1)

# Compute overlap probabilities
op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE)

# Use 2 sequences of length 100 bp each
seqlen = rep(100, 2)

# Computes the combinatorial distribution of the number of motif hits
dist = motifcounter:::combinatorialDist(seqlen, op)