topMotif: TOP Motif detection

View source: R/sequence_features.R

topMotif R Documentation



Per leader, detect whether the leader has a TOP motif at the TSS (5' end of the leader). A TOP motif is defined as C followed by 4 pyrimidines.
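
The definition above can be illustrated with a small base-R sketch. This is not ORFik's implementation: the helper name `classify_top` is hypothetical, and it assumes upper-case DNA input, where the pyrimidines are C and T.

```r
# Hypothetical helper, not part of ORFik: classify 5' ends by the rule
# "C, then 4 pyrimidines". Assumes upper-case DNA (pyrimidines: C, T).
classify_top <- function(seqs) {
  is_top   <- grepl("^C[CT]{4}", seqs)  # C followed by 4 pyrimidines
  starts_c <- startsWith(seqs, "C")     # starts on C, with or without motif
  ifelse(is_top, "TOP", ifelse(starts_c, "C", "OTHER"))
}

classify_top(c("CTTCC", "CAGTT", "GCTTC"))
# "TOP" "C" "OTHER"
```

The nested `ifelse` checks the full motif first, so a sequence is labelled C only when it starts on C without matching the motif.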


topMotif(seqs, start = 1, stop = max(nchar(seqs)), return.sequence = TRUE)



seqs: the sequences (character vector or DNAStringSet) of the start regions of 5' UTRs (leaders). Each sequence must have a width of at least stop - start + 1 to be included.
See the example below for input.


start: position in seqs to start at (first is 1), default 1.


stop: position in seqs to stop at (first is 1), default max(nchar(seqs)), that is, the length of the longest sequence.


return.sequence: logical, default TRUE. If TRUE, return a data.table with the sequence as columns in addition to the TOP class. If FALSE, return a character vector.
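
How a start and stop position select a window, and why short sequences drop out, can be sketched in base R. This is an illustration only, not ORFik code; the variable names `start_pos` and `stop_pos` are hypothetical stand-ins for the `start` and `stop` arguments.

```r
# Illustration only (base R, not ORFik code): select positions
# start_pos..stop_pos and keep sequences wide enough to cover that window.
seqs      <- c(a = "CTTCCA", b = "CTT", c = "GATTACA")
start_pos <- 1
stop_pos  <- 5

# With start_pos = 1, reaching stop_pos needs stop_pos - start_pos + 1 bases
keep   <- nchar(seqs) >= stop_pos
window <- substr(seqs[keep], start_pos, stop_pos)
window
```

Here sequence `b` is only 3 bases wide, so it is excluded, while `a` and `c` are trimmed to their first 5 bases.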


If return.sequence == FALSE, a character vector of either TOP, C or OTHER. TOP means the leader has the TOP motif, C means the leader started on C but is not TOP, and OTHER means it is not TOP and did not start on C. If return.sequence == TRUE (the default), a data.table is returned in which the base at each position of the motif is included as an additional column (called seq1, seq2, etc.), together with an id column called X.gene_id (with the names of seqs).
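
The per-position layout described above can be pictured with a base-R sketch. It is not the function's actual output: it uses a plain data.frame instead of a data.table and leaves out the TOP class column, whose name is not stated here. Only the names `seq1`, `seq2`, ... and `X.gene_id` are taken from this page.

```r
# Sketch only: one column per motif position (seq1..seq5) plus an id column.
# The real function returns a data.table that also carries the TOP class.
seqs  <- c(tx1 = "CTTCC", tx2 = "CAGTT")
bases <- do.call(rbind, strsplit(unname(seqs), ""))  # one row per sequence
colnames(bases) <- paste0("seq", seq_len(ncol(bases)))
res <- data.frame(X.gene_id = names(seqs), bases)
res
```

Each row holds one leader, identified by the name it had in `seqs`, with its bases spread across the position columns.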


## Not run: 
if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19")) {
  txdbFile <- system.file("extdata", "hg19_knownGene_sample.sqlite",
                          package = "GenomicFeatures")
  # Load the 5' UTRs (leaders)
  leaders <- loadRegion(txdbFile, "leaders")

  # Should update by CAGE if not already done
  cageData <- system.file("extdata", "cage-seq-heart.bed.bgz",
                          package = "ORFik")
  leadersCage <- reassignTSSbyCage(leaders, cageData)
  # Get region to check
  seqs <- startRegionString(leadersCage, NULL,
        BSgenome.Hsapiens.UCSC.hg19::Hsapiens, 0, 4)
  # Find TOP motifs in the extracted start regions
  topMotif(seqs)
}
## End(Not run)

Roleren/ORFik documentation built on Feb. 17, 2025, 4:13 p.m.