runMeme: Identify motifs with MEME
In snystrom/memes: motif matching, comparison, and de novo discovery using the MEME Suite

Description

MEME performs de novo discovery of ungapped motifs present in the input sequences. It can be used in both discriminative and non-discriminative modes.

Usage

runMeme(
  input,
  control = NA,
  outdir = "auto",
  alph = "dna",
  parse_genomic_coord = TRUE,
  combined_sites = FALSE,
  silent = TRUE,
  meme_path = NULL,
  ...
)

## S3 method for class 'list'
runMeme(
  input,
  control = NA,
  outdir = "auto",
  alph = "dna",
  parse_genomic_coord = TRUE,
  combined_sites = FALSE,
  silent = TRUE,
  meme_path = NULL,
  ...
)

## S3 method for class 'BStringSetList'
runMeme(
  input,
  control = NA,
  outdir = "auto",
  alph = "dna",
  parse_genomic_coord = TRUE,
  combined_sites = FALSE,
  silent = TRUE,
  meme_path = NULL,
  ...
)

## Default S3 method:
runMeme(
  input,
  control = NA,
  outdir = "auto",
  alph = "dna",
  parse_genomic_coord = TRUE,
  combined_sites = FALSE,
  silent = TRUE,
  meme_path = NULL,
  ...
)

Arguments

`input`	path to a fasta file, a Biostrings::BStringSetList, or a list of Biostrings::BStringSet objects (can be generated using `get_sequence()`)
`control`	any data type as in `input`, or a character vector of `names(input)` to use those regions as control sequences. Using sequences as background requires an alternative objective function. Users must pass a non-default value of `objfun` to `...` if using a non-NA control set (default: NA)
`outdir`	(default: "auto") Directory where output data will be stored.
`alph`	one of c("dna", "rna", "protein") or path to alphabet file (default: "dna").
`parse_genomic_coord`	`logical(1)` whether to parse genomic coordinates from fasta headers. Headers must be in the form "chr:start-end"; any other format results in an error. Automatically set to `FALSE` if `alph = "protein"`. This setting only needs to be changed if using a custom-built fasta file without genomic coordinates in the header. (default: TRUE)
`combined_sites`	`logical(1)` whether to return combined sites information (coerces output to list) (default: FALSE)
`silent`	Whether to suppress printing stdout to terminal (default: TRUE)
`meme_path`	path to "meme/bin/". If unset, will use default search behavior: (1) the `meme_path` setting in `options()`, (2) the `MEME_PATH` setting in `.Renviron` or `.bashrc` (default: NULL)
`...`	additional arguments passed to MEME (see below)
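As a rough sketch of the discriminative mode described under `control`, the call below passes a second sequence set as the control together with a non-default `objfun`. The toy random sequences, the variable names `seqs` and `ctrl`, and the choice of `objfun = "de"` are illustrative assumptions, not recommendations. The call is only attempted when memes, universalmotif, and a MEME installation are available.

```r
# Only attempt the run when the required packages and MEME are present.
can_run <- requireNamespace("memes", quietly = TRUE) &&
  requireNamespace("universalmotif", quietly = TRUE) &&
  memes::meme_is_installed()

discriminative <- NULL
if (can_run) {
  # Toy random DNA sequences (illustrative only; real input would be
  # e.g. peak sequences and matched background regions).
  seqs <- universalmotif::create_sequences(seqnum = 10, seqlen = 100)
  ctrl <- universalmotif::create_sequences(seqnum = 10, seqlen = 100)
  names(seqs) <- paste0("seq", seq_along(seqs))
  names(ctrl) <- paste0("ctrl", seq_along(ctrl))

  # A non-NA control requires a non-default objective function via `...`.
  discriminative <- memes::runMeme(
    seqs,
    control = ctrl,
    objfun = "de",
    parse_genomic_coord = FALSE
  )
}
```

When `input` is a named list of sequence sets, `control` can instead be a character vector of `names(input)`, as described above.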

Details

Note that MEME can take a long time to run. Runtime increases drastically with the number of input sequences, the width of the motifs searched for, and the number of motifs MEME is asked to discover. For this reason, MEME usually performs best on a few (<50) short (100-200 bp) sequences, although this is not a requirement. Additional details on how data size affects runtime can be found here.
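One way to keep the input within the size range suggested above is to subsample and trim sequences before running MEME. The sketch below uses a hypothetical helper, `shrink_input()`, on a plain named character vector of toy sequences; the helper, its defaults, and the centered trimming are assumptions made for illustration and are not part of memes.

```r
# Hypothetical helper: keep at most `max_n` sequences, each trimmed to a
# centered window of at most `max_len` characters.
shrink_input <- function(seqs, max_n = 50, max_len = 200) {
  if (length(seqs) > max_n) {
    seqs <- seqs[sort(sample.int(length(seqs), max_n))]
  }
  n <- nchar(seqs)
  start <- pmax(1, floor((n - max_len) / 2) + 1)
  trimmed <- substr(seqs, start, start + max_len - 1)
  names(trimmed) <- names(seqs)
  trimmed
}

# Toy data: 120 random 300 bp sequences.
set.seed(42)
toy <- vapply(
  1:120,
  function(i) paste(sample(c("A", "C", "G", "T"), 300, replace = TRUE), collapse = ""),
  character(1)
)
names(toy) <- paste0("seq", seq_along(toy))

small <- shrink_input(toy)
```

The result is still a character vector, so it would need converting to a sequence set (for example with `Biostrings::DNAStringSet(small)`) before being passed to `runMeme()`.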

MEME works best when specifically tuned to the analysis question. The default settings are unlikely to be ideal. It has several complex arguments documented here, which runMeme() accepts as R function arguments (see details below).

If discovering motifs within ChIP-seq, ATAC-seq, or similar peaks, MEME may perform best if using sequences flanking the summit (the site of maximum signal) of each peak rather than the center. ChIP-seq or similar data can also benefit from setting `revcomp = TRUE, minw = 5, maxw = 20`. For more tips on using MEME to analyze ChIP-seq data, see the following tips page.
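To make the summit-centered approach concrete, the sketch below builds fixed-width windows around peak summits in base R. The peak table, its column names, the 100 bp width, and the helper `summit_windows()` are all hypothetical. The `header` column follows the "chr:start-end" form that `parse_genomic_coord` expects.

```r
# Hypothetical peak table: chromosome plus 1-based summit position.
peaks <- data.frame(
  chr = c("chr2L", "chr2L", "chr3R"),
  summit = c(1500, 20450, 99000)
)

# Hypothetical helper: a window of `width` bp placed around each summit.
summit_windows <- function(peaks, width = 100) {
  half <- width %/% 2
  start <- peaks$summit - half
  end <- start + width - 1
  data.frame(
    chr = as.character(peaks$chr),
    start = start,
    end = end,
    header = sprintf("%s:%d-%d", as.character(peaks$chr), start, end)
  )
}

windows <- summit_windows(peaks)
```

Once sequences have been extracted for these windows (for example with `get_sequence()`), the ChIP-seq settings mentioned above would be passed as `runMeme(seqs, revcomp = TRUE, minw = 5, maxw = 20)`.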

Additional arguments

runMeme() accepts all valid arguments to meme as arguments passed to `...`. For flags without values, pass them as flag = TRUE. The dna, rna, and protein flags should instead be passed to the alph argument of runMeme(). The arguments passed to MEME often interact with each other. For a detailed description of each argument, see the MEME Commandline Documentation.
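The correspondence between R arguments and command-line flags can be pictured with a small stand-alone function. This is only an illustration of the convention described above (`flag = TRUE` for valueless flags, `name = value` otherwise). `args_to_flags()` is a hypothetical helper and does not reflect how memes builds its command line internally.

```r
# Illustrative only: NOT the internal implementation used by memes.
args_to_flags <- function(...) {
  args <- list(...)
  flags <- character(0)
  for (name in names(args)) {
    value <- args[[name]]
    if (isTRUE(value)) {
      # Valueless flag, e.g. revcomp = TRUE becomes "-revcomp"
      flags <- c(flags, paste0("-", name))
    } else if (!isFALSE(value)) {
      # Flag with a value, e.g. nmotifs = 3 becomes "-nmotifs" "3"
      flags <- c(flags, paste0("-", name), as.character(value))
    }
  }
  flags
}

flags <- args_to_flags(revcomp = TRUE, nmotifs = 3, minw = 5, maxw = 20)

# The corresponding call would be:
# runMeme(seqs, revcomp = TRUE, nmotifs = 3, minw = 5, maxw = 20)
```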

Value

MEME results in universalmotif_df format (see: universalmotif::to_df()). sites_hits is a nested data.frame column containing the position within each input sequence of matches to the identified motif.
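As a minimal sketch of working with the return value, the code below runs MEME on toy random sequences and pulls the nested `sites_hits` data.frame for the first motif. The toy input and the variable names are assumptions, and the code only runs when memes, universalmotif, and MEME are available. The exact columns of the result may differ between versions, so none are assumed beyond `sites_hits`.

```r
# Only attempt the run when the required packages and MEME are present.
can_run <- requireNamespace("memes", quietly = TRUE) &&
  requireNamespace("universalmotif", quietly = TRUE) &&
  memes::meme_is_installed()

first_sites <- NULL
if (can_run) {
  # Toy random DNA sequences with non-coordinate names.
  seqs <- universalmotif::create_sequences(seqnum = 10, seqlen = 100)
  names(seqs) <- paste0("seq", seq_along(seqs))

  res <- memes::runMeme(seqs, parse_genomic_coord = FALSE)

  # Each row is one motif; sites_hits holds that motif's match positions.
  if (!is.null(res) && nrow(res) > 0) {
    first_sites <- res$sites_hits[[1]]
  }
}
```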

Citation

If you use runMeme() in your analysis, please cite:

Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.

Licensing

The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.

Examples

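if (meme_is_installed()) {
  # Generate four toy sequences and give each a unique name
  seqs <- universalmotif::create_sequences("CCRAAAW", seqnum = 4)
  names(seqs) <- seq_along(seqs)
  # The names are not genomic coordinates, so disable coordinate parsing
  runMeme(seqs, parse_genomic_coord = FALSE)
}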

snystrom/memes documentation built on Oct. 12, 2024, 2:42 a.m.