run_matrix_spma | R Documentation |
SPMA helps to illuminate the relationship between RBP binding evidence and the transcript sorting criterion, e.g., fold change between treatment and control samples.
run_matrix_spma(
  sorted_transcript_sequences,
  sorted_transcript_values = NULL,
  transcript_values_label = "transcript value",
  motifs = NULL,
  n_bins = 40,
  midpoint = 0,
  x_value_limits = NULL,
  max_model_degree = 1,
  max_cs_permutations = 1e+07,
  min_cs_permutations = 5000,
  max_hits = 5,
  threshold_method = "p_value",
  threshold_value = 0.25^6,
  max_fg_permutations = 1e+06,
  min_fg_permutations = 1000,
  e = 5,
  p_adjust_method = "BH",
  n_cores = 1,
  cache = paste0(tempdir(), "/sc/")
)
sorted_transcript_sequences |
named character vector of ranked sequences (only containing upper case characters A, C, G, T), where the names are RefSeq identifiers and sequence type qualifiers ( |
sorted_transcript_values |
vector of sorted transcript values, i.e., the fold change or signal-to-noise ratio or any other quantity that was used to sort the transcripts that were passed to |
transcript_values_label |
label of transcript sorting criterion
(e.g., |
motifs |
a list of motifs that is used to score the specified sequences.
If |
n_bins |
specifies the number of bins into which the sequences will be divided; valid values are between 7 and 100 |
midpoint |
for enrichment values the midpoint should be |
x_value_limits |
sets limits of the x-value color scale (used to harmonize color scales of different spectrum plots), see |
max_model_degree |
maximum degree of polynomial |
max_cs_permutations |
maximum number of permutations performed in Monte Carlo test for consistency score |
min_cs_permutations |
minimum number of permutations performed in Monte Carlo test for consistency score |
max_hits |
maximum number of putative binding sites per mRNA that are counted |
threshold_method |
either |
threshold_value |
semantics of the |
max_fg_permutations |
maximum number of foreground permutations performed in Monte Carlo test for enrichment score |
min_fg_permutations |
minimum number of foreground permutations performed in Monte Carlo test for enrichment score |
e |
integer-valued stop criterion for enrichment score Monte Carlo test: aborting permutation process after observing |
p_adjust_method |
adjustment of p-values from Monte Carlo tests to avoid alpha error accumulation, see |
n_cores |
the number of cores that are used |
cache |
either logical or path to a directory where scores are cached. The scores of each motif are stored in a separate file that contains a hash table with RefSeq identifiers and sequence type qualifiers as keys and the number of putative binding sites as values.
If |
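The arguments e, min_fg_permutations and max_fg_permutations describe a Monte Carlo test that may stop early. The following is a minimal sketch of one common way to implement such a rule, not the package's implementation. It assumes that e counts permuted statistics at least as extreme as the observed one, and that the minimum number of permutations is always completed before stopping; both are assumptions. The helper mc_p_value and its draw_null argument are hypothetical names.

```r
# Hypothetical helper (not part of transite): sequential Monte Carlo test
# that stops once `e` extreme null statistics were seen (assumed semantics).
mc_p_value <- function(observed, draw_null, e = 5,
                       min_permutations = 1000, max_permutations = 1e6) {
  exceedances <- 0
  n <- 0
  while (n < max_permutations) {
    n <- n + 1
    # draw_null() returns one statistic computed on permuted data
    if (abs(draw_null()) >= abs(observed)) exceedances <- exceedances + 1
    # stop early only after the minimum number of permutations
    if (exceedances >= e && n >= min_permutations) break
  }
  # add-one estimate keeps the p-value strictly above zero
  list(p_value = (exceedances + 1) / (n + 1), n_permutations = n)
}

set.seed(1)
# Toy null distribution: standard normal draws stand in for permuted scores
res <- mc_p_value(observed = 0.5, draw_null = function() rnorm(1),
                  min_permutations = 1000, max_permutations = 10000)
res$n_permutations
res$p_value
```

With an unremarkable observed value the loop ends at the minimum number of permutations, whereas a rare value forces it to run up to the maximum, which is where most of the computation time goes.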
In order to investigate how motif targets are distributed across a spectrum of transcripts (e.g., all transcripts of a platform, ordered by fold change), Spectrum Motif Analysis visualizes the gradient of RBP binding evidence across all transcripts.
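The spectrum is built by cutting the sorted transcripts into n_bins consecutive bins. The sketch below illustrates the idea in base R with made-up fold changes. The helper assign_bins is hypothetical and is not claimed to reproduce subdivide_data(), which the package provides for this step.

```r
# Hypothetical helper (not part of transite): map ranks 1..n of the sorted
# transcripts to n_bins contiguous, nearly equal-sized bins.
assign_bins <- function(n_transcripts, n_bins = 40) {
  stopifnot(n_bins >= 1, n_transcripts >= n_bins)
  ceiling(seq_len(n_transcripts) * n_bins / n_transcripts)
}

# Made-up transcript values (e.g., log fold changes), sorted ascending
values <- sort(c(-2.1, -1.4, -0.3, 0.0, 0.2, 0.9, 1.7, 2.5))
bins <- assign_bins(length(values), n_bins = 4)

# Each list element holds the values of one bin, from lowest to highest
split(values, bins)
```

Each bin then serves as a foreground set that is compared against the background of all transcripts.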
The matrix-based approach skips the k-merization step of the k-mer-based approach and instead scores the transcript sequence as a whole with a position-specific scoring matrix.
For each sequence in foreground and background sets and each sequence motif, the scoring algorithm evaluates the score for each sequence position. Positions with a relative score greater than a certain threshold are considered hits, i.e., putative binding sites.
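The scoring step can be illustrated with a minimal base R sketch. It is not the package's scoring code: the matrix values are made up, the helper count_hits is hypothetical, and the relative score is assumed to be the window score rescaled between the lowest and highest score the matrix can produce.

```r
# Made-up log-odds matrix for a motif of width 3
# (rows: motif positions, columns: nucleotides; RNA alphabet assumed)
pssm <- matrix(
  c( 1.2, -0.8, -0.5, -1.0,
    -0.9,  1.5, -0.7, -0.6,
    -1.1, -0.4,  1.3, -0.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "C", "G", "U"))
)

# Hypothetical helper: count positions whose relative score exceeds a threshold
count_hits <- function(sequence, pssm, rel_threshold = 0.9) {
  bases <- strsplit(sequence, "")[[1]]
  width <- nrow(pssm)
  if (length(bases) < width) return(0L)
  # lowest and highest score any window can reach with this matrix
  min_score <- sum(apply(pssm, 1, min))
  max_score <- sum(apply(pssm, 1, max))
  starts <- seq_len(length(bases) - width + 1)
  scores <- vapply(starts, function(s) {
    window <- bases[s:(s + width - 1)]
    sum(pssm[cbind(seq_len(width), match(window, colnames(pssm)))])
  }, numeric(1))
  rel <- (scores - min_score) / (max_score - min_score)
  # windows containing characters outside the alphabet yield NA and are skipped
  sum(rel > rel_threshold, na.rm = TRUE)
}

count_hits("GGACGUUACGA", pssm)
```

In this toy sequence only the two ACG windows reach the maximum score, so they are the positions counted as putative binding sites.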
Scoring all sequences in the foreground and background sets yields a hit count for each motif and each set. These counts are used to calculate enrichment values and associated p-values in the same way as motif-compatible hexamer enrichment values are calculated in the k-mer-based approach. P-values are adjusted with one of the available adjustment methods.
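How hit counts turn into enrichment values and adjusted p-values can be sketched in base R as follows. All counts are made up. The enrichment is assumed to be the ratio of hit frequencies in foreground and background, and Fisher's exact test is used as a stand-in because the text does not name the test; neither is confirmed to match the package.

```r
# Made-up hit counts for three hypothetical motifs
counts <- data.frame(
  motif    = c("motif_a", "motif_b", "motif_c"),
  fg_hits  = c(30, 12, 5),
  fg_sites = c(1000, 1000, 1000),
  bg_hits  = c(100, 115, 60),
  bg_sites = c(10000, 10000, 10000)
)

# Assumed definition: ratio of hit frequencies in foreground vs. background
counts$enrichment <- (counts$fg_hits / counts$fg_sites) /
  (counts$bg_hits / counts$bg_sites)

# One 2x2 table (hit / no hit by foreground / background) per motif
counts$p_value <- vapply(seq_len(nrow(counts)), function(i) {
  tab <- matrix(c(counts$fg_hits[i], counts$fg_sites[i] - counts$fg_hits[i],
                  counts$bg_hits[i], counts$bg_sites[i] - counts$bg_hits[i]),
                nrow = 2)
  fisher.test(tab)$p.value
}, numeric(1))

# Benjamini-Hochberg adjustment, matching the default p_adjust_method = "BH"
counts$adj_p_value <- p.adjust(counts$p_value, method = "BH")
counts
```

Any method accepted by p.adjust could be substituted for "BH" here.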
An advantage of the matrix-based approach is the possibility of detecting clusters of binding sites. This can be done by counting regions with many hits using positional hit information or by simply applying a hit count threshold per sequence, e.g., only sequences with more than some number of hits are considered. Homotypic clusters of RBP binding sites may play a role similar to that of clusters of transcription factors.
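The simpler of the two options, a hit count threshold per sequence, might look like this in base R. The identifiers and counts are placeholders, and min_hits is a hypothetical variable rather than an argument of run_matrix_spma.

```r
# Made-up hit counts per sequence, named "<RefSeq identifier>|<sequence type>"
hits_per_sequence <- c(
  "NM_0001|3UTR" = 0,
  "NM_0002|3UTR" = 4,
  "NM_0003|3UTR" = 1,
  "NM_0004|3UTR" = 7
)

# Hypothetical cutoff: keep only sequences with more than min_hits hits
min_hits <- 3
clustered <- names(hits_per_sequence)[hits_per_sequence > min_hits]
clustered
```

The sequences that pass the cutoff are candidates for homotypic binding site clusters and can be analyzed separately.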
A list with the following components:
foreground_scores | the result of score_transcripts for the foreground sets (the bins) |
background_scores | the result of score_transcripts for the background set |
enrichment_dfs | a list of data frames, returned by calculate_motif_enrichment |
spectrum_info_df | a data frame with the SPMA results |
spectrum_plots | a list of spectrum plots, as generated by score_spectrum |
classifier_scores | a list of classifier scores, as returned by classify_spectrum |
Other SPMA functions: classify_spectrum(), run_kmer_spma(), score_spectrum(), subdivide_data()
Other matrix functions: calculate_motif_enrichment(), run_matrix_tsma(), score_transcripts(), score_transcripts_single_motif()
# example data set
background_df <- transite:::ge$background_df
# sort sequences by signal-to-noise ratio
background_df <- dplyr::arrange(background_df, value)
# character vector of named and ranked (by signal-to-noise ratio) sequences
background_seqs <- gsub("T", "U", background_df$seq)
names(background_seqs) <- paste0(background_df$refseq, "|",
  background_df$seq_type)
results <- run_matrix_spma(background_seqs,
  sorted_transcript_values = background_df$value,
  transcript_values_label = "signal-to-noise ratio",
  motifs = get_motif_by_id("M178_0.6"),
  n_bins = 20,
  max_fg_permutations = 10000)
## Not run:
results <- run_matrix_spma(background_seqs,
  sorted_transcript_values = background_df$value,
  transcript_values_label = "SNR")
## End(Not run)