SPMA helps to illuminate the relationship between RBP binding evidence and the transcript sorting criterion, e.g., fold change between treatment and control samples.
run_matrix_spma(
  sorted_transcript_sequences,
  sorted_transcript_values = NULL,
  transcript_values_label = "transcript value",
  motifs = NULL,
  n_bins = 40,
  midpoint = 0,
  x_value_limits = NULL,
  max_model_degree = 1,
  max_cs_permutations = 1e+07,
  min_cs_permutations = 5000,
  max_hits = 5,
  threshold_method = "p_value",
  threshold_value = 0.25^6,
  max_fg_permutations = 1e+06,
  min_fg_permutations = 1000,
  e = 5,
  p_adjust_method = "BH",
  n_cores = 1,
  cache = paste0(tempdir(), "/sc/")
)
sorted_transcript_sequences | named character vector of ranked sequences (only containing upper case characters A, C, G, T), where the names are RefSeq identifiers and sequence type qualifiers ( |
sorted_transcript_values | vector of sorted transcript values, i.e., the fold change or signal-to-noise ratio or any other quantity that was used to sort the transcripts that were passed to |
transcript_values_label | label of transcript sorting criterion (e.g., |
motifs | a list of motifs that is used to score the specified sequences. If |
n_bins | specifies the number of bins into which the sequences will be divided; valid values are between 7 and 100 |
midpoint | for enrichment values the midpoint should be |
x_value_limits | sets limits of the x-value color scale (used to harmonize color scales of different spectrum plots), see |
max_model_degree | maximum degree of polynomial |
max_cs_permutations | maximum number of permutations performed in Monte Carlo test for consistency score |
min_cs_permutations | minimum number of permutations performed in Monte Carlo test for consistency score |
max_hits | maximum number of putative binding sites per mRNA that are counted |
threshold_method | either |
threshold_value | semantics of the |
max_fg_permutations | maximum number of foreground permutations performed in Monte Carlo test for enrichment score |
min_fg_permutations | minimum number of foreground permutations performed in Monte Carlo test for enrichment score |
e | integer-valued stop criterion for enrichment score Monte Carlo test: aborting permutation process after observing |
p_adjust_method | adjustment of p-values from Monte Carlo tests to avoid alpha error accumulation, see |
n_cores | the number of cores that are used |
cache | either logical or path to a directory where scores are cached. The scores of each motif are stored in a separate file that contains a hash table with RefSeq identifiers and sequence type qualifiers as keys and the number of putative binding sites as values. If |
In order to investigate how motif targets are distributed across a spectrum of transcripts (e.g., all transcripts of a platform, ordered by fold change), Spectrum Motif Analysis visualizes the gradient of RBP binding evidence across all transcripts.
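The idea of a spectrum of sorted transcripts divided into bins can be sketched in a few lines of base R. This is only an illustration: `bin_sorted()` is a hypothetical helper written for this example and is not the package's own binning routine, and the fold change values are made up.

```r
# Hypothetical helper (not part of the package): split an already sorted
# vector into n_bins contiguous bins of roughly equal size.
bin_sorted <- function(x, n_bins = 40) {
  stopifnot(n_bins >= 2, length(x) >= n_bins)
  # assign each rank position to a bin index 1..n_bins, preserving the order
  bin_id <- cut(seq_along(x), breaks = n_bins, labels = FALSE)
  split(x, bin_id)
}

# made-up data: 12 transcripts sorted by fold change
fold_change <- sort(c(-2.1, -1.5, -0.9, -0.4, -0.2, 0,
                      0.1, 0.3, 0.8, 1.2, 1.9, 2.5))
bins <- bin_sorted(fold_change, n_bins = 4)
lengths(bins)
```

Each bin then serves as one foreground set, while the full set of transcripts serves as the background.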
The matrix-based approach skips the k-merization step of the k-mer-based approach and instead scores the transcript sequence as a whole with a position specific scoring matrix.
For each sequence in foreground and background sets and each sequence motif, the scoring algorithm evaluates the score for each sequence position. Positions with a relative score greater than a certain threshold are considered hits, i.e., putative binding sites.
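The scoring step described above can be sketched as follows. The matrix values, the helper `find_hits()`, and the definition of the relative score (rescaling between the lowest and highest attainable score) are assumptions made for this illustration; the package's own scoring and thresholding may differ. The toy sequence uses U, following the conversion in the Examples.

```r
# Toy position-specific scoring matrix (rows = motif positions, columns = bases).
# The values are made up and not taken from any motif database.
pssm <- matrix(
  c( 1, -1, -1, -1,   # position 1 favors A
    -1, -1,  1, -1,   # position 2 favors G
    -1, -1, -1,  1),  # position 3 favors U
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "C", "G", "U"))
)

# Hypothetical helper: score every window of the sequence and return the
# start positions whose relative score exceeds rel_threshold.
find_hits <- function(sequence, pssm, rel_threshold = 0.9) {
  bases <- strsplit(sequence, "")[[1]]
  w <- nrow(pssm)
  if (length(bases) < w) return(integer(0))
  starts <- seq_len(length(bases) - w + 1)
  scores <- vapply(starts, function(s) {
    window <- bases[s:(s + w - 1)]
    # pick one matrix entry per motif position and sum them up
    sum(pssm[cbind(seq_len(w), match(window, colnames(pssm)))])
  }, numeric(1))
  # assumed relative score: 0 = lowest attainable, 1 = highest attainable
  max_score <- sum(apply(pssm, 1, max))
  min_score <- sum(apply(pssm, 1, min))
  rel <- (scores - min_score) / (max_score - min_score)
  starts[rel > rel_threshold]
}

hits <- find_hits("CAGUAAGUC", pssm)
hits  # start positions of putative binding sites
```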
By scoring all sequences in foreground and background sets, a hit count for each motif and each set is obtained, which is used to calculate enrichment values and associated p-values in the same way in which motif-compatible hexamer enrichment values are calculated in the k-mer-based approach. P-values are adjusted with one of the available adjustment methods.
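As a rough illustration of turning hit counts into enrichment values and adjusted p-values, the sketch below uses made-up counts for three hypothetical motifs. The enrichment definition (ratio of hit frequencies) and the use of Fisher's exact test are stand-ins chosen for this example and are not necessarily what the package computes; only the final `p.adjust()` call with `method = "BH"` corresponds directly to the default `p_adjust_method`.

```r
# Made-up hit counts for three hypothetical motifs
counts <- data.frame(
  motif   = c("motif_a", "motif_b", "motif_c"),
  fg_hits = c(30, 12, 5),     # hits in the foreground set (one bin)
  bg_hits = c(100, 120, 50)   # hits in the background set
)
fg_sites <- 1000    # assumed number of scored positions in the foreground
bg_sites <- 10000   # assumed number of scored positions in the background

# illustrative enrichment value: ratio of hit frequencies
counts$enrichment <- (counts$fg_hits / fg_sites) / (counts$bg_hits / bg_sites)

# stand-in test: Fisher's exact test on hits vs. non-hits, one per motif
counts$p_value <- mapply(function(fg, bg) {
  tab <- matrix(c(fg, fg_sites - fg, bg, bg_sites - bg), nrow = 2)
  fisher.test(tab)$p.value
}, counts$fg_hits, counts$bg_hits)

# multiple testing correction (Benjamini-Hochberg)
counts$adj_p_value <- p.adjust(counts$p_value, method = "BH")
counts
```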
An advantage of the matrix-based approach is the possibility of detecting clusters of binding sites. This can be done by counting regions with many hits using positional hit information or by simply applying a hit count threshold per sequence, e.g., only sequences with more than some number of hits are considered. Homotypic clusters of RBP binding sites may play a similar role as clusters of transcription factors.
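Both ways of detecting clusters mentioned above can be sketched in base R. The hit counts, hit positions, sequence names, the thresholds, and the helper `has_cluster()` are all hypothetical and serve only to illustrate the idea.

```r
# (1) Hit count threshold per sequence: keep sequences with more than
#     min_hits hits (made-up counts for one motif).
hits_per_seq <- c(seq_a = 0, seq_b = 1, seq_c = 4, seq_d = 2, seq_e = 7)
min_hits <- 3
clustered <- names(hits_per_seq)[hits_per_seq > min_hits]
clustered

# (2) Positional information: is there any run of min_hits hits whose
#     start positions span fewer than `window` nucleotides?
has_cluster <- function(hit_positions, window = 50, min_hits = 3) {
  pos <- sort(hit_positions)
  if (length(pos) < min_hits) return(FALSE)
  # distance between the first and last hit of every run of min_hits hits
  spans <- pos[min_hits:length(pos)] - pos[seq_len(length(pos) - min_hits + 1)]
  any(spans < window)
}

has_cluster(c(10, 25, 40, 300))  # three hits within 30 nt
has_cluster(c(10, 200, 400))     # hits are spread out
```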
A list with the following components:
foreground_scores | the result of score_transcripts for the foreground sets (the bins) |
background_scores | the result of score_transcripts for the background set |
enrichment_dfs | a list of data frames, returned by calculate_motif_enrichment |
spectrum_info_df | a data frame with the SPMA results |
spectrum_plots | a list of spectrum plots, as generated by score_spectrum |
classifier_scores | a list of classifier scores, as returned by classify_spectrum |
Other SPMA functions: classify_spectrum(), run_kmer_spma(), score_spectrum(), subdivide_data()
Other matrix functions: calculate_motif_enrichment(), run_matrix_tsma(), score_transcripts_single_motif(), score_transcripts()
# example data set
background_df <- transite:::ge$background_df
# sort sequences by signal-to-noise ratio
background_df <- dplyr::arrange(background_df, value)
# character vector of named and ranked (by signal-to-noise ratio) sequences
background_seqs <- gsub("T", "U", background_df$seq)
names(background_seqs) <- paste0(background_df$refseq, "|",
                                 background_df$seq_type)
results <- run_matrix_spma(background_seqs,
                           sorted_transcript_values = background_df$value,
                           transcript_values_label = "signal-to-noise ratio",
                           motifs = get_motif_by_id("M178_0.6"),
                           n_bins = 20,
                           max_fg_permutations = 10000)
## Not run:
results <- run_matrix_spma(background_seqs,
                           sorted_transcript_values = background_df$value,
                           transcript_values_label = "SNR")
## End(Not run)