findPatternPos: Function to find positions of the nucleotide patterns in the...

findPatternPos    R Documentation

Function to find positions of the nucleotide patterns in the sequence.

Description

This function finds all occurrences of each nucleotide pattern in the sequence. For each occurrence, it returns the index of the middle nucleotide, where the middle position within a pattern is ceiling(length(pattern) / 2). Both the plus and minus DNA strands are supported; for the minus strand, each pattern is replaced by its complementary sequence.
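The middle-nucleotide index can be illustrated with a small base R sketch. It is not the package's implementation: middleIndices() is a hypothetical helper, and it assumes the reported index is the start of an occurrence plus the within-pattern offset ceiling(n / 2) - 1.

```r
## Illustrative sketch only; middleIndices() is a hypothetical helper.
## Patterns are assumed to contain only the letters A, C, G and T,
## so no regular-expression escaping is needed.
middleIndices <- function(pattern, sequence) {
    n <- nchar(pattern)
    ## A zero-width lookahead makes gregexpr() report overlapping matches
    hits <- gregexpr(paste0("(?=", pattern, ")"), sequence, perl = TRUE)[[1]]
    starts <- as.integer(hits)
    ## gregexpr() signals "no match" with -1
    if (starts[1] == -1L) return(integer(0))
    ## Shift each start position to the middle nucleotide of the occurrence
    starts + ceiling(n / 2) - 1
}

## "ACG" starts at positions 2 and 5 of "TACGACGA"
middleIndices("ACG", "TACGACGA")
```

For a pattern of even length, this convention picks the left of the two central nucleotides.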

Usage

    findPatternPos(patterns, sequence, strand)

Arguments

patterns

A list of nucleotide permutations of length n, as returned by nuclPerm.

sequence

A DNAString object storing the reference genomic sequence in which to search for the patterns. The plus-strand sequence is expected.

strand

A character string indicating the plus ("+") or minus ("-") strand. For the minus strand, the occurrences found for a particular pattern are attributed to the pattern with the complementary sequence.
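The complementary sequence mentioned for the minus strand can be sketched with base R's chartr(). complementPattern() is a hypothetical helper, and this page does not state how the package performs the mapping internally, so treat the snippet as an illustration of the idea only.

```r
## Hypothetical helper: complement (not reverse complement) of DNA patterns
complementPattern <- function(pattern) {
    ## Map each nucleotide to its Watson-Crick partner: A<->T, C<->G
    chartr("ACGT", "TGCA", pattern)
}

## chartr() is vectorised over its last argument
complementPattern(c("ACG", "TTA"))
## [1] "TGC" "AAT"
```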

Details

This function uses stringi::stri_locate_all_fixed().
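As a rough illustration of what stringi::stri_locate_all_fixed() returns, the sketch below locates a short pattern in a plain character string. The toy sequence and variable names are assumptions made for this example, and which options the package itself passes is not stated on this page.

```r
library(stringi)

sequence <- "AAAAC"

## The result is a list with one matrix per input string;
## the "start" column holds the first position of each match.
## By default, matches do not overlap.
starts_default <- stri_locate_all_fixed(sequence, "AA")[[1]][, "start"]

## With overlap = TRUE, every start position is reported
starts_overlap <- stri_locate_all_fixed(sequence, "AA", overlap = TRUE)[[1]][, "start"]

starts_default
starts_overlap
```

Overlapping matches matter for short patterns in repetitive sequence, where consecutive occurrences share nucleotides.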

This function aims to assist with addressing sequence bias in structure probing data. The sequence in the neighbourhood of a nucleotide is assumed to have an effect on its structural state. By considering sequence patterns of a certain length (specified by the user), this function finds indices of the middle nucleotide of each pattern's occurrences within the sequence. We then separately analyse the nucleotides occurring in the middle of each pattern, taking into account sequence dependency.

Value

This function returns a list where each component corresponds to a pattern (indicated by the field names) and contains indices of the middle nucleotides of that pattern's occurrences within the sequence.
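A list of this shape can be used to subset any per-nucleotide vector by pattern. The sketch below builds a toy list by hand for the sequence "AAAAC" rather than calling the package; the values vector and the name byPattern are hypothetical and serve only to show the indexing.

```r
## Toy stand-in for the returned list, written by hand for "AAAAC":
## "AAA" occurs at positions 1-3 and 2-4 (middles 2 and 3),
## "AAC" occurs at positions 3-5 (middle 4)
nuclPosition <- list(AAA = c(2, 3), AAC = 4)

## Hypothetical per-nucleotide measurements, one value per position
values <- c(0.1, 0.5, 0.2, 0.9, 0.3)

## Collect the values observed at the middle nucleotide of each pattern
byPattern <- lapply(nuclPosition, function(idx) values[idx])

names(byPattern)
byPattern[["AAA"]]
```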

Error

The following errors are raised:

"Strand should be either plus or minus, specified with a sign." if strand is not specified as "+" or "-";

"The sequence should be non-empty." if the provided sequence is empty;

"The list of patterns should be non-empty." if the list of patterns to search for in the sequence is empty.
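Errors like these can be caught with tryCatch(). The sketch below uses a hypothetical stand-in, checkInputs(), that reproduces the three documented messages so that it runs without the package; the order of the checks and the use of a plain character string in place of a DNAString are assumptions.

```r
## Hypothetical stand-in reproducing the documented checks;
## the order of the checks is an assumption
checkInputs <- function(patterns, sequence, strand) {
    if (!(strand %in% c("+", "-")))
        stop("Strand should be either plus or minus, specified with a sign.")
    if (nchar(sequence) == 0)
        stop("The sequence should be non-empty.")
    if (length(patterns) == 0)
        stop("The list of patterns should be non-empty.")
    invisible(TRUE)
}

## Capture the message instead of aborting the script
msg <- tryCatch(
    checkInputs(list("AAA"), "AAAAC", "plus"),
    error = function(e) conditionMessage(e)
)
msg
```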

Author(s)

Alina Selega, Sander Granneman, Guido Sanguinetti

References

Selega et al. "Robust statistical modeling improves sensitivity of high-throughput RNA structure probing experiments", Nature Methods (2016).

See Also

See also nuclPerm.

Examples

    library(BUMHMM)
    library(SummarizedExperiment)

    ## Extract the DNA sequence from se, the example data set
    ## assumed to be loaded with BUMHMM
    sequence <- subject(rowData(se)$nucl)

    ## Generate patterns of length 3
    n <- 3
    patterns <- nuclPerm(n)

    ## Find positions of pattern occurrences
    nuclPosition <- findPatternPos(patterns, sequence, '+')

alinaselega/BUMHMM documentation built on March 2, 2024, 10 p.m.