packSearch | R Documentation |
General use pipeline function for the Pack-TYPE transposon finding algorithm.
packSearch( tirSeq, Genome, mismatch = 0, elementLength, tsdLength, tsdMismatch = 0, fixed = TRUE )
tirSeq |
A |
Genome |
A |
mismatch |
The maximum edit distance to be considered for TIR
matches (indels + substitutions). See
|
elementLength |
The range of element lengths to be considered, as a vector
of two integers (minimum, maximum). E.g. |
tsdLength |
Integer referring to the length of the flanking TSD region. |
tsdMismatch |
An integer referring to the allowable mismatch
(substitutions or indels) between a transposon's TSD
sequences. |
fixed |
Logical that will be passed to the 'fixed' argument of
|
Finds potential pack-TYPE elements based on:
Similarity of TIR sequence to tirSeq
Proximity of potential TIR sequences
Directionality of TIR sequences
Similarity of TSD sequences
The algorithm finds potential forward and reverse TIR
sequences using identifyTirMatches and their associated TSD
sequence via getTsds. The main filtering stage,
identifyPotentialPackElements, filters matches to obtain a
dataframe of potential PACK elements.
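The pairing idea described above can be sketched in base R. This is only an illustrative toy under stated assumptions, not the package's implementation: the helper names (reverseComplement, findExact, pairTirs) are hypothetical, matching is exact only, and the reverse TIR is assumed to be the reverse complement of tirSeq.

```r
# Hypothetical helpers for illustration; not part of the documented API.
reverseComplement <- function(s) {
  # complement each base, then reverse the string
  comp <- chartr("ACGT", "TGCA", s)
  paste(rev(strsplit(comp, "", fixed = TRUE)[[1]]), collapse = "")
}

findExact <- function(pattern, subject) {
  # start positions of all exact (possibly overlapping) matches
  n <- nchar(subject) - nchar(pattern) + 1
  if (n < 1) return(integer(0))
  starts <- seq_len(n)
  starts[substring(subject, starts, starts + nchar(pattern) - 1) == pattern]
}

pairTirs <- function(tirSeq, genome, elementLength) {
  fwdStarts <- findExact(tirSeq, genome)
  revStarts <- findExact(reverseComplement(tirSeq), genome)
  revEnds <- revStarts + nchar(tirSeq) - 1
  # every forward TIR start paired with every reverse TIR end
  pairs <- expand.grid(start = fwdStarts, end = revEnds)
  pairs$width <- pairs$end - pairs$start + 1
  # proximity and directionality: the forward TIR must lie upstream of the
  # reverse TIR, within the allowed element length range
  keep <- pairs$width >= elementLength[1] & pairs$width <= elementLength[2]
  pairs[keep, , drop = FALSE]
}

# Toy sequence: 3 bp flank, forward TIR, 20 bp filler, reverse TIR, 3 bp flank
genome <- paste0("GGG", "CACTACAA", strrep("A", 20), "TTGTAGTG", "CCC")
toyMatches <- pairTirs("CACTACAA", genome, elementLength = c(30, 40))
```

In this toy sequence the forward TIR starts at position 4 and the reverse TIR ends at position 39, so a single candidate of width 36 is kept. TSD checking is deliberately left out of this sketch.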
Note that this pipeline does not consider the
possibility of discovered elements being autonomous
elements, so it is recommended to cluster and/or BLAST
elements for further analysis. Furthermore, only exact TSD
matches are considered, so supplying long sequences for
TSD elements may lead to false-negative results.
A dataframe containing the elements identified by the algorithm. These may be autonomous or pack-TYPE elements. It will contain the following features:
start - the predicted element's start base sequence position.
end - the predicted element's end base sequence position.
seqnames - character string giving the name of the
sequence in Genome to which start and end refer.
width - the width of the predicted element.
strand - the strand direction of the
transposable element. This will be set to "*" as the
packSearch function does not consider
transposons to have a direction - only TIR sequences.
Passing the packMatches dataframe to packClust
will assign a direction to each predicted Pack-TYPE element.
This dataframe is in the format produced by
coercing a GRanges object to a dataframe: data.frame(GRanges).
Downstream functions, such as packClust, use this
dataframe to manipulate predicted transposable elements.
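As a rough illustration of that layout, the following base R sketch builds a small stand-in dataframe with the same five columns. All values are invented for illustration, and the object name toyPackMatches is hypothetical; real output comes from packSearch.

```r
# Toy stand-in for packSearch output; coordinates are invented.
toyPackMatches <- data.frame(
  seqnames = factor(c("Chr1", "Chr1", "Chr2")),
  start = c(1000L, 5000L, 200L),
  end = c(1999L, 7499L, 899L)
)

# GRanges-style widths are inclusive of both the start and end positions
toyPackMatches$width <- toyPackMatches$end - toyPackMatches$start + 1L

# packSearch reports no direction, so every strand is "*"
toyPackMatches$strand <- factor(
  rep("*", nrow(toyPackMatches)),
  levels = c("+", "-", "*")
)

# Example of ordinary dataframe manipulation: keep elements up to 1000 bp wide
shortElements <- toyPackMatches[toyPackMatches$width <= 1000L, ]
```

Because the result is an ordinary dataframe, standard subsetting and ordering operations apply before the elements are passed to downstream functions.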
This algorithm does not consider:
Autonomous elements - autonomous elements will
be predicted by this algorithm as there is no BLAST
step. It is recommended that, after clustering
elements using packClust, the user
analyses each group to determine which predicted
elements are autonomous and which are likely
Pack-TYPE elements. Alternatively, databases such as
Repbase (https://www.girinst.org/repbase/)
supply annotations for autonomous transposable
elements that can be used to filter autonomous matches.
TSD Mismatches - if the two TIRs of a candidate element do not have exactly matching target site duplications (TSDs), the element will be ignored. Supplying longer TSD sequences will likely lead to a lower false-positive rate, but may also cause a greater rate of false-negative results.
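The trade-off with TSD length can be seen in a small base R sketch. This is an assumption-laden toy rather than the package's code: getFlanks and tsdsMatch are hypothetical helpers, and the edit distance is computed with utils::adist.

```r
# Hypothetical helper: the tsdLength bases immediately flanking an element
getFlanks <- function(genome, start, end, tsdLength) {
  c(
    left = substr(genome, start - tsdLength, start - 1),
    right = substr(genome, end + 1, end + tsdLength)
  )
}

# Hypothetical helper: accept flanks whose edit distance
# (substitutions + indels) is at most tsdMismatch
tsdsMatch <- function(flanks, tsdMismatch = 0) {
  d <- utils::adist(flanks[["left"]], flanks[["right"]])[1, 1]
  d <= tsdMismatch
}

# Toy element at positions 6..41, flanked by a true 3 bp duplication ("TAT")
genome <- paste0("GGTAT", "CACTACAA", strrep("A", 20), "TTGTAGTG", "TATCC")

shortFlanks <- getFlanks(genome, start = 6, end = 41, tsdLength = 3)
longFlanks <- getFlanks(genome, start = 6, end = 41, tsdLength = 5)

exactShort <- tsdsMatch(shortFlanks)  # 3 bp flanks are identical
exactLong <- tsdsMatch(longFlanks)    # 5 bp flanks include non-duplicated bases
```

Here the 3 bp flanks match exactly, while requesting 5 bp pulls in neighbouring bases that were never duplicated, so the same element would be rejected under exact matching.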
Pattern matching is done via matchPattern.
Jack Gisby
identifyTirMatches, getTsds, identifyPotentialPackElements,
packClust, packMatches, DNAStringSet, DNAString, matchPattern
data(arabidopsisThalianaRefseq)

packMatches <- packSearch(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    elementLength = c(300, 3500),
    tsdLength = 3
)