Cut rDNA sequences into domains using covariance models
LSUx uses Infernal (INFERence of RNA ALignment) to match the 5.8S and LSU regions. This means that Infernal must be installed for it to function. Follow the instructions on the Infernal webpage to accomplish this.
Note that Infernal does not work on Windows.
The R interface to Infernal is in package inferrnal
(two "r"s).
You can install the development versions of inferrnal
and LSUx
from GitHub with:
# install.packages("devtools") devtools::install_github("brendanf/inferrnal") devtools::install_github("brendanf/LSUx")
library(LSUx)
LSUx extracts alternating variable and conserved domains from the contiguous rDNA regions which form the eukaryotic ribosomal large subunit, i.e. 5.8S RNA, ITS2, and 28S RNA. For the purposes of this document, this region will be referred to as the 32S precursor RNA, as in humans, although its actual size in Svedberg units varies between lineages.
Input sequences should contain, at a minimum, a significant fraction of the
5.8S RNA, which is used to define the 5' end of 32S.
Any base pairs before
the 5' end of 5.8S will be considered to be ITS1 (ITS1 = TRUE
) or
discarded (ITS1 = FALSE
).
Input sequences should not extend past the
end of the 32S model at the 3' end.
LSUx requires two covariance models: one for 5.8S, which is used in
to locate the 5' end of the target region, and one for 32S.
The 5.8S model can be
RF00002 from Rfam (the default),
or an equivalent.
If using a custom CM, it must be calibrated using cmcalibrate
from Infernal.
The 32S model must include annotations in the reference line ("#=GC RF
" in
the seed alignment) to distinguish conserved and variable regions. The
annotations should be sequential characters in the range 1..9A..Z
for conserved domains, "v
" for variable domains, and ".
" for
unaligned gaps in the seed alignment.
In the output, the conserved domains will be named "5_8S", "LSU1", "LSU2",
...; the variable domains will be named "ITS2", "V1", "V2", ...
Two example models are included, both based on the RDP fungal LSU CM, and annotated with variable regions according to Raué (1988). The first, includes the full LSU region.
cm_full <- system.file( file.path("extdata", "fungi_32S.cm"), package = "LSUx" )
The second, is truncated at the binding site of the LR5 primer, and should be faster for input sequences which do not extend past that point.
cm_trunc <- system.file( file.path("extdata", "fungi_32S_LR5.cm"), package = "LSUx" )
The seed alignments are also provided; they can be accessed using:
system.file(file.path("extdata", "fungi_32S.stk"), package = "LSUx") system.file(file.path("extdata", "fungi_32S_LR5.stk"), package = "LSUx")
The sample data is from an environmental metabarcoding study focusing on fungi.
seq <- system.file("extdata/sample.fasta", package = "inferrnal")
The sequences in the sample data were amplified using primers ITS1 and LR5, so using the truncated CM is appropriate.
cm_32S_trunc <- system.file( file.path("extdata", "fungi_32S_LR5.cm"), package = "LSUx" )
LSUx gives a tibble
with the start and end locations of each region for each
sequence.
regions <- lsux(seq, cm_32S = cm_32S_trunc, ITS1 = TRUE, cpu = 1) regions
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.