View source: R/msaCheckNames.R
This function checks and fixes sequence names of multiple alignment objects if they contain characters that might lead to LaTeX problems when using msaPrettyPrint.
msaCheckNames(x, replacement=" ", verbose=TRUE)
x | a multiple alignment object
replacement | a character string specifying the character(s) with which potentially problematic characters should be replaced.
verbose | if TRUE (default), a warning is printed when a problematic character is found.
The Biostrings package does not impose any restrictions on
the names of sequences. Consequently, msa also allows all
possible ASCII strings as sequence (row) names in multiple alignments.
As soon as msaPrettyPrint is used for pretty-printing
multiple sequence alignments, however, the sequence names are
interpreted as plain LaTeX source code. LaTeX errors may therefore
arise from characters or words in the sequence names that LaTeX
cannot interpret correctly as plain text. This particularly includes
special characters and backslash characters in the sequence names.
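To illustrate the kind of names that can cause trouble, the following base-R sketch flags names containing one of the characters that LaTeX treats specially (# $ % & ~ _ ^ { } and the backslash). The vector nms and the pattern latexSpecial are hypothetical names chosen for this illustration; this is not how msa itself performs the check.

```r
## Hypothetical example names; not taken from a real data set
nms <- c("Seq. #1", "Seq. ~3", "gene_A", "plain name")

## Characters with a special meaning in LaTeX (illustrative deny-list)
latexSpecial <- "[#$%&~_^{}\\\\]"

## Keep only the names that contain at least one such character
flagged <- nms[grepl(latexSpecial, nms)]
print(flagged)
```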
The msaCheckNames function takes a multiple alignment object
and checks its sequence names for possibly problematic characters.
Letters (upper and lower case), digits, spaces, commas, colons,
semicolons, periods, question and exclamation marks, dashes, braces,
single quotes, and double quotes are accepted; all other characters
are considered problematic. The function allows for both checking and
fixing the sequence names. If called with verbose=TRUE (default), the
function prints a warning if a problematic character is found.
Regardless of the verbose argument, the function invisibly returns a
copy of x in whose sequence names all problematic characters have been
replaced by the string supplied via the replacement argument (the
default is a single space).
In any case, the best solution is to check sequence names carefully and to avoid problematic sequence names from the beginning.
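The check-and-replace behavior described above can be approximated on a plain character vector in base R. This is a minimal sketch, not the msa implementation: the helper sketchCheckNames and its warning text are hypothetical, and reading "braces" as parentheses is an assumption.

```r
## Hypothetical helper, not part of msa: approximates the described behavior
sketchCheckNames <- function(nms, replacement = " ", verbose = TRUE) {
    ## ASSUMPTION: "braces" is interpreted as parentheses; everything
    ## outside this allow-list counts as problematic
    problematic <- "[^A-Za-z0-9 ,:;.?!()'\"-]"
    if (verbose && any(grepl(problematic, nms)))
        warning("sequence names contain problematic characters")
    ## Return the cleaned names invisibly, mirroring the documented behavior
    invisible(gsub(problematic, replacement, nms))
}

## Silently remove problematic characters from some toy names
fixed <- sketchCheckNames(c("Seq. #1", "Seq. ~3", "Seq_4"),
                          replacement = "", verbose = FALSE)
print(fixed)
```

With verbose=TRUE the sketch only warns and still returns the cleaned names, so the same call serves for both checking and fixing.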
The function invisibly returns a copy of the argument x (therefore, an object of the same class as x), but with modified sequence/row names (see details above).
Ulrich Bodenhofer <msa@bioinf.jku.at>
http://www.bioinf.jku.at/software/msa
U. Bodenhofer, E. Bonatesta, C. Horejs-Kainrath, and S. Hochreiter (2015). msa: an R package for multiple sequence alignment. Bioinformatics 31(24):3997-3999. DOI: 10.1093/bioinformatics/btv494.
msaPrettyPrint, MsaAAMultipleAlignment, MsaDNAMultipleAlignment, MsaRNAMultipleAlignment
## load the package (also attaches Biostrings)
library(msa)

## create toy example
mySeqs <- DNAStringSet(c("ACGATCGATC", "ACGACGATC", "ACGATCCCCC"))
names(mySeqs) <- c("Seq. #1", "Seq. \2", "Seq. ~3")

## perform multiple alignment
myAlignment <- msa(mySeqs)
myAlignment

## check names (warns about problematic characters)
msaCheckNames(myAlignment)

## fix names by removing problematic characters
myAlignment <- msaCheckNames(myAlignment, replacement="", verbose=FALSE)
myAlignment
use default substitution matrix
CLUSTAL 2.1
Call:
msa(mySeqs)
MsaDNAMultipleAlignment with 3 rows and 10 columns
aln names
[1] ACGACGATC- Seq.
[2] ACGATCCCCC Seq. ~3
[3] ACGATCGATC Seq. #1
Con ACGATC??CC Consensus
sequence names contain invalid characters
CLUSTAL 2.1
Call:
msa(mySeqs)
MsaDNAMultipleAlignment with 3 rows and 10 columns
aln names
[1] ACGACGATC- Seq.
[2] ACGATCCCCC Seq. 3
[3] ACGATCGATC Seq. 1
Con ACGATC??CC Consensus