This method computes a consensus sequence from a multiple alignment or a previously computed consensus matrix. Two ways of computing the consensus are currently available.
## S4 method for signature 'matrix'
msaConsensusSequence(x, type=c("Biostrings", "upperlower"),
                     thresh=c(80, 20), ignoreGaps=FALSE, ...)

## S4 method for signature 'MultipleAlignment'
msaConsensusSequence(x, ...)
x: an object of class MultipleAlignment or a previously computed consensus matrix
type: a character string specifying how to compute the consensus sequence. Currently, types "Biostrings" and "upperlower" are available.
thresh: a decreasing two-element numeric vector of numbers between 0 and 100 corresponding to the two conservation thresholds. Only relevant for type="upperlower".
ignoreGaps: a logical (default: FALSE)
...: when the method is called for a MultipleAlignment object
The method takes a MultipleAlignment object or a previously computed consensus matrix and computes a consensus sequence. For type="Biostrings", the method consensusString from the Biostrings package is used to compute the consensus sequence. For type="upperlower", two thresholds (argument thresh, see above) are used to compute the consensus sequence:
If the relative frequency of the most frequent letter at a given position is at least as large as the first threshold (default: 80%), then this most frequent letter is used for the consensus sequence at this position as it is.
If the relative frequency of the most frequent letter at a given position is smaller than the first threshold, but at least as large as the second threshold (default: 20%), then this most frequent letter is used for the consensus sequence at this position, but converted to lower case.
If the relative frequency of the most frequent letter in a column is even smaller than the second threshold, then a dot is used for the consensus sequence at this position.
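The three rules above can be sketched in base R for a single alignment column. This is only an illustration of the described logic, not the package's implementation: the helper name upperLowerColumn is hypothetical, the input is assumed to be a named vector of letter counts for one position, and ties are resolved in favor of the first maximum, which the text above does not specify.

```r
## Hypothetical helper, not part of the msa package: applies the
## upper/lower rule to the letter counts of one alignment position.
upperLowerColumn <- function(counts, thresh = c(80, 20)) {
    stopifnot(length(thresh) == 2, thresh[1] >= thresh[2])
    relFreq <- 100 * counts / sum(counts)   # relative frequencies in percent
    top <- which.max(relFreq)               # first maximum wins on ties (assumption)
    letter <- names(counts)[top]
    if (relFreq[top] >= thresh[1]) {
        letter                              # strongly conserved: keep as is
    } else if (relFreq[top] >= thresh[2]) {
        tolower(letter)                     # moderately conserved: lower case
    } else {
        "."                                 # weakly conserved: dot
    }
}

upperLowerColumn(c(A = 9, C = 1))                 # 90% -> "A"
upperLowerColumn(c(A = 5, C = 3, G = 2))          # 50% -> "a"
upperLowerColumn(c(A = 2, C = 2, G = 2, T = 2, N = 2),
                 thresh = c(80, 30))              # 20% -> "."
```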
If ignoreGaps=FALSE (which is the default), gaps are treated like all other letters, except that no lowercase conversion takes place when the relative frequency lies between the two thresholds. If ignoreGaps=TRUE, gaps are ignored completely, and the consensus sequence is computed from the non-gap letters only.
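To make the effect of ignoreGaps concrete, here is a small base-R sketch for one alignment column given as a character vector. The helper columnConsensus is hypothetical and only mirrors the description above; in particular, returning a dot for a column that consists of gaps only is an assumption, since that case is not covered by the text.

```r
## Hypothetical helper, not part of the msa package: consensus letter
## for one alignment column given as a character vector.
columnConsensus <- function(column, thresh = c(80, 20), ignoreGaps = FALSE) {
    if (ignoreGaps)
        column <- column[column != "-"]    # drop gaps before counting
    if (length(column) == 0)
        return(".")                        # assumption: all-gap column
    counts <- table(column)
    relFreq <- 100 * max(counts) / sum(counts)
    letter <- names(counts)[which.max(counts)]
    if (relFreq >= thresh[1]) {
        letter
    } else if (relFreq >= thresh[2]) {
        tolower(letter)                    # leaves "-" unchanged
    } else {
        "."
    }
}

gappyColumn <- c("-", "-", "-", "K", "K", "R")
columnConsensus(gappyColumn)                     # gap is most frequent (50%): "-"
columnConsensus(gappyColumn, ignoreGaps = TRUE)  # K is 2 of 3 non-gap letters: "k"
```

With the default settings the gap wins the column and is emitted unchanged; once gaps are ignored, the frequencies are recomputed over the three remaining letters.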
If the consensus matrix of a multiple alignment of nucleotide sequences contains rows labeled ‘+’ and/or ‘.’, these rows are ignored.
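Dropping such rows from a consensus matrix amounts to a row filter. The following base-R sketch uses a made-up toy matrix (rows are letters, columns are alignment positions) and is not taken from the package source.

```r
## Toy consensus matrix with made-up counts; rows are letters,
## columns are alignment positions.
conMat <- matrix(c(3, 0, 1,
                   0, 4, 0,
                   1, 0, 2,
                   0, 0, 1,
                   0, 0, 0,
                   0, 0, 0),
                 nrow = 6, byrow = TRUE,
                 dimnames = list(c("A", "C", "G", "-", "+", "."), NULL))

## Keep every row except those labeled "+" or "."
keep <- !(rownames(conMat) %in% c("+", "."))
conMatClean <- conMat[keep, , drop = FALSE]
rownames(conMatClean)
```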
The function returns a character string with the consensus sequence.
Ulrich Bodenhofer <msa@bioinf.jku.at>
http://www.bioinf.jku.at/software/msa
U. Bodenhofer, E. Bonatesta, C. Horejs-Kainrath, and S. Hochreiter (2015). msa: an R package for multiple sequence alignment. Bioinformatics 31(24):3997-3999. DOI: 10.1093/bioinformatics/btv494.
msa, MsaAAMultipleAlignment, MsaDNAMultipleAlignment, MsaRNAMultipleAlignment, MsaMetaData, MultipleAlignment, consensusString
## load the package (also attaches Biostrings)
library(msa)

## read sequences
filepath <- system.file("examples", "HemoglobinAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)

## perform multiple alignment
myAlignment <- msa(mySeqs)

## regular consensus sequence using consensusString() method from the
## 'Biostrings' package
msaConsensusSequence(myAlignment)

## use the other method
msaConsensusSequence(myAlignment, type="upperlower")

## use the other method with custom parameters
msaConsensusSequence(myAlignment, type="upperlower", thresh=c(50, 20),
                     ignoreGaps=TRUE)

## compute a consensus matrix first
conMat <- consensusMatrix(myAlignment)
msaConsensusSequence(conMat)
use default substitution matrix
[1] "-VLS?ADK?NVKA?WGK?GGHA?EYGAEALERMF?SFPTTKTYFPHF-DLSHGSAQVKGHGKKVADALT?AV?H?DDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLA?H?PA?FTPAVHASLDKFLA?VSTVLTSKYR"
[1] "-vLsaadKtnvkaawgkvgghageygaEaLeRmflsfPtTKTYFphf-dlshgSaqvkghGkkvadAlt.AvahlddlpgalsaLSdLHAhkLrVDPvNFklLshcllVtla.hhpadftPavhaslDKFlasvstvLtskYR"
[1] ".TKRn.CIsMTI..VfItFfGffDWFfD.KDQLEkrENSSISWENGE.CKRGFR.PTIfGFIIT.C.KSd.TfGkCCkNf.KR.KRCKG.GIKQTCNTMEIKKRGAKKTSK.rGgN.cESNdTG.RKCIEK.rTRSTKSRIWQ"
[1] "-VLS?ADK?NVKA?WGK?GGHA?EYGAEALERMF?SFPTTKTYFPHF-DLSHGSAQVKGHGKKVADALT?AV?H?DDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLA?H?PA?FTPAVHASLDKFLA?VSTVLTSKYR"