alignmentStatistics: Compute statistics for a multiple sequence alignment


Description

Functions to compute covariation, percent identity conservation, and percent canonical basepairs, given a multiple sequence alignment and, optionally, a secondary structure. Statistics can be computed for a single base, a basepair, a helix, or an entire alignment.

Usage


Arguments

helix

A helix data.frame

msa

A multiple sequence alignment. Can be either a Biostrings XStringSet object or a named character vector, such as the one obtained by converting an XStringSet with as.character.

pos, pos.5p, pos.3p

Positions of the bases or basepairs for which statistics are to be calculated.

Details

Conservation values have a range of [0, 1], where 0 is the absence of primary sequence conservation (all bases different), and 1 is full primary sequence conservation (all bases identical).

Canonical values have a range of [0, 1], where 0 is a complete lack of basepair potential, and 1 indicates that all basepairs are valid.
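The canonical value can be illustrated with a minimal base R sketch. The helper name canonical_fraction is hypothetical and not part of R4RNA, and the sketch assumes that GU wobble pairs count as valid alongside Watson-Crick pairs and that T is treated as U; the package's own handling may differ.

```r
# Hypothetical helper, not part of R4RNA: fraction of sequences whose bases
# at columns pos5 and pos3 form a Watson-Crick or GU wobble pair.
canonical_fraction <- function(msa, pos5, pos3) {
  # Assumption: these six pairs are the ones considered valid.
  valid <- c("AU", "UA", "GC", "CG", "GU", "UG")
  # Treat DNA alignments like RNA by mapping T to U.
  seqs <- chartr("T", "U", toupper(msa))
  pairs <- paste0(substr(seqs, pos5, pos5), substr(seqs, pos3, pos3))
  mean(pairs %in% valid)
}

toy_msa <- c(s1 = "GAAAC", s2 = "AAAAU", s3 = "GAAAU", s4 = "CAAAC")
canonical_fraction(toy_msa, 1, 5)  # GC, AU, GU pair; CC does not: 0.75
```

A value of 1 would mean every sequence can form the pair, and 0 that none can.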

Covariation values have a range of [-2, 2], where -2 is a complete lack of basepair potential and sequence conservation, 0 is complete sequence conservation regardless of basepairing potential, and 2 is a complete lack of sequence conservation but maintaining full basepair potential.
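One formulation that reproduces the three landmark values above is an average signed Hamming distance over all pairs of sequences. The sketch below is an assumption for illustration, not the R4RNA source: the function name covariation_sketch is hypothetical, and the sign rule (positive only when both sequences can form a valid pair) is a guess that the package may implement differently.

```r
# Hypothetical sketch, not the R4RNA implementation.
covariation_sketch <- function(msa, pos5, pos3) {
  valid <- c("AU", "UA", "GC", "CG", "GU", "UG")
  seqs <- chartr("T", "U", toupper(msa))
  b5 <- substr(seqs, pos5, pos5)
  b3 <- substr(seqs, pos3, pos3)
  ok <- paste0(b5, b3) %in% valid
  # All unordered pairs of sequences (requires at least two sequences).
  idx <- combn(length(seqs), 2)
  a <- idx[1, ]
  b <- idx[2, ]
  # Number of positions (0, 1, or 2) at which the two basepairs differ.
  dist <- (b5[a] != b5[b]) + (b3[a] != b3[b])
  # Assumption: changes count positively only if both pairs are valid.
  sign <- ifelse(ok[a] & ok[b], 1, -1)
  mean(dist * sign)
}

covariation_sketch(c("GAAAC", "GAAAC", "GAAAC"), 1, 5)  # fully conserved: 0
covariation_sketch(c("AAAAU", "GAAAC"), 1, 5)           # both change, both valid: 2
covariation_sketch(c("AAAAA", "CAAAC"), 1, 5)           # both change, neither valid: -2
```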

Helix values are the average of the base or basepair values within each helix. Alignment values are averages over helices or over all columns, depending on whether the function takes a helix argument.
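The per-helix averaging can be sketched as follows. This assumes a helix data.frame with columns i, j and length, where i and j give the outermost basepair and the helix extends inward for length stacked pairs; both helper names are hypothetical, and the canonical fraction is used only as an example of a basepair statistic.

```r
# Hypothetical helpers, not part of R4RNA.
canonical_fraction <- function(msa, pos5, pos3) {
  valid <- c("AU", "UA", "GC", "CG", "GU", "UG")
  seqs <- chartr("T", "U", toupper(msa))
  mean(paste0(substr(seqs, pos5, pos5), substr(seqs, pos3, pos3)) %in% valid)
}

helix_canonical_sketch <- function(helix, msa) {
  vapply(seq_len(nrow(helix)), function(r) {
    # Offsets 0 .. length-1 walk inward from the outermost pair (i, j).
    k <- seq_len(helix$length[r]) - 1
    per_pair <- mapply(canonical_fraction,
                       pos5 = helix$i[r] + k,
                       pos3 = helix$j[r] - k,
                       MoreArgs = list(msa = msa))
    mean(per_pair)
  }, numeric(1))
}

toy_msa <- c(s1 = "GGAAACC", s2 = "GAAAAUC", s3 = "CAAAACG")
toy_helix <- data.frame(i = 1, j = 7, length = 2)
helix_canonical_sketch(toy_helix, toy_msa)  # pairs (1,7) and (2,6): (1 + 2/3) / 2
```

The result has one value per row of the helix data.frame.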

alignmentPercentGaps returns, for each sequence of the alignment, the percentage of positions that are gaps.
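A per-sequence gap proportion of this kind can be sketched in base R. The helper name gap_fraction and the choice of "-" and "." as gap characters are assumptions for illustration, not taken from the package.

```r
# Hypothetical helper, not part of R4RNA: proportion of gap characters
# in each sequence of a named character vector.
gap_fraction <- function(msa, gap_chars = c("-", ".")) {
  # strsplit keeps the sequence names, so the result is a named vector.
  vapply(strsplit(msa, ""),
         function(chars) mean(chars %in% gap_chars),
         numeric(1))
}

toy_msa <- c(s1 = "GA-AC", s2 = "A--.U")
gap_fraction(toy_msa)  # s1: 1 of 5 positions, s2: 3 of 5 positions
```

The values are proportions between 0 and 1, like those printed at the end of the example output.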

Value

baseConservation, basepairConservation, basepairCovariation, basepairCanonical, alignmentConservation, alignmentCovariation, and alignmentCanonical return a single decimal value.

helixConservation, helixCovariation, and helixCanonical return a numeric vector whose length equals the number of rows in helix.

alignmentPercentGaps returns a named numeric vector whose length equals the number of sequences in the multiple sequence alignment.

Author(s)

Jeff Proctor, Daniel Lai

Examples


Example output

Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package: 'Biostrings'

The following object is masked from 'package:base':

    strsplit

G 
1 
        G 
0.7619048 
       GU 
0.4761905 
[1] 1
 [1] 0.2644841 0.2857143 0.3666667 0.4285714 0.6619048 0.4583333 0.2678571
 [8] 0.2644841 0.3809524 0.7857143 0.4761905 0.5238095 0.2857143 0.3452381
[15] 0.5918367 0.5416667 0.4970238 0.4095238 0.6507937 0.5158730 0.6269841
[22] 0.7380952 0.6944444 0.6598639 0.2952381 0.7321429 0.8690476 0.3690476
[29] 0.5238095 0.5238095 0.4764286 0.4080357 0.2869048 0.5095238 0.8928571
[36] 0.5190476 0.6250000 0.5476190 0.3125000 0.4482684 0.5555556 0.7500000
[43] 0.9642857 0.8666667 0.4136364 0.8857143 0.9047619 0.8392857 0.8650794
[50] 0.8285714 0.4190476 0.8333333 0.8630952 0.8928571 0.8452381 0.5306122
 [1]  1.03174603  1.42857143  1.05714286  1.14285714  0.67619048  0.86904762
 [7]  0.94047619  1.03174603 -1.23809524  0.42857143 -1.04761905  0.95238095
[13]  1.42857143 -0.83333333  0.81632653  0.75000000  0.93452381  0.24761905
[19]  0.60317460  0.79894180  0.63492063 -0.04761905  0.51587302  0.50340136
[25]  0.51428571  0.53571429  0.02380952  0.67460317 -0.95238095  0.95238095
[31]  0.67142857  0.96428571 -0.69047619  0.35238095  0.07142857  0.44761905
[37]  0.34523810  0.40952381  0.30357143  0.51948052  0.36507937  0.19047619
[43] -0.07142857 -0.03809524  0.51515152  0.00000000  0.04761905 -0.17857143
[49] -0.07936508 -0.11428571  0.40000000  0.00000000  0.13095238  0.07142857
[55]  0.16666667  0.13605442
 [1] 0.9285714 1.0000000 0.9714286 1.0000000 1.0000000 0.9642857 0.9285714
 [8] 0.9285714 0.1428571 1.0000000 0.7142857 1.0000000 1.0000000 0.4285714
[15] 1.0000000 0.9642857 0.9821429 0.7714286 0.9761905 0.9682540 0.9761905
[22] 0.8571429 0.8571429 0.9591837 0.8285714 1.0000000 0.9285714 0.8809524
[29] 0.5000000 1.0000000 0.9000000 0.9642857 0.5000000 0.8285714 0.9642857
[36] 0.8857143 0.8928571 0.8857143 0.7678571 0.8441558 0.8809524 0.9142857
[43] 0.9642857 0.9142857 0.8181818 0.9428571 0.9642857 0.8571429 0.9047619
[50] 0.8571429 0.8000000 0.8928571 0.9642857 0.9642857 0.9642857 0.7959184
[1] 0.523439
[1] 0.4796748
[1] 0.902439
AF183905.1/5647-5848 AF218039.1/6028-6228 AB017037.1/6286-6484 
          0.03809524           0.04285714           0.05238095 
AB006531.1/6003-6204 AF014388.1/6078-6278 AF022937.1/6935-7121 
          0.03809524           0.04285714           0.10952381 
AF178440.1/5925-6123 
          0.05238095 

R4RNA documentation built on Nov. 8, 2020, 5:15 p.m.