kmerKLPlot calls calcKL, which calculates the Kullback-Leibler
divergence between the k-mer distribution at each position and the
k-mer distribution across all positions. kmerKLPlot then plots each
k-mer's contribution to the total K-L divergence as stacked bars, for
a subset of the k-mers. Since there are 4^k possible k-mers for a
given value of k, plotting every k-mer often makes the plot hard to
interpret; however, one can set n.kmers to a number greater than the
number of possible k-mers to force kmerKLPlot to plot the entire K-L
divergence and all terms (one per k-mer) in the sum.
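To make the 4^k count concrete, the following is a minimal base-R sketch that enumerates every possible k-mer over the DNA alphabet for a small k. The value k = 3 and the variable names are illustrative assumptions; they are not taken from qrqc, and the k actually used when hashing k-mers depends on how the file was read.

```r
## Illustrative only: k = 3 is an assumed value, not a qrqc setting
bases <- c("A", "C", "G", "T")
k <- 3
n.possible <- length(bases)^k  # 4^k possible k-mers

## Enumerate all k-mers explicitly to confirm the count
grid <- expand.grid(rep(list(bases), k), stringsAsFactors = FALSE)
all.kmers <- apply(grid, 1, paste0, collapse = "")

## Any n.kmers above n.possible requests every k-mer, e.g.
## kmerKLPlot(x, n.kmers = n.possible + 1)  # x: a SequenceSummary object
n.all <- n.possible + 1
```

Even at this small k there are 64 k-mers, which gives a sense of why plotting only the top few is the default.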
If x is a list, the K-L k-mer plots are faceted by sample; this
allows comparison to a FASTA file of random reads.
Again, please note that what is plotted is not the total K-L
divergence, but rather the K-L divergence calculated on a subset of
the sample space (the top n.kmers k-mers selected).
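The difference between the total K-L divergence and the sum over a subset of k-mers can be seen in a small hand-made example. The distributions below are made-up numbers over four hypothetical k-mers, not qrqc output, and ranking k-mers by the size of their term is an assumption for illustration; how kmerKLPlot selects its top k-mers is not described here.

```r
## Toy distributions over four hypothetical k-mers (made-up numbers)
p <- c(AAA = 0.50, ACG = 0.25, GGT = 0.15, TTC = 0.10)  # at one position
q <- c(AAA = 0.25, ACG = 0.25, GGT = 0.25, TTC = 0.25)  # across all positions

## Each k-mer contributes one term p * log2(p / q) to the K-L sum
kl.terms <- p * log2(p / q)
kl.total <- sum(kl.terms)

## Sum over only the n.top largest terms (assumed ranking rule)
n.top <- 2
top.terms <- sort(kl.terms, decreasing = TRUE)[seq_len(n.top)]
kl.subset <- sum(top.terms)
```

Here kl.subset and kl.total differ, because the two k-mers left out of the subset carry negative terms.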
kmerKLPlot(x, n.kmers=20)
x | an S4 object of a class that inherits from SequenceSummary, or a list of such objects.
n.kmers | an integer value indicating the number of top k-mers to include.
signature(x = "SequenceSummary")
kmerKLPlot will plot the K-L divergence for a subset of k-mers for a
single object that inherits from SequenceSummary.
signature(x = "list")
kmerKLPlot will plot the K-L divergence for a subset of k-mers for
each of the objects that inherit from SequenceSummary in the list and
display them in a series of panels.
The K-L divergence calculation in calcKL uses base 2 in the log, so
the units are bits.
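As a quick check on the units, this sketch computes the same divergence with a base-2 log (bits) and a natural log (nats) and confirms the two differ only by a factor of log(2). The two distributions are arbitrary illustrative values, and the code is plain base R rather than a call into calcKL.

```r
## Arbitrary illustrative distributions (not qrqc output)
p <- c(0.7, 0.2, 0.1)
q <- c(1, 1, 1) / 3

kl.bits <- sum(p * log2(p / q))  # base-2 log: bits
kl.nats <- sum(p * log(p / q))   # natural log: nats

## Converting nats to bits divides by log(2)
same.value <- isTRUE(all.equal(kl.bits, kl.nats / log(2)))
```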
Also, note that ggplot2 warns that "Stacking is not well defined when
ymin != 0". This occurs when some k-mers are less frequent in the
positional distribution than in the distribution across all
positions, so the corresponding term of the K-L sum is negative
(producing a bar below zero). This does not appear to affect the plot
much. In the examples below, warnings are suppressed, but given that
this is a valid concern from ggplot2, warnings are not suppressed in
the function itself.
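The sign behaviour behind that warning can be reproduced without any plotting. In this sketch, with made-up probabilities, a term is negative exactly when the positional frequency is below the overall frequency, and the terms are split into the part that would stack above zero and the part that would fall below it. This is only an illustration of the arithmetic, not a description of how ggplot2 draws the bars.

```r
## Made-up probabilities for four k-mers at one position (p) and overall (q)
p <- c(0.50, 0.25, 0.15, 0.10)
q <- rep(0.25, 4)
terms <- p * log2(p / q)

## A term is negative exactly when the k-mer is rarer at this position
negative.when.rarer <- all((terms < 0) == (p < q))

## Portions of the sum lying above and below zero
above.zero <- sum(terms[terms > 0])
below.zero <- sum(terms[terms < 0])
```

When every k-mer is included, the positive part always outweighs the negative part, since the full K-L divergence is never negative.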
Vince Buffalo <vsbuffalo@ucdavis.edu>
getKmer, calcKL, kmerEntropyPlot
library(qrqc)

## Load a somewhat contaminated FASTQ file
s.fastq <- readSeqFile(system.file('extdata', 'test.fastq',
  package='qrqc'), hash.prop=1)

## Load a really contaminated FASTQ file
s.contam.fastq <- readSeqFile(system.file('extdata',
  'test-contam.fastq', package='qrqc'), hash.prop=1)

## Load a random (equal base frequency) FASTA file
s.random.fasta <- readSeqFile(system.file('extdata',
  'random.fasta', package='qrqc'), type="fasta", hash.prop=1)

## Make K-L divergence plot - shows slight 5'-end bias. Note units
## (bits)
suppressWarnings(kmerKLPlot(s.fastq))

## Plot multiple K-L divergence plots (list names become panel labels)
suppressWarnings(kmerKLPlot(list("highly contaminated"=s.contam.fastq,
  "less contaminated"=s.fastq, "random"=s.random.fasta)))