require(ggfastqc)
knitr::opts_chunk$set(
  comment = "#",
  error = FALSE,
  tidy = FALSE,
  cache = FALSE,
  collapse = TRUE)
# options(datatable.auto.index=FALSE)
The ggfastqc package generates quick summary plots from FastQC reports of Next Generation Sequencing data.
There are four functions for plotting various summary statistics:
plot_gc_stats() -- GC percentage
plot_dup_stats() -- Sequence duplication percentage
plot_total_sequence_stats() -- Total sequenced reads
plot_sequence_quality() -- Per base sequence quality
The function fastqc()
loads the entire report as an object of class fastqc
which can be used to generate any additional plots that are required.
The fastqc() function loads data from FastQC generated reports via the sample_info argument, which should point to a file describing the samples.
The file should contain at least these three columns:
sample -- the sample name.
pair -- for paired end reads, 1 or 2, corresponding to the first and second read of the pair; for single end reads, NA.
path -- full path to the FastQC summary report (.txt file) for each sample. If only the file name (.txt) is provided, the file is assumed to be in the same folder as the input file passed to the sample_info argument.
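The rule for bare file names can be sketched in a few lines of base R. This is only an illustration of the behaviour described above, not the package's own code: resolve_report_paths() is a hypothetical helper, and the paths are made up.

```r
# Hypothetical helper, NOT part of ggfastqc: illustrates how a bare file name
# could be resolved against the folder that holds the annotation file.
resolve_report_paths = function(paths, ann_file) {
  # A bare file name has no directory component, so basename() returns it unchanged.
  is_bare = basename(paths) == paths
  # Bare names get the annotation file's folder prepended; other paths are kept.
  ifelse(is_bare, file.path(dirname(ann_file), paths), paths)
}

# Made-up paths, for illustration only.
resolved = resolve_report_paths(
  c("sample1_fastqc.txt", "/data/qc/sample2_fastqc.txt"),
  ann_file = "/project/annotation.txt"
)
print(resolved)
```

In this sketch only entries without any directory component are moved; a path that already names a folder is left as it was given.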
The annotation file can also optionally contain a group column. If present, the generated plots take it into account and color / facet accordingly.
Including a group column is recommended.
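As a rough sketch, an annotation file with the columns described above could be written from R as follows. The sample names, group labels and report file names are invented for illustration, and the tab delimiter is an assumption (the file is read later with data.table::fread(), which detects the separator on its own).

```r
# All values below are made up for illustration; replace them with your own.
ann = data.frame(
  sample = c("ctrl_1", "ctrl_1", "treat_1", "treat_1"),
  pair   = c(1L, 2L, 1L, 2L),  # use NA for single end reads
  path   = c("ctrl_1_R1_fastqc.txt", "ctrl_1_R2_fastqc.txt",
             "treat_1_R1_fastqc.txt", "treat_1_R2_fastqc.txt"),
  group  = c("control", "control", "treated", "treated"),  # optional column
  stringsAsFactors = FALSE
)

# Write a tab-delimited file (assumed format) into a temporary folder.
ann_file = file.path(tempdir(), "annotation.txt")
write.table(ann, ann_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Because the path column holds bare file names here, the reports would be expected in the same folder as annotation.txt.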
path = system.file("tests/fastqc-sample", package="ggfastqc")
ann_file = file.path(path, "annotation.txt")
path = "./"
ann_file = file.path(path, "annotation.txt")
Here's what an annotation file might look like.
data.table::fread(ann_file)
fastqc() to load reports
obj = fastqc(ann_file)
obj
class(obj)
obj is an object of class fastqc. Each element of value is itself a data.table.
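To make the idea of a table stored inside each element concrete, here is a mock built with base R only. It is not the output of fastqc(): the param column, the nested column names and the numbers are all assumptions chosen for illustration, and plain data frames stand in for data.tables.

```r
# Mock object for illustration only; NOT produced by fastqc().
# The `param` column, nested column names and values are assumptions.
mock = data.frame(param = c("gc", "dup"), stringsAsFactors = FALSE)

# A list column: each element of `value` is itself a table.
mock$value = list(
  data.frame(sample = c("s1", "s2"), gc_percent = c(48, 51)),
  data.frame(sample = c("s1", "s2"), dup_percent = c(12.5, 9.8))
)

# Pull one nested table out of the list column and use it like any other table.
gc_tbl = mock$value[[which(mock$param == "gc")]]
print(gc_tbl)
```

Extracting a nested table this way is the usual route to building additional plots from data kept in a list column.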
plot_gc_stats() provides a plot of GC percentage in each of the samples. By default the argument interactive = TRUE, in which case it will try to plot a jitter plot using the plotly package. Jitter plots are possible only when interactive = TRUE.
The other two types of plots possible are point and bar. Plots can be interactive or static for these two types of plots. If static, the function returns a ggplot2 plot.
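The combinations described above can be written down as a small standalone check. Both functions below are hypothetical and not part of ggfastqc, and "jitter" as the literal value of the geom argument is an assumption based on the description.

```r
# Hypothetical helpers, NOT part of ggfastqc. They only encode the rule stated
# above: jitter needs interactive = TRUE, point and bar work either way.
allowed_geoms = function(interactive = TRUE) {
  if (interactive) c("jitter", "point", "bar") else c("point", "bar")
}

check_geom = function(geom, interactive = TRUE) {
  # match.arg() returns the matched choice or raises an error if none matches.
  match.arg(geom, allowed_geoms(interactive))
}

check_geom("point", interactive = FALSE)  # accepted
# check_geom("jitter", interactive = FALSE) would raise an error
```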
plotly
plot_gc_stats(sample=obj)
pl = plot_gc_stats(sample=obj)
ll = htmltools::tagList()
ll[[1L]] = plotly::as.widget(pl)
ll
Note that the facet is automatically named sample, which was the name given to the input argument. More than one such fastqc object can be provided in a single call to generate a faceted plot like the one above, e.g., plot_gc_stats(s1 = obj1, s2 = obj2).
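One common way for an R function to pick up argument names as labels is to capture the inputs through `...`. The sketch below shows that general mechanism; facet_labels() is a hypothetical function, and whether ggfastqc works this way internally, or requires every input to be named, is an assumption.

```r
# Hypothetical function, NOT part of ggfastqc: shows how the names given to
# arguments passed via `...` can be recovered and used as facet labels.
facet_labels = function(...) {
  args = list(...)
  labels = names(args)
  # This sketch insists on names; that requirement is a simplifying assumption.
  if (is.null(labels) || any(labels == "")) {
    stop("all inputs must be named, e.g. facet_labels(s1 = obj1, s2 = obj2)")
  }
  labels
}

# Plain numbers stand in for fastqc objects here.
labs = facet_labels(s1 = 1, s2 = 2)
print(labs)
```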
Using interactive=FALSE produces a static ggplot2 plot, but the jitter geom is then unavailable.
ggplot2
plot_gc_stats(sample=obj, interactive=FALSE, geom="point") # or "bar"
plot_dup_stats() provides a plot of sequence duplication percentage in each of the samples. The usage is identical to plot_gc_stats().
plotly
plot_dup_stats(sample=obj)
pl = plot_dup_stats(sample=obj)
ll = htmltools::tagList()
ll[[1L]] = plotly::as.widget(pl)
ll
ggplot2
plot_dup_stats(sample=obj, interactive=FALSE, geom="point") # or "bar"
plot_total_sequence_stats() provides a plot of total reads sequenced. The usage is also identical to plot_gc_stats().
plotly
plot_total_sequence_stats(sample=obj)
pl = plot_total_sequence_stats(sample=obj)
ll = htmltools::tagList()
ll[[1L]] = plotly::as.widget(pl)
ll
ggplot2
plot_total_sequence_stats(sample=obj, interactive=FALSE, geom="bar") # or "point"
plot_sequence_quality() provides a plot of per base sequence quality. The only geom implemented is line. Both interactive and non-interactive plots are possible, as shown below.
plotly
plot_sequence_quality(sample=obj)
pl = plot_sequence_quality(sample=obj)
ll = htmltools::tagList()
ll[[1L]] = plotly::as.widget(pl)
ll
ggplot2
plot_sequence_quality(sample=obj, interactive=FALSE)