qQCReport | R Documentation |
Generate quality control plots for a qProject object or a vector of fasta/fastq/bam files. The available plots vary depending on the type of input (fasta, fastq, bam files or qProject object; paired-end or single-end).
qQCReport(
input,
pdfFilename = NULL,
chunkSize = 1000000L,
useSampleNames = FALSE,
clObj = NULL,
a4layout = TRUE,
...
)
input |
A vector of files or a qProject object. |
pdfFilename |
The path and name of a pdf file to store the report.
If |
chunkSize |
The number of sequences, sequence pairs (for paired-end data) or alignments that will be sampled from each data file to collect quality statistics. |
useSampleNames |
If TRUE, the plots will be labelled using the sample
names instead of the file names. Sample names are obtained from the
|
clObj |
A cluster object to be used for parallel processing of multiple input files. |
a4layout |
A logical scalar. If TRUE, the output of mapping rate and uniqueness plots will be adjusted for a4 format devices. |
... |
Additional arguments that will be passed to the functions generating the individual quality control plots, see ‘Details’. |
This function generates quality control plots for all input files or the sequence and alignment files contained in a qProject object, allowing assessment of the quality of a sequencing experiment.
qQCReport uses functionality from the ShortRead package to collect quality data, and visualizes the results in a similar way to the ‘FastQC’ quality control tool from Simon Andrews (see ‘References’ below). It is recommended to create PDF reports (pdfFilename argument), for which the plot layouts have been optimised.
Some plots will only be generated if the necessary information is available (e.g. base qualities in fastq sequence files).
The currently available plot types are:
The quality score boxplot shows the distribution of base quality values as a box plot for each position in the input sequence. The background color (green, orange or red) indicates ranges of high, intermediate and low qualities.
The nucleotide frequency plot shows the frequency of A, C, G, T and N bases by position in the read.
The duplication level plot shows for each sample the fraction of reads observed at different duplication levels (e.g. once, twice, three times, etc.). In addition, the most frequent sequences are listed.
The mapping statistics barplot shows the fractions of reads that were mappable or unmappable to the reference genome.
The library complexity barplot shows the fractions of unique read(-pair) alignment positions, as a measure of the complexity of the sequencing library. Please note that this measure is not independent of the total number of reads in a library, and is best compared between libraries of similar sizes.
The mismatch frequency plot shows the frequency and position (relative to the read sequence) of mismatches in the alignments against the reference genome.
The mismatch type plot shows the frequency of read bases that caused mismatches in the alignments to the reference genome, separately for each genome base.
The fragment size distribution plot shows the distribution of fragment sizes inferred from aligned read pairs.
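The dependence of the library complexity measure on library size can be illustrated with a small simulation in base R. This is only a sketch under stated assumptions: the number of distinct positions and the library sizes are made up, reads are drawn uniformly at random, and the code is not what qQCReport runs internally.

```r
# Illustrative simulation only; all numbers are made up.
set.seed(1)
nPositions <- 10000                 # assumed number of distinct alignment positions
depths <- c(1000, 10000, 100000)    # assumed library sizes (aligned reads)

# fraction of unique alignment positions among the sampled reads
uniqueFraction <- sapply(depths, function(n) {
    pos <- sample.int(nPositions, n, replace = TRUE)
    length(unique(pos)) / n
})
names(uniqueFraction) <- depths
print(uniqueFraction)
```

Although all three simulated libraries are drawn from the same pool of positions, the unique fraction drops as more reads are sampled, which is why the measure is best compared between libraries of similar sizes.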
One approach to assess the quality of a sample is to compare its control plots to the ones from other samples and search for relative differences. Special quality measures are expected for certain types of experiments: A genomic re-sequencing sample with an overrepresentation of T bases may be suspicious, while such a nucleotide bias is normal for a directed bisulfite-sequencing sample.
Additional arguments can be passed to the internal functions that generate the individual quality control plots using ...:
lmat: a matrix (e.g. matrix(1:12, ncol=2)) used by an internal call to the layout function to specify the positioning of multiple plot panels on a device page. Individual panels correspond to different samples.
breaks: a numerical vector (e.g. c(1:10)) defining the bins used by the ‘Duplication level’ plot.
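To make the two arguments more concrete, the following base R sketch shows what a layout matrix and a breaks vector of the documented form look like. It does not call QuasR: the duplication counts are made up, and the binning via findInterval (with an open-ended last bin) is one plausible reading, not necessarily how qQCReport treats breaks internally.

```r
# Base R illustration only; not QuasR's internal code.
lmat <- matrix(1:12, ncol = 2)   # 6 rows x 2 columns, panels numbered column-wise
pdf(NULL)                        # null device, nothing is written to disk
nPanels <- layout(lmat)          # layout() returns the number of panels laid out
dev.off()

# made-up data: how many times each distinct sequence was observed
dupCounts <- c(1, 1, 2, 3, 1, 7, 12, 2)
breaks <- c(1:10)

# assign each count to a bin; counts of 10 or more fall into the last bin (assumption)
bins <- findInterval(dupCounts, breaks)
dupTable <- table(factor(bins, levels = seq_along(breaks),
                         labels = c(1:9, "10+")))
print(dupTable)
```

With QuasR installed and a qProject object proj, such values would be supplied through ..., for example qQCReport(proj, pdfFilename="qc_report.pdf", lmat=lmat).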
The function is called for its side effect of generating quality control plots. It invisibly returns a list with components that contain the data used to generate each of the QC plots. Available components are (depending on input data, see ‘Details’):
: quality score boxplot
: nucleotide frequency plot
: duplication level plot
: mapping statistics barplot
: library complexity barplot
: mismatch frequency plot
: mismatch type plot
: fragment size distribution plot
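Because the list is returned invisibly, it is only available if the result of the call is assigned. The pattern can be sketched in base R with a stand-in function. Both plotAndReturn and its counts component are hypothetical names used for illustration; they are not part of QuasR.

```r
# Hypothetical stand-in for a plotting function; NOT the QuasR implementation.
plotAndReturn <- function(x) {
    counts <- table(x)
    pdf(NULL)                          # null device: draw without writing a file
    barplot(counts)                    # side effect: the plot
    dev.off()
    invisible(list(counts = counts))   # data behind the plot, returned invisibly
}

plotAndReturn(c("A", "C", "C", "G"))          # nothing is printed at the console
res <- plotAndReturn(c("A", "C", "C", "G"))   # assign the result to keep the data
names(res)
res$counts[["C"]]
```

In the same way, assigning the result of qQCReport (e.g. res <- qQCReport(proj, pdfFilename="qc_report.pdf")) keeps the plot data, and names(res) lists the components that are available for the given input.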
Anita Lerch, Dimos Gaidatzis and Michael Stadler
FastQC quality control tool at http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
qProject, qAlign, ShortRead package
# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)
# create alignments
sampleFile <- "extdata/samples_chip_single.txt"
genomeFile <- "extdata/hg19sub.fa"
proj <- qAlign(sampleFile, genomeFile)
# create quality control report
qQCReport(proj, pdfFilename="qc_report.pdf")