View source: R/deseq_functions.R
getDESeqDataSet | R Documentation |
This is a convenience function for generating DESeqDataSet
objects,
but this function also adds support for counting reads across non-contiguous
regions.
getDESeqDataSet(
dataset.list,
regions.gr,
sample_names = NULL,
gene_names = NULL,
sizeFactors = NULL,
field = "score",
blacklist = NULL,
expand_ranges = FALSE,
ncores = getOption("mc.cores", 2L),
quiet = FALSE
)
dataset.list |
An object containing GRanges datasets that can be passed
to |
regions.gr |
A GRanges object containing regions of interest. |
sample_names |
Names for each dataset in |
gene_names |
An optional character vector giving gene names, or any
other identifier over which reads should be counted. Gene names are
required if counting is to be performed over non-contiguous ranges, i.e. if
any genes have multiple ranges. If supplied, gene names are added to the
resulting |
sizeFactors |
DESeq2 |
field |
Argument passed to |
blacklist |
An optional GRanges object containing regions that should be excluded from signal counting. Use of this argument is distinct from the use of non-contiguous gene regions (see details below), and the two can be used simultaneously. Blacklisting doesn't affect the ranges returned as rowRanges in the output DESeqDataSet object (unlike the use of non-contiguous regions). |
expand_ranges |
Logical indicating if ranges in |
ncores |
Number of cores to use for read counting across all samples. By default, all available cores are used. |
quiet |
If |
A DESeqData
object in which rowData
are given as
rowRanges
, which are equivalent to regions.gr
, unless there
are non-contiguous gene regions (see note below). Samples (as seen in
colData
) are factored so that samples are grouped by
replicate
and condition
, i.e. all non-replicate samples are
treated as distinct, and the DESeq2 design = ~condition
.
In DESeq2, genes must be defined
by single, contiguous chromosomal locations. In contrast, this function
allows individual genes to be encompassed by multiple distinct ranges in
regions.gr
. To use non-contiguous gene regions, provide
gene_names
in which some names are duplicated. For each unique gene
in gene_names
, this function will generate counts across all ranges
for that gene, but be aware that it will only keep the largest range for
each gene in the resulting DESeqDataSet
object's rowRanges
.
If the desired use is to blacklist certain sites in a genelist, note that
the blacklist
argument can be used.
DESeq2 sizeFactors
are
sample-specific normalization factors that are applied by division, i.e.
counts_{norm,i}=counts_i / sizeFactor_i
. This is in contrast to normalization factors as defined in
this package (and commonly elsewhere), which are applied by multiplication.
Also note that DESeq2's "normalizationFactors
" are not sample
specific, but rather gene specific factors used to correct for
ascertainment bias across different genes (e.g. as might be relevant for
GSEA or Go analysis).
Certain gene names can cause this function to return an error. We've never encountered errors using conventional, systematic naming schemes (e.g. ensembl IDs), but we have seen errors when using Drosophila (Flybase) "symbols". We expect this is due to the unconventional use of non-alphanumeric characters in some Drosophila gene names.
Mike DeBerardine
DESeq2::DESeqDataSet
,
getDESeqResults
suppressPackageStartupMessages(require(DESeq2))
data("PROseq") # import included PROseq data
data("txs_dm6_chr4") # import included transcripts
# divide PROseq data into 6 toy datasets
ps_a_rep1 <- PROseq[seq(1, length(PROseq), 6)]
ps_b_rep1 <- PROseq[seq(2, length(PROseq), 6)]
ps_c_rep1 <- PROseq[seq(3, length(PROseq), 6)]
ps_a_rep2 <- PROseq[seq(4, length(PROseq), 6)]
ps_b_rep2 <- PROseq[seq(5, length(PROseq), 6)]
ps_c_rep2 <- PROseq[seq(6, length(PROseq), 6)]
ps_list <- list(A_rep1 = ps_a_rep1, A_rep2 = ps_a_rep2,
B_rep1 = ps_b_rep1, B_rep2 = ps_b_rep2,
C_rep1 = ps_c_rep1, C_rep2 = ps_c_rep2)
# make flawed dataset (ranges in txs_dm6_chr4 not disjoint)
# this means there is double-counting
# also using discontinuous gene regions, as gene_ids are repeated
dds <- getDESeqDataSet(ps_list,
txs_dm6_chr4,
gene_names = txs_dm6_chr4$gene_id,
quiet = TRUE,
ncores = 1)
dds
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.