tximeta | R Documentation |
tximeta
leverages the hashed digest of the Salmon or piscem index,
in addition to a number of core Bioconductor packages (GenomicFeatures,
ensembldb, AnnotationHub, GenomeInfoDb, BiocFileCache) to automatically
populate metadata for the user, without additional effort from the user.
For other quantifiers see the customMetaInfo
argument below.
tximeta(
coldata,
type = NULL,
txOut = TRUE,
skipMeta = FALSE,
skipSeqinfo = FALSE,
useHub = TRUE,
markDuplicateTxps = FALSE,
cleanDuplicateTxps = FALSE,
customMetaInfo = NULL,
skipFtp = FALSE,
...
)
coldata |
a data.frame with at least two columns (others will propogate to object):
if |
type |
what quantifier was used (see |
txOut |
whether to output transcript-level data.
|
skipMeta |
whether to skip metadata generation
(e.g. to avoid errors if not connected to internet).
This calls |
skipSeqinfo |
whether to skip the addition of Seqinfo, which requires an internet connection to download the relevant chromosome information table from UCSC |
useHub |
whether to first attempt to download a TxDb/EnsDb
object from AnnotationHub, rather than creating from a
GTF file from FTP (default is TRUE). If FALSE, it will
force |
markDuplicateTxps |
whether to mark the status
( |
cleanDuplicateTxps |
whether to try to clean duplicate transcripts (exact sequence duplicates) by replacing the transcript names that do not appear in the GTF with those that do appear in the GTF |
customMetaInfo |
the relative path to a custom metadata
information JSON file, relative to the paths in |
skipFtp |
whether to avoid |
... |
arguments passed to |
Most of the code in tximeta
works to add metadata and transcript ranges
when the quantification was performed with Salmon. However,
tximeta
can be used with any quantification type
that is supported
by tximport
, where it will return an non-ranged SummarizedExperiment.
tximeta
performs a lookup of the hashed digest of the index
(stored in an auxilary information directory of the Salmon output)
against a database of known transcriptomes, which lives within the tximeta
package and is continually updated on Bioconductor's release schedule.
In addition, tximeta
performs a lookup of the digest against a
locally stored table of linkedTxome
's (see link{makeLinkedTxome}
).
If tximeta
detects a match, it will automatically populate,
e.g. the transcript locations, the transcriptome release,
the genome with correct chromosome lengths, etc. It allows for automatic
and correct summarization of transcript-level quantifications to the gene-level
via summarizeToGene
without the need to manually build
a tx2gene
table.
tximeta
on the first run will ask where the BiocFileCache for
this package should be kept, either using a default location or a temporary
directory. At any point, the user can specify a location using
setTximetaBFC
and this choice will be saved for future sessions.
Multiple users can point to the same BiocFileCache, such that
transcript databases (TxDb or EnsDb) associated with certain Salmon indices
and linkedTxomes
can be accessed by different users without additional
effort or time spent downloading and building the relevant TxDb / EnsDb.
Note that, if the TxDb or EnsDb is present in AnnotationHub, tximeta
will
use this object instead of downloading and building a TxDb/EnsDb from GTF
(to disable this set useHub=FALSE).
In order to allow that multiple users can read and write to the same location, one should set the BiocFileCache directory to have group write permissions (g+w).
a SummarizedExperiment with metadata on the rowRanges
.
(if the hashed digest in the Salmon or Sailfish index does not match
any known transcriptomes, or any locally saved linkedTxome
,
tximeta
will just return a non-ranged SummarizedExperiment)
# point to a Salmon quantification file:
dir <- system.file("extdata/salmon_dm", package="tximportData")
files <- file.path(dir, "SRR1197474", "quant.sf")
coldata <- data.frame(files, names="SRR1197474", condition="A", stringsAsFactors=FALSE)
# normally we would just run the following which would download the appropriate metadata
# se <- tximeta(coldata)
# for this example, we instead point to a local path where the GTF can be found
# by making a linkedTxome:
indexDir <- file.path(dir, "Dm.BDGP6.22.98_salmon-0.14.1")
fastaFTP <- c("ftp://ftp.ensembl.org/pub/release-98/fasta/drosophila_melanogaster/cdna/Drosophila_melanogaster.BDGP6.22.cdna.all.fa.gz",
"ftp://ftp.ensembl.org/pub/release-98/fasta/drosophila_melanogaster/ncrna/Drosophila_melanogaster.BDGP6.22.ncrna.fa.gz")
gtfPath <- file.path(dir, "Drosophila_melanogaster.BDGP6.22.98.gtf.gz")
makeLinkedTxome(indexDir=indexDir, source="LocalEnsembl", organism="Drosophila melanogaster",
release="98", genome="BDGP6.22", fasta=fastaFTP, gtf=gtfPath, write=FALSE)
se <- tximeta(coldata)
# to clear the entire linkedTxome table
# (don't run unless you want to clear this table!)
# bfcloc <- getTximetaBFC()
# bfc <- BiocFileCache(bfcloc)
# bfcremove(bfc, bfcquery(bfc, "linkedTxomeTbl")$rid)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.