Minor changes:
- `makeGRangesFromGff`: Don't error on `Seqinfo` download failure from Ensembl
  via `getChromInfoFromEnsembl`. This is currently fragile for old GFF files,
  such as `"Mus_musculus.GRCm38.90.gtf.gz"`. We have filed a bug report with
  Bioconductor for GenomeInfoDb, but we may need to write our own variant of
  this function that provides better support for old Ensembl releases. Consider
  referring to ensembldb code for an alternate approach.

Minor changes:
- `makeGRangesFromEnsembl`: Fixed support for the legacy GRCh38 release 87
  dataset, which doesn't have gene versions defined and would otherwise error
  if `ignoreVersion = FALSE` is set.

Major changes:
- `EnsemblToNcbi` and `NcbiToEnsembl`: Added support for disabling strict
  mode, which is useful for mapping all genes in a reference genome. This is
  not a breaking change, as strict mode remains enabled by default. Also
  reworked the internal code to speed up 1:1 mapping return.

Minor changes:
- `Hgnc`: Renamed the `"symbol"` column to `"geneName"` and `"name"` to
  `"description"`, better matching naming conventions in other functions.

Minor changes:
- Renamed `HumanToMouse` to `JaxHumanMouse`, which indicates the source
  (Jackson Laboratory) more clearly.

New functions / classes:
- `HumanToMouse`: Downloads human-to-mouse gene mappings from the Jackson
  Laboratory MGI server. Returns unique 1:1 mappings by default, which can be
  disabled with the `unique` argument.

Major changes:
- Now using `"To"` instead of the numeric `"2"` in the class name. Sorry
  Prince, but this just looks weird for some functions. Applies to
  `Ensembl2Ncbi`, `Gene2Symbol`, `Ncbi2Ensembl`, `Protein2Gene`, and `Tx2Gene`.
- `validObject` for our custom classes now ensures that metadata is
  consistently slotted, including date and package version.

Minor changes:
- Data previously kept in `inst/extdata` is now saved internally inside the
  package via a `sysdata.rda` file. Currently applies to `detectOrganism` and
  `mapNcbiTaxId`.

Major changes:
- `downloadGencodeGenome`: This genome download function now sanitizes the
  transcriptome FASTA to only include transcript identifiers in the header,
  without the additional information separated by the `"|"` (pipe) delimiter.
  Pipe-delimited headers are not commonly used in FASTA files, and result in
  unwanted downstream behavior when quantifying at transcript level using
  kallisto and minimap2. Note that salmon can currently handle this edge case
  when setting the `--gencode` flag during the genome index step. This action
  is non-destructive and returns a `"transcriptome_fixed"` FASTA file. We are
  now symlinking this fixed file by default, but the unmodified original is
  retained in the transcriptome download folder.

Minor changes:
- `currentEnsemblGenomeBuild`: Fixed internal REST API query to the Ensembl
  server. This now requires `"content-type=application/json"` to be defined in
  the URL; otherwise the Ensembl server returns text instead of JSON.
- `currentEnsemblVersion`: Now parses the `current_README` file on the FTP
  server instead of the top-level `README`. We changed this because the
  `README` symlink can break during Ensembl release updates (e.g. 109 to 110,
  in progress).

Starting a new release series to denote potential breaking changes with legacy
objects saved with `entrezId` instead of `ncbiGeneId`.
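
The naming difference behind this release series can be illustrated with a
minimal base R sketch. It assumes a plain `data.frame` standing in for the
`mcols` metadata of a legacy object; `renameLegacyColumn` is a hypothetical
helper written for this example and is not part of AcidGenomes, and the
identifier values are illustrative only.

```r
## Illustrative sketch only: a plain data frame stands in for the `mcols`
## metadata of a legacy object. Identifier values are placeholders.
legacy <- data.frame(
    geneId = c("ENSG00000000003", "ENSG00000000005"),
    entrezId = c(7105L, 64102L),
    stringsAsFactors = FALSE
)

## Hypothetical helper (not part of AcidGenomes): rename the legacy column
## when present, and leave already-current tables untouched.
renameLegacyColumn <- function(df, from = "entrezId", to = "ncbiGeneId") {
    if (from %in% colnames(df) && !(to %in% colnames(df))) {
        colnames(df)[colnames(df) == from] <- to
    }
    df
}

current <- renameLegacyColumn(legacy)
print(colnames(current))
```

For real saved objects, the route this changelog recommends is `updateObject`;
the helper above only shows which column name separates legacy objects from
current ones.
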
New functions:
- `mapGeneNamesToEnsembl`, `mapGeneNamesToHGNC`, `mapGeneNamesToNCBI`. These
  are covered against Homo sapiens and Mus musculus.

Major changes:
- `GRanges` classes (e.g. `EnsemblGenes`, `EnsemblTranscripts`,
  `GencodeGenes`, `GencodeTranscripts`) now intentionally fail class checks if
  `entrezId` is defined instead of `ncbiGeneId` in `mcols` metadata. This makes
  downstream handoff to GSEA functions in AcidGSEA easier to manage. For
  legacy objects, use `updateObject` to resolve this check.
- `makeGRangesFromEnsembl` and `makeGRangesFromGFF` now attempt to fetch
  additional useful gene metadata, including gene synonyms from the Ensembl
  FTP server when applicable. This currently applies to gene annotation files
  from Ensembl and GENCODE. Note that extra metadata is not supported for the
  legacy Homo sapiens GRCh37 genome build.

Minor changes:
- Renamed the `EntrezGeneInfo` function to `NcbiGeneInfo`.
- Removed `geneSynonyms` from `NAMESPACE`. Consider using `NcbiGeneInfo` or
  `HGNC` instead for synonym information.
- Removed `HGNC2Ensembl` and `MGI2Ensembl`. Just use the `HGNC` and `MGI`
  function return instead.
- `downloadEnsemblGenome` now downloads additional useful metadata files.
- Added `updateObject` support to update legacy objects that may fail the new
  `entrezId` class checks.

Minor changes:
- `makeTx2GeneFromGFF`, `Tx2Gene`: Ensure that rownames are defined for RefSeq
  genome annotations, which are constructed from the `GenomicRangesList`
  method.

New functions:
- `gencodeReleaseHistory`: This function scrapes the GENCODE website to return
  the full release history for either human or mouse genomes.

Minor changes:
- `currentEnsemblVersion`: Fix for a breaking change on the Ensembl FTP server.
  The file this function parses has been renamed from `current_README` to
  simply `README`.
- `mapGencodeToEnsembl`: Now using `gencodeReleaseHistory` internally to
  dynamically fetch metadata directly from the GENCODE website, instead of
  relying on an internal CSV mapping file. This helps avoid having to update
  the package every time a new Ensembl/GENCODE release comes out.

Minor changes:
- Migrated the `requireNamespaces` import from AcidBase to goalie.
- Now allowing the `Seqinfo` fetch step to fail, to avoid unit test issues with
  the Mus musculus Ensembl 90 GTF file (for bcbioRNASeq).

Minor changes:
- `downloadUCSCGenome`: Added a manual override for defaulting to `"hg38"` for
  Homo sapiens, which has switched over to the experimental `"hs1"` T2T genome
  build.
- `mapGencodeToEnsembl`: Added support for mapping Mus musculus releases.
- `Seqinfo` generators now default to pulling annotations from Ensembl for
  GENCODE reference, rather than UCSC. Note that the UCSC `Seqinfo` function is
  currently broken for hg38, but will be fixed in pending GenomeInfoDb
  v1.34.8.

Minor changes:
- `EntrezGeneInfo`: Added column name checks. Renamed `xTaxId` to
  `taxonomyId`.
- Moved `importFrom` calls to the `imports.R` file.
- `export`: Updated methods to match new conventions defined in pipette.

Minor changes:
- `makeGRangesFromEnsembl`: Hardened internal code to suppress spurious
  warnings from rtracklayer due to masking of the `download.file` function.
  See issue #71 for details.
- Now using `tempdir2` and `unlink2` internally, which improves support for
  continuous integration (CI) checks on Windows.

Minor changes:
- `Ensembl2Entrez`: Simplified `format` `"1:1"` handling, based on the new code
  approach used in the AcidGSEA package. This now keeps track of original
  rownames, which is necessary for the bcbioRNASeq clusterProfiler R Markdown
  template.

Minor changes:

Major changes:
- Now using `DFrame` instead of `DataFrame`, as the previous definition
  approach no longer works with Bioconductor 3.15. This applies to the
  `Ensembl2Entrez`, `Entrez2Ensembl`, `EntrezGeneInfo`, `Gene2Symbol`, `HGNC`,
  `HGNC2Ensembl`, `Protein2Gene`, and `Tx2Gene` classes.
- `downloadGencodeGenome`: Added support for Entrez and RefSeq identifiers,
  which are now defined in `mcols` of `GRanges` objects.

Minor changes:
- `makeTx2GeneFromFASTA`: Added `ignoreVersions` argument, which is now
  enabled by default, to match other `Tx2Gene` functions.

New functions:
- `mapGencodeToEnsembl`: Convenience function for mapping a human GENCODE
  release (e.g. `39`) to the corresponding Ensembl release (e.g. `105`).

Minor changes:
- `GenomicRanges` `mcols` column name validity checks no longer require strict
  camelCase formatting. This check can run into issues when slotting into a
  `DESeqDataSet` object, which appends non-camelCase-formatted columns into
  the `mcols` (corresponding to the `rowRanges` of the object).
- `Ensembl2Entrez`, `Entrez2Ensembl`, `Gene2Symbol`, `Tx2Gene`.
- Changed `GRanges` to `GenomicRanges` (virtual class) where applicable in the
  package.
- Split `Ensembl2Entrez` and `Entrez2Ensembl` methods into separate files.
- `setMethod` calls, particularly in the signature argument, where applicable.
- `AnnotationHub` internally now should never prompt the user about whether to
  create a cache directory. This is achieved by setting `ask = FALSE`
  internally.
- `assert` calls now check for the `GenomicRanges` virtual class rather than
  `GRanges`.
- Now reexporting `Seqinfo`, `genome`, `seqinfo`, and `seqlevels`. Note that
  corresponding assignment methods (if applicable) are intentionally not
  reexported here.

Minor changes:
- `Tx2Gene` class check: Disabled the check that looks for identical
  transcript and gene identifiers. This check is not compatible with the
  Saccharomyces cerevisiae (sacCer3) reference genome. Thanks for pointing
  this out @amizeranschi.

Minor changes:
- `getEnsDb` / `makeGRangesFromEnsembl`: Quieted down package loading from
  Bioconductor when obtaining annotations for GRCh37 (`EnsDb.Hsapiens.v75`
  release package).

Minor changes:
- `downloadRefSeqGenome`, `downloadUCSCGenome`: Improved the `genomeBuild`
  documentation, with more specific examples.

Major changes:
- `Gene2Symbol`: Hardened internal identifier mapping code in the `switch`
  call to support the `format` argument. Improved unit testing for expected
  behavior of the `format` argument. Fixed "1:1" mapping to split based on the
  `geneName` column rather than the `geneId` column.
- `Tx2Gene`: Improved code coverage and cleaned up internal `complete.cases`
  handling.
- Classes (`Ensembl2Entrez`, `Gene2Symbol`, `Tx2Gene`) now check for
  `complete.cases` in S4 validity methods.

Minor changes:
- `currentEnsemblVersion` and `mapHumanOrthologs` working examples are now
  re-enabled, wrapped in a `try` call.

Major changes:
- Removed `mapEnsemblBuildToUCSC` and `mapUCSCBuildToEnsembl` functions. Also
  removed mapping support for UCSC genome build names (e.g. "hg38") inside of
  `makeGRangesFromEnsembl` calls, since this is not technically the correct
  genome build name.
- `downloadEnsemblGenome`, `downloadGencodeGenome`, etc. now support file
  caching by default with the `cache = TRUE` argument.

Minor changes:
- Split `stripGeneVersions` and `stripTranscriptVersions` documentation into
  separate files.
- `export`: Hardened the `Tx2Gene` method to ensure that rownames are
  consistently removed prior to export. Noticed that this was an issue with
  UCSC genome build download.

Minor changes:
- `mapUCSCBuildToEnsembl` and `downloadEnsemblGenome`, in particular. Note
  that `*_chr_patch_hapl_scaff` GFF and GTF files are no longer available on
  the Ensembl FTP server for GRCm39 (only GRCm38 and GRCh38).

Minor changes:
- `Gene2Symbol`: Improved handling when gene identifiers are integer, as is
  the case with NCBI Entrez gene identifiers.

Major changes:

Minor changes:
- `HGNC` now returns columns with split values as `CharacterList`, instead of
  as character strings containing "|".

Minor changes:
- `mapHumanOrthologs`: Hardened mouse-to-human matching.
- `makeGRangesFromEnsembl`: No longer hard-coding the minimum release version
  check at 87, in case older releases are ported to AnnotationHub in a future
  release.
- Now using `ignoreVersion = TRUE` by default for genome annotation importers,
  as this is typically what users expect.

Minor changes:
- `Gene2Symbol` functions now preserve metadata, as expected. The loss of
  metadata was causing the pointillism package to error, due to an unwanted
  breaking change.
- `Tx2Gene`: Improved consistency of metadata return, ensuring `call` and
  `synonyms` are not defined.

Minor changes:
- `EnsemblGenes` and `EnsemblTranscripts`.

Minor changes:
- `makeGRangesFromGFF`: Improved support and code coverage for handling of the
  bcbio-nextgen `ref-transcripts.gtf` genome file.

Minor changes:
- `EntrezGeneInfo`: Improved column formatting.

Minor changes:
- Genome download functions (e.g. `downloadEnsemblGenome`) now return relative
  symlinks instead of absolute paths.
- `mapHumanOrthologs` internal join step. Now returns `humanGeneId` and
  `humanGeneName` columns instead of `hgncId` and `hgncName` columns, which
  technically were incorrect, since these map to Ensembl.

New functions:
- `EntrezGeneInfo`: New utility for obtaining gene annotations from NCBI.

Major changes:
- `geneSynonyms`: Reworked internal code, extending `EntrezGeneInfo`.

Minor changes:
- `HGNC` and `MGI2Ensembl`.

Minor changes:

Minor changes:

New functions:

Major changes:
- GFF/GTF files are now automatically cached (via `pipette::cacheURL`
  internally) when used in `makeGRangesFromGFF`.
- Renamed `ignoreTxVersion` to simply `ignoreVersion`, where applicable. We
  want this setting to also apply at gene level.
- Now defaulting to `ignoreVersion = FALSE`. Previous releases of AcidGenomes
  and basejump had this set to `ignoreVersion = TRUE` by default. Note that
  both modes are now non-destructive.
- `GRanges` `mcols` now return with a `tx` prefix instead of `transcript`.
- `GRanges` `mcols` now use strict camel case formatting (e.g. `geneId`
  instead of `geneID`).

Minor changes:
- `makeGene2SymbolFromEnsembl` for example, which is causing running examples
  to fail in pointillism without a fix.

Initial release, consisting of functions migrated from basejump.