Description Details Constructor Accessor methods Subsetting Coercion Combining Seqinfo objects Author(s) See Also Examples
A Seqinfo object is a table-like object that contains basic information about a set of genomic sequences. The table has 1 row per sequence and 1 column per sequence attribute. Currently the only attributes are the length, circularity flag, and genome provenance (e.g. hg19) of the sequence, but more attributes might be added in the future as the need arises.
Typically Seqinfo objects are not used directly but are part of
higher level objects. Those higher level objects will generally
provide a seqinfo
accessor for getting/setting their
Seqinfo component.
Seqinfo(seqnames, seqlengths=NA, isCircular=NA, genome=NA)
:
Create a Seqinfo object and populate it with the supplied data.
One special form of calling the Seqinfo()
constructor is
to specify only the genome
argument and set it to the name
of an NCBI assembly (e.g. Seqinfo(genome="GRCh38.p12")
)
or UCSC genome (e.g. Seqinfo(genome="hg38")
), in which
case the sequence information is fetched from NCBI or UCSC.
See Examples section below for some examples.
In the code snippets below, x
is a Seqinfo object.
length(x)
:
Return the number of sequences in x
.
seqnames(x)
, seqnames(x) <- value
:
Get/set the names of the sequences in x
.
Those names must be non-NA, non-empty and unique.
They are also called the sequence levels or the keys
of the Seqinfo object.
Note that, in general, the end user should not try to alter the
sequence levels with seqnames(x) <- value
. The recommended way
to do this is with seqlevels(x) <- value
as described below.
names(x)
, names(x) <- value
:
Same as seqnames(x)
and seqnames(x) <- value
.
seqlevels(x)
:
Same as seqnames(x)
.
seqlevels(x) <- value
:
Can be used to rename, drop, add and/or reorder the sequence levels.
value
must be either a named or unnamed character vector.
When value
has names, the names only serve the purpose of
mapping the new sequence levels to the old ones.
Otherwise (i.e. when value
is unnamed) this mapping is
implicitly inferred from the following rules:
(1) If the number of new and old levels are the same, and if the
positional mapping between the new and old levels shows that
some or all of the levels are being renamed, and if the levels
that are being renamed are renamed with levels that didn't exist
before (i.e. are not present in the old levels), then
seqlevels(x) <- value
will just rename the sequence levels.
Note that in that case the result is the same as with
seqnames(x) <- value
but it's still recommended to use
seqlevels(x) <- value
as it is safer.
(2) Otherwise (i.e. if the conditions for (1) are not satisfied)
seqlevels(x) <- value
will consider that the sequence
levels are not being renamed and will just perform
x <- x[value]
.
See below for some examples.
seqlengths(x)
, seqlengths(x) <- value
:
Get/set the length for each sequence in x
.
isCircular(x)
, isCircular(x) <- value
:
Get/set the circularity flag for each sequence in x
.
genome(x)
, genome(x) <- value
:
Get/set the genome identifier or assembly name for each sequence
in x
.
In the code snippets below, x
is a Seqinfo object.
x[i]
:
A Seqinfo object can be subsetted only by name i.e. i
must be a character vector.
This is a convenient way to drop/add/reorder the rows (aka the
sequence levels) of a Seqinfo object.
See below for some examples.
In the code snippets below, x
is a Seqinfo object.
as.data.frame(x)
:
Turns x
into a data frame.
There are no c
or rbind
method for Seqinfo objects.
Both would be expected to just append the rows in y
to the rows
in x
resulting in an object of length length(x) + length(y)
.
But that would tend to break the constraint that the seqnames of a Seqinfo
object must be unique keys.
So instead, a merge
method is provided.
In the code snippet below, x
and y
are Seqinfo objects.
merge(x, y)
:
Merge x
and y
into a single Seqinfo object where the
keys (aka the seqnames) are union(seqnames(x), seqnames(y))
.
If a row in y
has the same key as a row in x
, and if
the 2 rows contain compatible information (NA values are compatible
with anything), then they are merged into a single row in the result.
If they cannot be merged (because they contain different seqlengths,
and/or circularity flags, and/or genome identifiers), then an error
is raised.
In addition to check for incompatible sequence information,
merge(x, y)
also compares seqnames(x)
with
seqnames(y)
and issues a warning if each of them has names not
in the other. The purpose of these checks is to try to detect situations
where the user might be combining or comparing objects based on
different reference genomes.
intersect(x, y)
: Finds the intersection between
two Seqinfo
objects by merging them and subsetting for the
intersection of their sequence names. This makes it easy to avoid
warnings about the objects not being subsets of each other during
overlap operations.
A convenience wrapper, checkCompatibleSeqinfo()
, is provided
for checking whether 2 objects have compatible seqinfo components
or not. checkCompatibleSeqinfo(x, y)
is equivalent to
merge(seqinfo(x), seqinfo(y))
so will work on any objects
x
and y
that support seqinfo()
.
H. Pagès
seqinfo
The getChromInfoFromNCBI
and
getChromInfoFromUCSC
utility functions
that are used behind the scene to generate a Seqinfo
object for a given assembly/genome (see examples below).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 | ## ---------------------------------------------------------------------
## A. MAKING A Seqinfo OBJECT FOR A GIVEN NCBI ASSEMBLY OR UCSC GENOME
## ---------------------------------------------------------------------
## One special form of calling the 'Seqinfo()' constructor is to specify
## only the 'genome' argument and set it to the name of an NCBI assembly
## or UCSC genome, in which case the sequence information is fetched
## from NCBI or UCSC ('getChromInfoFromNCBI()' or 'getChromInfoFromUCSC()'
## are used behind the scene for this so internet access is required).
if (interactive()) {
## NCBI assemblies (see '?registered_NCBI_assemblies' for the list of
## NCBI assemblies that are currently supported):
Seqinfo(genome="GRCh38")
Seqinfo(genome="GRCh38.p12")
Seqinfo(genome="Amel_HAv3.1")
Seqinfo(genome="WBcel235")
Seqinfo(genome="TAIR10.1")
## UCSC genomes (see '?registered_UCSC_genomes' for the list of UCSC
## genomes that are currently supported):
Seqinfo(genome="hg38")
Seqinfo(genome="mm10")
Seqinfo(genome="rn6")
Seqinfo(genome="bosTau9")
Seqinfo(genome="canFam3")
Seqinfo(genome="musFur1")
Seqinfo(genome="galGal6")
Seqinfo(genome="dm6")
Seqinfo(genome="ce11")
Seqinfo(genome="sacCer3")
}
## ---------------------------------------------------------------------
## B. BASIC MANIPULATION OF A Seqinfo OBJECT
## ---------------------------------------------------------------------
## Note that all the arguments (except 'genome') must have the
## same length. 'genome' can be of length 1, whatever the lengths
## of the other arguments are.
x <- Seqinfo(seqnames=c("chr1", "chr2", "chr3", "chrM"),
seqlengths=c(100, 200, NA, 15),
isCircular=c(NA, FALSE, FALSE, TRUE),
genome="toy")
x
## Accessors:
length(x)
seqnames(x)
names(x)
seqlevels(x)
seqlengths(x)
isCircular(x)
genome(x)
## Get a compact summary:
summary(x)
## Subset by names:
x[c("chrY", "chr3", "chr1")]
## Rename, drop, add and/or reorder the sequence levels:
xx <- x
seqlevels(xx) <- sub("chr", "ch", seqlevels(xx)) # rename
xx
seqlevels(xx) <- rev(seqlevels(xx)) # reorder
xx
seqlevels(xx) <- c("ch1", "ch2", "chY") # drop/add/reorder
xx
seqlevels(xx) <- c(chY="Y", ch1="1", "22") # rename/reorder/drop/add
xx
## ---------------------------------------------------------------------
## C. MERGING 2 Seqinfo OBJECTS
## ---------------------------------------------------------------------
y <- Seqinfo(seqnames=c("chr3", "chr4", "chrM"),
seqlengths=c(300, NA, 15))
y
## This issues a warning:
merge(x, y) # rows for chr3 and chrM are merged
## To get rid of the above warning, either use suppressWarnings() or
## set the genome on 'y':
suppressWarnings(merge(x, y))
genome(y) <- genome(x)
merge(x, y)
## Note that, strictly speaking, merging 2 Seqinfo objects is not
## a commutative operation, i.e., in general 'z1 <- merge(x, y)'
## is not identical to 'z2 <- merge(y, x)'. However 'z1' and 'z2'
## are guaranteed to contain the same information (i.e. the same
## rows, but typically not in the same order):
merge(y, x)
## This contradicts what 'x' says about circularity of chr3 and chrM:
isCircular(y)[c("chr3", "chrM")] <- c(TRUE, FALSE)
y
if (interactive()) {
merge(x, y) # raises an error
}
## Sanity checks:
stopifnot(identical(x, merge(x, Seqinfo())))
stopifnot(identical(x, merge(Seqinfo(), x)))
stopifnot(identical(x, merge(x, x)))
## ---------------------------------------------------------------------
## D. checkCompatibleSeqinfo()
## ---------------------------------------------------------------------
library(GenomicRanges)
gr1 <- GRanges("chr3:15-25", seqinfo=x)
gr2 <- GRanges("chr3:105-115", seqinfo=y)
if (interactive()) {
checkCompatibleSeqinfo(gr1, gr2) # raises an error
}
|
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, basename, cbind, colMeans, colSums, colnames,
dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
intersect, is.unsorted, lapply, lengths, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
rowMeans, rowSums, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which, which.max, which.min
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following object is masked from 'package:base':
expand.grid
Loading required package: IRanges
Seqinfo object with 4 sequences (1 circular) from toy genome:
seqnames seqlengths isCircular genome
chr1 100 NA toy
chr2 200 FALSE toy
chr3 NA FALSE toy
chrM 15 TRUE toy
[1] 4
[1] "chr1" "chr2" "chr3" "chrM"
[1] "chr1" "chr2" "chr3" "chrM"
[1] "chr1" "chr2" "chr3" "chrM"
chr1 chr2 chr3 chrM
100 200 NA 15
chr1 chr2 chr3 chrM
NA FALSE FALSE TRUE
chr1 chr2 chr3 chrM
"toy" "toy" "toy" "toy"
[1] "4 sequences (1 circular) from toy genome"
Seqinfo object with 3 sequences from 2 genomes (NA, toy):
seqnames seqlengths isCircular genome
chrY NA NA <NA>
chr3 NA FALSE toy
chr1 100 NA toy
Seqinfo object with 4 sequences (1 circular) from toy genome:
seqnames seqlengths isCircular genome
ch1 100 NA toy
ch2 200 FALSE toy
ch3 NA FALSE toy
chM 15 TRUE toy
Seqinfo object with 4 sequences (1 circular) from toy genome:
seqnames seqlengths isCircular genome
chM 15 TRUE toy
ch3 NA FALSE toy
ch2 200 FALSE toy
ch1 100 NA toy
Seqinfo object with 3 sequences from 2 genomes (toy, NA):
seqnames seqlengths isCircular genome
ch1 100 NA toy
ch2 200 FALSE toy
chY NA NA <NA>
Seqinfo object with 3 sequences from 2 genomes (NA, toy):
seqnames seqlengths isCircular genome
Y NA NA <NA>
1 100 NA toy
22 NA NA <NA>
Seqinfo object with 3 sequences from an unspecified genome:
seqnames seqlengths isCircular genome
chr3 300 NA <NA>
chr4 NA NA <NA>
chrM 15 NA <NA>
Seqinfo object with 5 sequences (1 circular) from 2 genomes (toy, NA):
seqnames seqlengths isCircular genome
chr1 100 NA toy
chr2 200 FALSE toy
chr3 300 FALSE toy
chrM 15 TRUE toy
chr4 NA NA <NA>
Warning message:
In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chr1, chr2
- in 'y': chr4
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
Seqinfo object with 5 sequences (1 circular) from 2 genomes (toy, NA):
seqnames seqlengths isCircular genome
chr1 100 NA toy
chr2 200 FALSE toy
chr3 300 FALSE toy
chrM 15 TRUE toy
chr4 NA NA <NA>
Seqinfo object with 5 sequences (1 circular) from toy genome:
seqnames seqlengths isCircular genome
chr1 100 NA toy
chr2 200 FALSE toy
chr3 300 FALSE toy
chrM 15 TRUE toy
chr4 NA NA toy
Seqinfo object with 5 sequences (1 circular) from toy genome:
seqnames seqlengths isCircular genome
chr3 300 FALSE toy
chr4 NA NA toy
chrM 15 TRUE toy
chr1 100 NA toy
chr2 200 FALSE toy
Seqinfo object with 3 sequences (1 circular) from toy genome:
seqnames seqlengths isCircular genome
chr3 300 TRUE toy
chr4 NA NA toy
chrM 15 FALSE toy
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.