GTuples-class | R Documentation |
The GTuples
class is a container for the genomic tuples and their
associated annotations.
GTuples
extends GRanges
as a container
for genomic tuples rather than genomic ranges. GTuples
is a vector of
genomic locations and associated annotations. Each element in the vector is
comprised of a sequence name, a tuple, a strand
,
and optional metadata columns (e.g. score, GC content, etc.). This
information is stored in four components:
seqnames
a 'factor' Rle
object
containing the sequence names.
tuples
externally, a matrix-link object containing the
tuples. Internally, an IRanges
object storing the
first and last position of each tuple and, if required, a matrix storing
the "internal" positions of each tuple (see description of
internalPos
below).
strand
a Rle
object containing
the strand information.
mcols
a DataFrame
object
containing the metadata columns. Columns cannot be named "seqnames
"
"ranges
", "tuples
", "internalPos
", "size
",
"strand
", "seqlevels
", "seqlengths
",
"isCircular
", "start
", "end
", "width
", or
"element
".
seqinfo
a DataFrame
object
containing information about the set of genomic sequences present in the
GTuples
object.
Since the GTuples
class extends the
GRanges
class it contains the seqnames
,
ranges
, strand
, elementMetadata
, seqinfo
and
metadata
. The GTuples
class also contains two additional slots,
size
and internalPos
.
size
An integer. The size
of the genomic tuples
stored in the GTuples
object.
internalPos
If the size
of the genomic tuples is
greater than 2, internalPos
is an integer
matrix
storing the "internal" positions of each genomic tuple.
Otherwise internalPos
is NULL
.
GTuples(seqnames = Rle(), tuples = matrix(),
strand = Rle("*", length(seqnames)), ..., seqlengths = NULL,
seqinfo = NULL)
: Creates a GTuples
object.
seqnames
Rle
object, character
vector, or factor containing the sequence names.
tuples
matrix object containing the positions of the tuples. The first column should refer to pos1, the second to pos2, etc.
strand
Rle
object, character
vector, or factor containing the strand information.
...
Optional metadata columns. These columns cannot be
named "start
", "end
", "width
", or
"element
". A named integer vector "seqlength
" can be
used instead of seqinfo
.
seqlengths
an integer vector named with the sequence
names and containing the lengths (or NA
) for each
level(seqnames)
.
seqinfo
a DataFrame
object
containing allowed sequence names and lengths (or NA
) for each
level(seqnames)
.
In the code snippets below, x
is a GTuples object.
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
:Creates a data.frame with columns seqnames
(factor),
tuples
(integer), strand
(factor), as well as
the additional metadata columns stored in mcols(x)
. Pass an
explicit stringsAsFactors=TRUE/FALSE
argument via ...
to
override the default conversions for the metadata columns in
mcols(x)
.
as.character(x, ignore.strand=FALSE)
:Turn GTuples object x
into a character vector where each
tuples in x
is represented by a string in format
chr1:100,109,115:+
. If ignore.strand
is TRUE or if
all the ranges in x
are unstranded (i.e. their strand
is set to *
), then all the strings in the output are in
format chr1:100,109,115
.
The names on x
are propagated to the returned character vector.
Its metadata (metadata(x)
) and metadata columns (mcols(x)
)
are ignored.
as.factor(x)
:Equivalent to
factor(as.character(x), levels=as.character(sort(unique(x))))
as(x, "GRanges"), granges(x)
:Creates a GRanges
object from a GTuples
object. WARNING: This is generally a destructive
operation because all "internal" positions will be dropped.
In the following code snippets, x is a GTuples
object.
size(x)
:Get the size of the genomic tuples stored in x
.
length(x)
:Get the number of elements.
seqnames(x)
, seqnames(x) <- value
:Get or set the sequence names. value
can be an
Rle
object, a character vector, or a factor.
tuples(x)
, tuples(x) <- value
:Get the positions of the tuples, which are returned as an integer matrix.
value
can be an integer matrix.
ranges(x, use.mcols = FALSE)
, ranges(x) <- value
:Get or set the ranges in the form of a CompressedIRangesList. value
can be a IntegerRangesList object.
WARNING: The use of ranges
with GTuples
objects is
strongly discouraged. It will only get/set pos_{1}
and
pos_{m}
of the tuples, where m
is the size
of the
tuples, as these are what are stored in the "ranges" slot of a
GTuples
object.
names(x)
, names(x) <- value
:Get or set the names of the elements.
strand(x)
, strand(x) <- value
:Get or set the strand. value
can be an Rle
object, character vector, or factor.
mcols(x, use.names=FALSE)
, mcols(x) <- value
:Get or set the metadata columns. If use.names=TRUE
and the metadata
columns are not NULL
, then the names of x
are propagated as
the row names of the returned DataFrame
object. When setting the metadata columns, the supplied value must be
NULL
or a data.frame-like object (i.e.
DataFrame
or data.frame
)
object holding element-wise metadata.
elementMetadata(x)
, elementMetadata(x) <- value
,
values(x)
, values(x) <- value
:Alternatives to mcols
functions. Their use is discouraged.
seqinfo(x)
, seqinfo(x) <- value
:Get or set the information about the underlying sequences. value
must be a DataFrame
object.
seqlevels(x)
, seqlevels(x, force=FALSE) <- value
:Get or set the sequence levels. seqlevels(x)
is equivalent to
seqlevels(seqinfo(x))
or to levels(seqnames(x))
, those 2
expressions being guaranteed to return identical character vectors on a
GTuples
object. value
must be a character vector with no
NA
s. See ?seqlevels
for more
information.
seqlengths(x)
, seqlengths(x) <- value
:Get or set the sequence lengths. seqlengths(x)
is equivalent to
seqlengths(seqinfo(x))
. value
can be a named non-negative
integer or numeric vector eventually with NA
s.
isCircular(x)
, isCircular(x) <- value
:Get or set the circularity flags. isCircular(x)
is equivalent to
isCircular(seqinfo(x))
. value
must be a named logical
vector eventually with NA
s.
genome(x)
, genome(x) <- value
:Get or set the genome identifier or assembly name for each sequence.
genome(x)
is equivalent to genome(seqinfo(x))
. value
must be a named character vector eventually with NA
s.
seqlevelsStyle(x)
, seqlevelsStyle(x) <- value
:Get or set the seqname style for x
. See the
seqlevelsStyle
generic getter and setter in
the GenomeInfoDb package for more information.
score(x)
, score(x) <- value
:Get or set the "score" column from the element metadata.
In the following code snippets, x is a GTuples
object.
WARNING: The preferred setter is tuples(x) <- value
and the
use of start(x) <- value
, end(x) <- value
and
width(x) <- value is discouraged
.
start(x)
, start(x) <- value
:Get or set pos_{1}
of the tuples. WARNING: The use of
width(x) <- value
is discouraged; instead, construct the tuples as
the appropriate integer matrix, mvalue
, and use
tuples(x) <- mvalue
.
end(x)
, end(x) <- value
:Get or set pos_{m}
of the tuples, where m
is the size
of the tuples. WARNING: The use of end(x) <- value
is
discouraged; instead, construct the tuples as the appropriate integer
matrix, mvalue
, and use tuples(x) <- mvalue
.
IPD(x)
:Get the intra-pair distances (IPD
). IPD
is
only defined for tuples with size
> 1. The IPD
of a tuple
with size
= m
is the vector of intra-pair distances,
(pos_{2} - pos_{1}, \ldots, pos_{m} - pos_{m - 1})
.
width(x)
, width(x) <- value
:Get or set pos_{m} - pos_{1}
of the tuples, where m
is the
size
of the tuples. If using width(x) <- value
,
pos_{1}
is held fixed and pos_{m}
is altered.
WARNING: The use of width(x) <- value
is discouraged;
instead, construct the tuples as the appropriate integer matrix,
mvalue
, and use tuples(x) <- mvalue
.
In the following code snippets, x is a GTuples
object.
append(x, values, after = length(x))
:Inserts the values into x
at the position given by after, where
x
and values
are of the same class.
c(x, ...)
:Combines x
and the GTuples
objects in ...
together.
Any object in ...
must belong to the same class as x
, or to
one of its subclasses, or must be NULL
. The result is an object of
the same class as x
.
c(x, ..., ignore.mcols=FALSE)
:If the GTuples
objects have metadata columns (represented as one
DataFrame
per object), each such
DataFrame
must have the same columns in
order to combine successfully. In order to circumvent this restraint, you
can pass in an ignore.mcols=TRUE
argument which will combine all
the objects into one and drop all of their metadata columns.
split(x, f, drop=FALSE)
:Splits x
according to
f
to create a GTuplesList
object. If f
is a
list-like object then drop
is ignored and f
is treated as
if it was rep(seq_len(length(f)), sapply(f, length))
, so the
returned object has the same shape as f
(it also receives the
names of f
). Otherwise, if f
is not a list-like object,
empty list elements are removed from the returned object if drop
is TRUE
.
In the following code snippets, x is a GTuples
object.
x[i, j]
, x[i, j] <- value
:Get or set elements i
with optional metadata columns
mcols(x)[,j]
, where i
can be missing; an NA
-free
logical, numeric, or character vector; or a 'logical'
Rle
object.
x[i,j] <- value
:Replaces elements i
and optional metadata columns j
with
value
.
head(x, n = 6L)
:If n
is non-negative, returns the first n
elements of the
GTuples
object. If n
is negative, returns all but the last
abs(n)
elements of the GTuples
object.
rep(x, times, length.out, each)
:Repeats the values in x
through one of the following conventions:
times
Vector giving the number of times to repeat each
element if of length length(x)
, or to repeat the whole vector
if of length 1.
length.out
Non-negative integer. The desired length of the output vector.
each
Non-negative integer. Each element of x
is
repeated each times.
subset(x, subset)
:Returns a new object of the same class as x
made of the subset
using logical vector subset, where missing values are taken as
FALSE
.
tail(x, n = 6L)
:If n
is non-negative, returns the last n
elements of the
GTuples
object. If n
is negative, returns all but the first
abs(n)
elements of the GTuples
object.
window(x, start = NA, end = NA, width = NA, frequency = NULL, delta = NULL, ...)
:Extracts the subsequence window from the GTuples
object using:
start
, end
, width
The start, end, or width of the window. Two of the three are required.
frequency
, delta
Optional arguments that specify the sampling frequency and increment within the window.
In general, this is more efficient than using "["
operator.
window(x, start = NA, end = NA, width = NA, keepLength = TRUE) <- value
:Replaces the subsequence window specified on the left (i.e. the subsequence
in x
specified by start
, end
and width
)
by value
.
value
must either be of class class(x)
, belong to a subclass
of class(x)
, be coercible to class(x)
, or be NULL
.
If keepLength
is TRUE
, the elements of value
are
repeated to create a GTuples
object with the same number of elements
as the width of the subsequence window it is replacing.
If keepLength
is FALSE
, this replacement method can modify
the length of x
, depending on how the length of the left
subsequence window compares to the length of value
.
x$name
, x$name <- value
:Shortcuts for mcols(x)$name
and mcols(x)$name <- value
,
respectively. Provided as a convenience, from
GRanges
as the result of strong popular demand. Note
that those methods are not consistent with the other $
and
$<-
methods in the IRanges
/
GenomicRanges
infrastructure, and might
confuse some users by making them believe that a
GRanges
object can be manipulated as a
data.frame-like object. Therefore we recommend using them only
interactively, and we discourage their use in scripts or packages. For
the latter, use mcols(x)$name
and mcols(x)$name <- value
,
instead of x$name
and x$name <- value
, respectively.
show(x)
:By default the show method displays 5 head and 5
tail elements. This can be changed by setting the global options
showHeadLines
and showTailLines
. If the object length is
less than (or equal to) the sum of these 2 options plus 1, then the
full object is displayed. Note that these options also affect the
display of GRanges
objects (defined in
the GenomicRanges package),
GAlignments
and
GAlignmentPairs
objects (defined in the GenomicAlignments package), as well
as other objects defined in the IRanges and Biostrings
packages (e.g. IRanges
and
DNAStringSet
objects).
Peter Hickey
GTuplesList-class
,
seqinfo
,
Vector
,
Rle
,
DataFrame
,
GRanges
,
intra-tuple-methods
,
findOverlaps-methods
,
nearest-methods
,
## Create example 4-tuples
seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1")
gt4 <- GTuples(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"),
c(1, 3, 2, 4)),
tuples = matrix(c(1:10, 2:11, 3:12, 4:13), ncol = 4),
strand = Rle(strand(c("-", "+", "*", "+", "-")),
c(1, 2, 2, 3, 2)),
score = 1:10, GC = seq(1, 0, length = 10), seqinfo = seqinfo)
gt4
## Summarizing elements
table(seqnames(gt4))
sum(width(gt4))
summary(mcols(gt4)[,"score"])
## Renaming the underlying sequences
seqlevels(gt4)
seqlevels(gt4) <- sub("chr", "Chrom", seqlevels(gt4))
gt4
seqlevels(gt4) <- sub("Chrom", "chr", seqlevels(gt4)) # revert
## Combining objects
gt4_a <- GTuples(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"),
c(1, 3, 2, 4)),
tuples = matrix(c(1:10, 21:30, 31:40, 41:50), ncol = 4),
strand = Rle(strand(c("-", "+", "*", "+", "-")),
c(1, 2, 2, 3, 2)),
score = 1:10, seqinfo = seqinfo)
gt4_b <- GTuples(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"),
c(1, 3, 2, 4)),
tuples = matrix(c(101:110, 121:130, 131:140, 141:150),
ncol = 4),
strand = Rle(strand(c("-", "+", "*", "+", "-")),
c(1, 2, 2, 3, 2)),
score = 1:10, seqinfo = seqinfo)
some_gt4 <- c(gt4_a, gt4_b)
## all_gt4 <- c(gt4, gt4_a, gt4_b) ## (This would fail)
all_gt4 <- c(gt4, gt4_a, gt4_b, ignore.mcols=TRUE)
## The number of lines displayed in the 'show' method
## are controlled with two global options.
options("showHeadLines" = 7)
options("showTailLines" = 2)
all_gt4
## Revert to default values
options("showHeadLines"=NULL)
options("showTailLines"=NULL)
## Get the size of the tuples stored in the GTuples object
size(gt4)
## Get the tuples
tuples(gt4)
## Get the matrix of intra-pair distances (IPD)
IPD(all_gt4)
## Can't combine genomic tuples of different sizes
gt1 <- GTuples('chr1', matrix(30:40))
gt1
## Not run:
## Returns error
c(gt4, gt1)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.