GTuples-class: GTuples objects

GTuples-classR Documentation

GTuples objects

Description

The GTuples class is a container for the genomic tuples and their associated annotations.

Details

GTuples extends GRanges as a container for genomic tuples rather than genomic ranges. GTuples is a vector of genomic locations and associated annotations. Each element in the vector is comprised of a sequence name, a tuple, a strand, and optional metadata columns (e.g. score, GC content, etc.). This information is stored in four components:

seqnames

a 'factor' Rle object containing the sequence names.

tuples

externally, a matrix-link object containing the tuples. Internally, an IRanges object storing the first and last position of each tuple and, if required, a matrix storing the "internal" positions of each tuple (see description of internalPos below).

strand

a Rle object containing the strand information.

mcols

a DataFrame object containing the metadata columns. Columns cannot be named "seqnames" "ranges", "tuples", "internalPos", "size", "strand", "seqlevels", "seqlengths", "isCircular", "start", "end", "width", or "element".

seqinfo

a DataFrame object containing information about the set of genomic sequences present in the GTuples object.

Slots

Since the GTuples class extends the GRanges class it contains the seqnames, ranges, strand, elementMetadata, seqinfo and metadata. The GTuples class also contains two additional slots, size and internalPos.

size

An integer. The size of the genomic tuples stored in the GTuples object.

internalPos

If the size of the genomic tuples is greater than 2, internalPos is an integer matrix storing the "internal" positions of each genomic tuple. Otherwise internalPos is NULL.

Constructor

GTuples(seqnames = Rle(), tuples = matrix(), strand = Rle("*", length(seqnames)), ..., seqlengths = NULL, seqinfo = NULL):

Creates a GTuples object.

seqnames

Rle object, character vector, or factor containing the sequence names.

tuples

matrix object containing the positions of the tuples. The first column should refer to pos1, the second to pos2, etc.

strand

Rle object, character vector, or factor containing the strand information.

...

Optional metadata columns. These columns cannot be named "start", "end", "width", or "element". A named integer vector "seqlength" can be used instead of seqinfo.

seqlengths

an integer vector named with the sequence names and containing the lengths (or NA) for each level(seqnames).

seqinfo

a DataFrame object containing allowed sequence names and lengths (or NA) for each level(seqnames).

Coercion

In the code snippets below, x is a GTuples object.

as.data.frame(x, row.names = NULL, optional = FALSE, ...):

Creates a data.frame with columns seqnames (factor), tuples (integer), strand (factor), as well as the additional metadata columns stored in mcols(x). Pass an explicit stringsAsFactors=TRUE/FALSE argument via ... to override the default conversions for the metadata columns in mcols(x).

as.character(x, ignore.strand=FALSE):

Turn GTuples object x into a character vector where each tuples in x is represented by a string in format chr1:100,109,115:+. If ignore.strand is TRUE or if all the ranges in x are unstranded (i.e. their strand is set to *), then all the strings in the output are in format chr1:100,109,115.

The names on x are propagated to the returned character vector. Its metadata (metadata(x)) and metadata columns (mcols(x)) are ignored.

as.factor(x):

Equivalent to

  factor(as.character(x), levels=as.character(sort(unique(x))))
as(x, "GRanges"), granges(x):

Creates a GRanges object from a GTuples object. WARNING: This is generally a destructive operation because all "internal" positions will be dropped.

Accessors

In the following code snippets, x is a GTuples object.

size(x):

Get the size of the genomic tuples stored in x.

length(x):

Get the number of elements.

seqnames(x), seqnames(x) <- value:

Get or set the sequence names. value can be an Rle object, a character vector, or a factor.

tuples(x), tuples(x) <- value:

Get the positions of the tuples, which are returned as an integer matrix. value can be an integer matrix.

ranges(x, use.mcols = FALSE), ranges(x) <- value:

Get or set the ranges in the form of a CompressedIRangesList. value can be a IntegerRangesList object.

WARNING: The use of ranges with GTuples objects is strongly discouraged. It will only get/set pos_{1} and pos_{m} of the tuples, where m is the size of the tuples, as these are what are stored in the "ranges" slot of a GTuples object.

names(x), names(x) <- value:

Get or set the names of the elements.

strand(x), strand(x) <- value:

Get or set the strand. value can be an Rle object, character vector, or factor.

mcols(x, use.names=FALSE), mcols(x) <- value:

Get or set the metadata columns. If use.names=TRUE and the metadata columns are not NULL, then the names of x are propagated as the row names of the returned DataFrame object. When setting the metadata columns, the supplied value must be NULL or a data.frame-like object (i.e. DataFrame or data.frame) object holding element-wise metadata.

elementMetadata(x), elementMetadata(x) <- value, values(x), values(x) <- value:

Alternatives to mcols functions. Their use is discouraged.

seqinfo(x), seqinfo(x) <- value:

Get or set the information about the underlying sequences. value must be a DataFrame object.

seqlevels(x), seqlevels(x, force=FALSE) <- value:

Get or set the sequence levels. seqlevels(x) is equivalent to seqlevels(seqinfo(x)) or to levels(seqnames(x)), those 2 expressions being guaranteed to return identical character vectors on a GTuples object. value must be a character vector with no NAs. See ?seqlevels for more information.

seqlengths(x), seqlengths(x) <- value:

Get or set the sequence lengths. seqlengths(x) is equivalent to seqlengths(seqinfo(x)). value can be a named non-negative integer or numeric vector eventually with NAs.

isCircular(x), isCircular(x) <- value:

Get or set the circularity flags. isCircular(x) is equivalent to isCircular(seqinfo(x)). value must be a named logical vector eventually with NAs.

genome(x), genome(x) <- value:

Get or set the genome identifier or assembly name for each sequence. genome(x) is equivalent to genome(seqinfo(x)). value must be a named character vector eventually with NAs.

seqlevelsStyle(x), seqlevelsStyle(x) <- value:

Get or set the seqname style for x. See the seqlevelsStyle generic getter and setter in the GenomeInfoDb package for more information.

score(x), score(x) <- value:

Get or set the "score" column from the element metadata.

Tuples methods

In the following code snippets, x is a GTuples object. WARNING: The preferred setter is tuples(x) <- value and the use of start(x) <- value, end(x) <- value and width(x) <- value is discouraged.

start(x), start(x) <- value:

Get or set pos_{1} of the tuples. WARNING: The use of width(x) <- value is discouraged; instead, construct the tuples as the appropriate integer matrix, mvalue, and use tuples(x) <- mvalue.

end(x), end(x) <- value:

Get or set pos_{m} of the tuples, where m is the size of the tuples. WARNING: The use of end(x) <- value is discouraged; instead, construct the tuples as the appropriate integer matrix, mvalue, and use tuples(x) <- mvalue.

IPD(x):

Get the intra-pair distances (IPD). IPD is only defined for tuples with size > 1. The IPD of a tuple with size= m is the vector of intra-pair distances, (pos_{2} - pos_{1}, \ldots, pos_{m} - pos_{m - 1}).

width(x), width(x) <- value:

Get or set pos_{m} - pos_{1} of the tuples, where m is the size of the tuples. If using width(x) <- value, pos_{1} is held fixed and pos_{m} is altered. WARNING: The use of width(x) <- value is discouraged; instead, construct the tuples as the appropriate integer matrix, mvalue, and use tuples(x) <- mvalue.

Splitting and Combining

In the following code snippets, x is a GTuples object.

append(x, values, after = length(x)):

Inserts the values into x at the position given by after, where x and values are of the same class.

c(x, ...):

Combines x and the GTuples objects in ... together. Any object in ... must belong to the same class as x, or to one of its subclasses, or must be NULL. The result is an object of the same class as x.

c(x, ..., ignore.mcols=FALSE):

If the GTuples objects have metadata columns (represented as one DataFrame per object), each such DataFrame must have the same columns in order to combine successfully. In order to circumvent this restraint, you can pass in an ignore.mcols=TRUE argument which will combine all the objects into one and drop all of their metadata columns.

split(x, f, drop=FALSE):

Splits x according to f to create a GTuplesList object. If f is a list-like object then drop is ignored and f is treated as if it was rep(seq_len(length(f)), sapply(f, length)), so the returned object has the same shape as f (it also receives the names of f). Otherwise, if f is not a list-like object, empty list elements are removed from the returned object if drop is TRUE.

Subsetting

In the following code snippets, x is a GTuples object.

x[i, j], x[i, j] <- value:

Get or set elements i with optional metadata columns mcols(x)[,j], where i can be missing; an NA-free logical, numeric, or character vector; or a 'logical' Rle object.

x[i,j] <- value:

Replaces elements i and optional metadata columns j with value.

head(x, n = 6L):

If n is non-negative, returns the first n elements of the GTuples object. If n is negative, returns all but the last abs(n) elements of the GTuples object.

rep(x, times, length.out, each):

Repeats the values in x through one of the following conventions:

times

Vector giving the number of times to repeat each element if of length length(x), or to repeat the whole vector if of length 1.

length.out

Non-negative integer. The desired length of the output vector.

each

Non-negative integer. Each element of x is repeated each times.

subset(x, subset):

Returns a new object of the same class as x made of the subset using logical vector subset, where missing values are taken as FALSE.

tail(x, n = 6L):

If n is non-negative, returns the last n elements of the GTuples object. If n is negative, returns all but the first abs(n) elements of the GTuples object.

window(x, start = NA, end = NA, width = NA, frequency = NULL, delta = NULL, ...):

Extracts the subsequence window from the GTuples object using:

start, end, width

The start, end, or width of the window. Two of the three are required.

frequency, delta

Optional arguments that specify the sampling frequency and increment within the window.

In general, this is more efficient than using "[" operator.

window(x, start = NA, end = NA, width = NA, keepLength = TRUE) <- value:

Replaces the subsequence window specified on the left (i.e. the subsequence in x specified by start, end and width) by value. value must either be of class class(x), belong to a subclass of class(x), be coercible to class(x), or be NULL. If keepLength is TRUE, the elements of value are repeated to create a GTuples object with the same number of elements as the width of the subsequence window it is replacing. If keepLength is FALSE, this replacement method can modify the length of x, depending on how the length of the left subsequence window compares to the length of value.

x$name, x$name <- value:

Shortcuts for mcols(x)$name and mcols(x)$name <- value, respectively. Provided as a convenience, from GRanges as the result of strong popular demand. Note that those methods are not consistent with the other $ and $<- methods in the IRanges/ GenomicRanges infrastructure, and might confuse some users by making them believe that a GRanges object can be manipulated as a data.frame-like object. Therefore we recommend using them only interactively, and we discourage their use in scripts or packages. For the latter, use mcols(x)$name and mcols(x)$name <- value, instead of x$name and x$name <- value, respectively.

Other methods

show(x):

By default the show method displays 5 head and 5 tail elements. This can be changed by setting the global options showHeadLines and showTailLines. If the object length is less than (or equal to) the sum of these 2 options plus 1, then the full object is displayed. Note that these options also affect the display of GRanges objects (defined in the GenomicRanges package), GAlignments and GAlignmentPairs objects (defined in the GenomicAlignments package), as well as other objects defined in the IRanges and Biostrings packages (e.g. IRanges and DNAStringSet objects).

Author(s)

Peter Hickey

See Also

GTuplesList-class, seqinfo, Vector, Rle, DataFrame, GRanges, intra-tuple-methods, findOverlaps-methods, nearest-methods,

Examples

  ## Create example 4-tuples
  seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1")
  gt4 <- GTuples(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"),
                                c(1, 3, 2, 4)),
                 tuples = matrix(c(1:10, 2:11, 3:12, 4:13), ncol = 4),
                 strand = Rle(strand(c("-", "+", "*", "+", "-")),
                              c(1, 2, 2, 3, 2)),
                 score = 1:10, GC = seq(1, 0, length = 10), seqinfo = seqinfo)
  gt4
  
  ## Summarizing elements
  table(seqnames(gt4))
  sum(width(gt4))
  summary(mcols(gt4)[,"score"])
  
  ## Renaming the underlying sequences
  seqlevels(gt4)
  seqlevels(gt4) <- sub("chr", "Chrom", seqlevels(gt4))
  gt4
  seqlevels(gt4) <- sub("Chrom", "chr", seqlevels(gt4)) # revert
  
  ## Combining objects
  gt4_a <- GTuples(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"),
                                c(1, 3, 2, 4)),
                 tuples = matrix(c(1:10, 21:30, 31:40, 41:50), ncol = 4),
                 strand = Rle(strand(c("-", "+", "*", "+", "-")),
                              c(1, 2, 2, 3, 2)),
                 score = 1:10, seqinfo = seqinfo)
                 
  gt4_b <- GTuples(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"),
                                c(1, 3, 2, 4)),
                 tuples = matrix(c(101:110, 121:130, 131:140, 141:150), 
                                 ncol = 4),
                 strand = Rle(strand(c("-", "+", "*", "+", "-")),
                              c(1, 2, 2, 3, 2)),
                 score = 1:10, seqinfo = seqinfo)
                 
  some_gt4 <- c(gt4_a, gt4_b)

  ## all_gt4 <- c(gt4, gt4_a, gt4_b) ## (This would fail)
  all_gt4 <- c(gt4, gt4_a, gt4_b, ignore.mcols=TRUE)
  
  ## The number of lines displayed in the 'show' method
  ## are controlled with two global options.
  options("showHeadLines" = 7)
  options("showTailLines" = 2)
  all_gt4
  
  ## Revert to default values
  options("showHeadLines"=NULL)
  options("showTailLines"=NULL)
  
  ## Get the size of the tuples stored in the GTuples object
  size(gt4)
  
  ## Get the tuples
  tuples(gt4)
  
  ## Get the matrix of intra-pair distances (IPD)
  IPD(all_gt4)  
  
  ## Can't combine genomic tuples of different sizes
  gt1 <- GTuples('chr1', matrix(30:40))
  gt1
  ## Not run: 
  ## Returns error
  c(gt4, gt1)
  
## End(Not run)

PeteHaitch/GenomicTuples documentation built on Nov. 5, 2024, 2 p.m.