Vector-comparison: Compare, order, tabulate vector-like objects

Vector-comparison        R Documentation

Compare, order, tabulate vector-like objects

Description

Generic functions and methods for comparing, ordering, and tabulating vector-like objects.

Usage

## Element-wise (aka "parallel") comparison of 2 Vector objects
## ------------------------------------------------------------

pcompare(x, y)

## S4 method for signature 'Vector,Vector'
e1 == e2
## S4 method for signature 'Vector,ANY'
e1 == e2
## S4 method for signature 'ANY,Vector'
e1 == e2

## S4 method for signature 'Vector,Vector'
e1 <= e2
## S4 method for signature 'Vector,ANY'
e1 <= e2
## S4 method for signature 'ANY,Vector'
e1 <= e2

## S4 method for signature 'Vector,Vector'
e1 != e2
## S4 method for signature 'Vector,ANY'
e1 != e2
## S4 method for signature 'ANY,Vector'
e1 != e2

## S4 method for signature 'Vector,Vector'
e1 >= e2
## S4 method for signature 'Vector,ANY'
e1 >= e2
## S4 method for signature 'ANY,Vector'
e1 >= e2

## S4 method for signature 'Vector,Vector'
e1 < e2
## S4 method for signature 'Vector,ANY'
e1 < e2
## S4 method for signature 'ANY,Vector'
e1 < e2

## S4 method for signature 'Vector,Vector'
e1 > e2
## S4 method for signature 'Vector,ANY'
e1 > e2
## S4 method for signature 'ANY,Vector'
e1 > e2

## sameAsPreviousROW()
## -------------------

sameAsPreviousROW(x)

## match()
## -------

## S4 method for signature 'Vector,Vector'
match(x, table, nomatch = NA_integer_,
    incomparables = NULL, ...)

## selfmatch()
## -----------

selfmatch(x, ...)

## duplicated() & unique()
## -----------------------

## S4 method for signature 'Vector'
duplicated(x, incomparables=FALSE, ...)

## S4 method for signature 'Vector'
unique(x, incomparables=FALSE, ...)

## %in%
## ----

## S4 method for signature 'Vector,Vector'
x %in% table
## S4 method for signature 'Vector,ANY'
x %in% table
## S4 method for signature 'ANY,Vector'
x %in% table

## findMatches() & countMatches()
## ------------------------------

findMatches(x, table, select=c("all", "first", "last"), ...)
countMatches(x, table, ...)

## sort()
## ------

## S4 method for signature 'Vector'
sort(x, decreasing=FALSE, na.last=NA, by)

## rank()
## ------

## S4 method for signature 'Vector'
rank(x, na.last = TRUE, ties.method = c("average",
        "first", "last", "random", "max", "min"), by)

## xtfrm()
## -------

## S4 method for signature 'Vector'
xtfrm(x)

## table()
## -------

## S4 method for signature 'Vector'
table(...)

Arguments

x, y, e1, e2, table

Vector-like objects.

nomatch

See ?base::match.

incomparables

The duplicated method for Vector objects does NOT support this argument.

The unique method for Vector objects, which is implemented on top of duplicated, propagates this argument to its call to duplicated.

See ?base::duplicated and ?base::unique for more information about this argument for these generics.

The match method for Vector objects does support this argument; see ?base::match for details.

select

Only select="all" is supported at the moment. Note that you can use match if you want select="first". For other options, you're welcome to request them on the Bioconductor mailing list.

ties.method

See ?base::rank.

decreasing, na.last

See ?base::sort.

by

A formula referencing the metadata columns by which to sort, e.g., ~ x + y sorts by column “x”, breaking ties with column “y”.

...

A Vector object for table (the table method for Vector objects can only take one input object).

Otherwise, extra arguments supported by specific methods. In particular:

  • The default selfmatch method, which is implemented on top of match, propagates the extra arguments to its call to match.

  • The duplicated method for Vector objects, which is implemented on top of selfmatch, accepts the extra argument fromLast and propagates the other extra arguments to its call to selfmatch. See ?base::duplicated for more information about fromLast.

  • The unique method for Vector objects, which is implemented on top of duplicated, propagates the extra arguments to its call to duplicated.

  • The default findMatches and countMatches methods, which are implemented on top of match and selfmatch, propagate the extra arguments to their calls to match and selfmatch.

  • The sort method for Vector objects, which is implemented on top of order, only accepts the extra argument na.last and propagates it to its call to order.
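The by formula lists metadata columns in order of priority. As a rough base R analogy, and not the S4Vectors implementation, the hypothetical helper order_by_formula() below orders the rows of a plain data.frame standing in for the metadata columns, extracting the column names from the formula with all.vars():

```r
## Hypothetical helper (illustration only): order the rows of a data.frame,
## standing in for the metadata columns of a Vector, by a formula.
order_by_formula <- function(df, by) {
    vars <- all.vars(by)                 # ~ x + y  ->  c("x", "y")
    ## Earlier columns take priority; later ones only break ties.
    do.call(order, unname(as.list(df[vars])))
}

df <- data.frame(x = c(2, 1, 2, 1), y = c(1, 2, 0, 1))
o <- order_by_formula(df, ~ x + y)
o          # 4 2 3 1
df[o, ]
```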

Details

Doing pcompare(x, y) on 2 vector-like objects x and y of length 1 must return an integer less than, equal to, or greater than zero if the single element in x is considered to be respectively less than, equal to, or greater than the single element in y. If x or y has a length other than 1, then the two objects are typically expected to have the same length so that pcompare(x, y) can operate element-wise: it then returns an integer vector of the same length as x and y, where the i-th element is the result of comparing x[i] and y[i]. If x and y don't have the same length and neither is a zero-length vector, then the shorter one is first recycled to the length of the longer one. If one of them is a zero-length vector, then pcompare(x, y) returns a zero-length integer vector.
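The contract above can be illustrated with a minimal base R sketch for atomic vectors. The helper pcompare_sketch() is hypothetical and is not how S4Vectors implements pcompare; it returns -1L, 0L, or 1L, which is one valid choice among the negative, zero, and positive integers the contract allows:

```r
## Hypothetical base R stand-in for pcompare() on atomic vectors,
## illustrating the contract only (not the S4Vectors implementation).
pcompare_sketch <- function(x, y) {
    ## A zero-length input gives a zero-length integer result.
    if (length(x) == 0L || length(y) == 0L)
        return(integer(0))
    ## Recycle the shorter input to the length of the longer one.
    n <- max(length(x), length(y))
    x <- rep_len(x, n)
    y <- rep_len(y, n)
    ## -1L, 0L, or 1L for "less than", "equal to", "greater than".
    as.integer(x > y) - as.integer(x < y)
}

pcompare_sketch(c(1, 5, 3), 3)             # -1  1  0
pcompare_sketch(c(1, 5, 3), numeric(0))    # integer(0)
```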

selfmatch(x, ...) is equivalent to match(x, x, ...). This is actually how the default ANY method is implemented. However, note that the default selfmatch(x, ...) for Vector x will typically be more efficient than match(x, x, ...), and can be made even more so if a specific selfmatch method is implemented for a given subclass.
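Because selfmatch(x) is equivalent to match(x, x), plain base R can illustrate what it returns and how duplicated() and unique() can be derived from it, which is how the duplicated and unique methods for Vector objects are layered. This sketch uses match(y, y) as a stand-in for selfmatch(y), with the integer vector y from the Examples section:

```r
y <- c(16L, -3L, -2L, 15L, 15L, 0L, 8L, 15L, -2L)

## Stand-in for selfmatch(y): for each element, the index of its
## first occurrence in y.
sm <- match(y, y)
sm    # 1 2 3 4 4 6 7 4 3

## An element is a duplicate exactly when its first occurrence lies
## elsewhere, so duplicated() can be derived from selfmatch().
dup <- sm != seq_along(y)
stopifnot(identical(dup, duplicated(y)))

## unique() then keeps the non-duplicated elements.
stopifnot(identical(y[!dup], unique(y)))
```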

findMatches is an enhanced version of match which, by default (i.e. if select="all"), returns all the matches in a Hits object.

countMatches returns an integer vector of the length of x containing the number of matches in table for each element in x.
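As a base R illustration of these semantics, assuming plain integer vectors, the sketch below computes all matching (query, subject) index pairs, which is the information a Hits object returned by findMatches(x, table) holds, and derives the counts that countMatches returns. It also shows how match() gives the first match (select="first") and how the last match could be obtained by matching against the reversed table. The all-pairs comparison is quadratic and purely for illustration; the object names are hypothetical:

```r
x   <- c(16L, -3L, -2L, 15L, 0L, 8L, 999L)
tbl <- c(16L, -3L, -2L, 15L, 15L, 0L, 8L, 15L, -2L)

## All (query, subject) pairs with x[query] == tbl[subject], sorted by
## query then subject. Unlike match(), NA does not match NA here.
pairs <- which(outer(x, tbl, "=="), arr.ind = TRUE)
pairs <- pairs[order(pairs[, "row"], pairs[, "col"]), , drop = FALSE]
colnames(pairs) <- c("query", "subject")

## Analogue of countMatches(x, tbl): number of pairs per element of x.
counts <- tabulate(pairs[, "query"], nbins = length(x))
counts    # 1 1 2 3 1 1 0

## First match per element (what match() returns), and last match
## obtained by matching against the reversed table.
first <- match(x, tbl)
last  <- length(tbl) + 1L - match(x, rev(tbl))
```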

Value

For pcompare: see Details section above.

For sameAsPreviousROW: a logical vector of the same length as x, indicating whether each entry of x is equal to the previous entry. The first entry is always FALSE for a non-zero-length x.

For match and selfmatch: an integer vector of the same length as x.

For duplicated, unique, and %in%: see ?BiocGenerics::duplicated, ?BiocGenerics::unique, and ?`%in%`.

For findMatches: a Hits object by default (i.e. if select="all").

For countMatches: an integer vector of the length of x containing the number of matches in table for each element in x.

For sort: see ?BiocGenerics::sort.

For xtfrm: see ?base::xtfrm.

For table: a 1D array of integer values promoted to the "table" class. See ?BiocGenerics::table for more information.

Note

The following notes are for developers who want to implement comparing, ordering, and tabulating methods for their own Vector subclass.

Subclass comparison methods can be split into three categories. The first category must be implemented for each subclass, as these methods do not have sensible defaults for arbitrary Vector objects:

  • The S4Vectors package provides no order method for Vector objects. So calling order on a Vector derivative for which no specific order method is defined will use base::order, which calls xtfrm, which in turn calls order, which calls xtfrm, and so on. This infinite recursion of S4 dispatch eventually results in an error about reaching the stack limit.

    To avoid this behavior, a specialized order method needs to be implemented for specific Vector subclasses (e.g. for Hits and IntegerRanges objects).

  • The default sameAsPreviousROW method is implemented on top of the == method, so it works out-of-the-box on Vector objects for which == works as expected. However, the default == method is implemented on top of pcompare, which itself has a default implementation that relies on sameAsPreviousROW! This again leads to infinite recursion and an error about the stack limit.

    To avoid this behavior, a specialized sameAsPreviousROW method must be implemented for specific Vector subclasses.
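A minimal sketch of the two category-one methods, assuming a hypothetical subclass MyRaw modelled on the Raw class from the Examples section. The order method uses the (..., na.last, decreasing, method) formals of the order generic and, to keep the sketch short, handles a single MyRaw object only:

```r
library(S4Vectors)

## Hypothetical Vector subclass wrapping a raw vector (illustration only).
setClass("MyRaw", contains="Vector", representation(data="raw"))
setMethod("length", "MyRaw", function(x) length(x@data))
setMethod("[", "MyRaw",
    function(x, i, j, ..., drop) { x@data <- x@data[i]; x }
)

## Category 1, order(): delegate to base::order() on the byte values.
## This sketch only supports ordering a single MyRaw object.
setMethod("order", "MyRaw",
    function(..., na.last=TRUE, decreasing=FALSE,
                  method=c("auto", "shell", "radix"))
    {
        args <- list(...)
        if (length(args) != 1L)
            stop("this sketch only supports a single MyRaw object")
        base::order(as.integer(args[[1L]]@data), na.last=na.last,
                    decreasing=decreasing, method=method)
    }
)

## Category 1, sameAsPreviousROW(): compare each element with the one
## before it; the first element of a non-empty object is never "same".
setMethod("sameAsPreviousROW", "MyRaw",
    function(x) {
        codes <- as.integer(x@data)
        n <- length(codes)
        if (n == 0L)
            return(logical(0))
        c(FALSE, codes[-1L] == codes[-n])
    }
)

x <- new("MyRaw", data=charToRaw("BAB"))
o <- order(x)                  # 2 1 3
sameAsPreviousROW(x[o])        # FALSE FALSE TRUE
```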

The second category contains methods that have default implementations provided for all Vector objects and their derivatives. These methods rely on the first category to provide sensible default behaviour without further work from the developer. However, it is often the case that greater efficiency can be achieved for a specific data structure by writing a subclass-specific version of these methods.

  • The pcompare method for Vector objects is implemented on top of order and sameAsPreviousROW, and so will work out-of-the-box on Vector derivatives for which order and sameAsPreviousROW work as expected.

  • The xtfrm method for Vector objects is also implemented on top of order and sameAsPreviousROW, and so will also work out-of-the-box on Vector derivatives for which order and sameAsPreviousROW work as expected.

  • selfmatch is itself implemented on top of xtfrm (indirectly, via grouping) so it will work out-of-the-box on Vector objects for which xtfrm works as expected.

  • The match method for Vector objects is implemented on top of selfmatch, so works out-of-the-box on Vector objects for which selfmatch works as expected.
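To make this layering concrete, here is a base R sketch, assuming atomic input without NAs, of how an xtfrm-like integer code and a pcompare-like result can be obtained from order() and a sameAsPreviousROW-like helper alone. The helpers same_as_previous(), xtfrm_sketch(), and pcompare_via_ids() are hypothetical and do not reproduce the actual S4Vectors code:

```r
## Hypothetical stand-in for sameAsPreviousROW() on atomic vectors.
same_as_previous <- function(x) {
    n <- length(x)
    if (n == 0L) return(logical(0))
    c(FALSE, x[-1L] == x[-n])
}

## xtfrm() via order() + sameAsPreviousROW(): sort, start a new group id
## whenever an element differs from its predecessor, then map the ids
## back to the original positions. Tied elements share an id.
xtfrm_sketch <- function(x) {
    o <- order(x)
    ids <- cumsum(!same_as_previous(x[o]))
    out <- integer(length(x))
    out[o] <- ids
    out
}

## pcompare() via the same ids, computed on x and y together so that
## the ids are comparable across the two inputs.
pcompare_via_ids <- function(x, y) {
    if (length(x) == 0L || length(y) == 0L) return(integer(0))
    n <- max(length(x), length(y))
    ids <- xtfrm_sketch(c(rep_len(x, n), rep_len(y, n)))
    ids[seq_len(n)] - ids[n + seq_len(n)]
}

xtfrm_sketch(c(30, 10, 20, 10))     # 3 1 2 1
pcompare_via_ids(c(1, 5, 3), 3)     # -1  1  0
```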

(A careful reader may notice that xtfrm and order could be swapped between categories to achieve the same effect. Similarly, sameAsPreviousROW and pcompare could also be swapped. The exact categorization of these methods is left to the discretion of the developer, though this is mostly academic if both choices are specialized.)

The third category also contains methods that have default implementations, but unlike the second category, these defaults are straightforward and generally do not require any specialization for efficiency purposes.

  • The 6 traditional binary comparison operators are: ==, !=, <=, >=, <, and >. The S4Vectors package provides the following methods for these operators:

    setMethod("==", c("Vector", "Vector"),
        function(e1, e2) { pcompare(e1, e2) == 0L }
    )
    setMethod("<=", c("Vector", "Vector"),
        function(e1, e2) { pcompare(e1, e2) <= 0L }
    )
    setMethod("!=", c("Vector", "Vector"),
        function(e1, e2) { !(e1 == e2) }
    )
    setMethod(">=", c("Vector", "Vector"),
        function(e1, e2) { e2 <= e1 }
    )
    setMethod("<", c("Vector", "Vector"),
        function(e1, e2) { !(e2 <= e1) }
    )
    setMethod(">", c("Vector", "Vector"),
        function(e1, e2) { !(e1 <= e2) }
    )

    With these definitions, the 6 binary operators work out-of-the-box on Vector objects for which pcompare works the expected way. If pcompare is not implemented, then it's enough to implement == and <= methods to have the 4 remaining operators (!=, >=, <, and >) work out-of-the-box.

  • The duplicated, unique, and %in% methods for Vector objects are implemented on top of selfmatch, duplicated, and match, respectively, so they work out-of-the-box on Vector objects for which selfmatch, duplicated, and match work the expected way.

  • Likewise, the default findMatches and countMatches methods are implemented on top of match and selfmatch, so they work out-of-the-box on Vector objects for which those methods work the expected way.

  • The sort method for Vector objects is implemented on top of order, so it works out-of-the-box on Vector objects for which order works the expected way.

  • The table method for Vector objects is implemented on top of selfmatch, order, and as.character, so it works out-of-the-box on a Vector object for which those methods work the expected way.
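The table method's dependence on selfmatch, order, and as.character can be sketched in base R as follows, using match(x, x) in place of selfmatch(x) and assuming no NAs. The helper table_sketch() is hypothetical and returns a named integer vector rather than a "table" object:

```r
## Hypothetical sketch of tabulating via selfmatch(), order(), and
## as.character(); match(x, x) stands in for selfmatch(x).
table_sketch <- function(x) {
    sm <- match(x, x)                          # index of first occurrence
    counts <- tabulate(sm, nbins = length(x))  # occurrences per first index
    keep <- which(counts > 0L)                 # one position per distinct value
    keep <- keep[order(x[keep])]               # sort the distinct values
    res <- counts[keep]
    names(res) <- as.character(x[keep])        # labels via as.character()
    res
}

y <- c(16L, -3L, -2L, 15L, 15L, 0L, 8L, 15L, -2L)
table_sketch(y)
```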

Author(s)

Hervé Pagès, with contributions from Aaron Lun

See Also

  • The Vector class.

  • Hits-comparison for comparing and ordering hits.

  • Vector-setops for set operations on vector-like objects.

  • Vector-merge for merging vector-like objects.

  • IntegerRanges-comparison in the IRanges package for comparing and ordering ranges.

  • == and %in% in the base package, and BiocGenerics::match, BiocGenerics::duplicated, BiocGenerics::unique, BiocGenerics::order, BiocGenerics::sort, BiocGenerics::rank in the BiocGenerics package for general information about the comparison/ordering operators and functions.

  • The Hits class.

  • BiocGenerics::table in the BiocGenerics package.

Examples

## ---------------------------------------------------------------------
## A. SIMPLE EXAMPLES
## ---------------------------------------------------------------------

y <- c(16L, -3L, -2L, 15L, 15L, 0L, 8L, 15L, -2L)
selfmatch(y)

x <- c(unique(y), 999L)
findMatches(x, y)
countMatches(x, y)

## See ?`IntegerRanges-comparison` for more examples (on IntegerRanges
## objects). You might need to load the IRanges package first.

## ---------------------------------------------------------------------
## B. FOR DEVELOPERS: HOW TO IMPLEMENT THE BINARY COMPARISON OPERATORS
##    FOR YOUR Vector SUBCLASS
## ---------------------------------------------------------------------

## The answer is: don't implement them. Just implement pcompare() and the
## binary comparison operators will work out-of-the-box. Here is an
## example:

## (1) Implement a simple Vector subclass.

setClass("Raw", contains="Vector", representation(data="raw"))

setMethod("length", "Raw", function(x) length(x@data))

setMethod("[", "Raw",
    function(x, i, j, ..., drop) { x@data <- x@data[i]; x }
)

x <- new("Raw", data=charToRaw("AB.x0a-BAA+C"))
stopifnot(identical(length(x), 12L))
stopifnot(identical(x[7:3], new("Raw", data=charToRaw("-a0x."))))

## (2) Implement a "pcompare" method for Raw objects.

setMethod("pcompare", c("Raw", "Raw"),
    function(x, y) {as.integer(x@data) - as.integer(y@data)}
)

stopifnot(identical(which(x == x[1]), c(1L, 9L, 10L)))
stopifnot(identical(x[x < x[5]], new("Raw", data=charToRaw(".-+"))))

Bioconductor/S4Vectors documentation built on Nov. 2, 2024, 4:34 p.m.