with new updates in S4Vectors, 1. define extractCOLS for SQLDataFrame, and apply to [ method. -- done! 2. add replaceCOLS -- done! 3. add mergeROWS -- added! - [<-,DataFrame => mergeROWS => .subassign_columns 4. replaceROWS? 3. replaceROWS (lower-level), mergeROWS are newly added and they call .subassign_columns (newly added). mergeROWS calls .make_rownames which was re-written by adding nsbs argument. .make_rownames <- function(x, i, nsbs, value). .make_rownames calls replaceROWS. See R/subsetting-utils.R for recycleSingleBracketReplacementValue() function. See R/DataFrame-class.R for normalizeSingleBracketReplacementValue() function. See R/Vector-class.R for implementation examples. 3. debug, update [<- method and fix test errors. setReplaceMethod("[", "DataFrame") calls normalizeSingleBracketReplacementValue & replaceCOLS, recycleSingleBracketReplacementValue() & .add_missing_columns & mergeROWS. 4. R/subsetting-utils.R: - setMethod("complement", "NSBS", function(x) is newly added generic and method for NSBS. - setGeneric("normalizeSingleBracketReplacementValue", signature="x", function(value, x) - recycleSingleBracketReplacementValue<-function(value, x, i) newly added. - 5 internal generics to ease implementation of [ and [<- subsetting for Vector and DataFrame subclasses: extractROWS(), replaceROWS(), mergeROWS(), extractCOLS(), replaceCOLS(). - mergeROWS() is a composition of replaceROWS() and bindROWS() to support appending in [<-(). Vector subclasses never need to implement mergeROWS(), but a custom method may be useful for e.g. optimization.

setGeneric("mergeROWS", signature=c("x", "i"),
    function(x, i, value) standardGeneric("mergeROWS")
)
setGeneric("extractCOLS", signature=c("x", "i"),
    function(x, i) standardGeneric("extractCOLS")
)

setGeneric("replaceCOLS", signature=c("x", "i"),
    function(x, i, value) standardGeneric("replaceCOLS")
)
default_replaceROWS <- function(x, i, value)
{
    mergeROWS(x, i, value)
}
default_mergeROWS <- function(x, i, value)
setMethod("replaceROWS", c("ANY", "ANY"), default_replaceROWS)
setMethod("mergeROWS", c("ANY", "ANY"), default_mergeROWS)

git log commits before the refactoring of [, [<-, ...

before submission (09/07/2018)

  1. do extra check:
setMethod("cbind", "DelayedDataFrame", function(..., deparse.level = 1)
 if (!any(isDDF))
     callNextMethod()
 if (!is(objects[[1]], "DelayedDataFrame"))
     stop(paste('there must be at least one "DelayedDataFrame"',
                'object to have "cbind,DelayedDataFrame" work.')

5/11/2018, review code with Herve & Daniel:

DelayedDataFrame code polish.

-- todo: debug "test_DelayedDataFrame". constructor. "concatenateObjects,DDF"... "cbind,DataFrame" calls "DataFrame()" constructor. maybe we should set "cbind,DDF" to call "DelayedDataFrame()" constructor, where if all inputs are DDF, they will concatenate the existing lazyIndex's and concatenate the listData. -- done! debug "test_DelayedDataFrame", "[[<-".

-- meeting with Martin/Herve for the "cbind" performance in "DDF". Now "rbind" realizes the lazyIndex. how about "cbind". should we call "DDF()" constructor, where check inputs if all DDF, then "concatenateObject,LazyInde"? or call "cbind,DF" and reconstruct a DDF?
for "c" calls "do.call(cbind, list(objects))", calls "DataFrame()" constructor. Am doing DelayedDataFrame() constructor calls cbind (if inputs are all DDF) / DataFrame() constructor for general inputs. cbind, DDF combines the lazyIndex and do not incur realization. So it would be "c,DDF">>"cbind,DDF"(no realization). Now cbind,DDF works as intended, need to make the DDF(DDF inputs) work. Environment difference: https://devteam-bioc.slack.com/archives/G7KH1498U/p1528903667000847 (need to import cbind generic from BiocGenerics!! ) - If "cbind,DelayedDataFrame" realize and construct a new DDF, there would be no problem with the test functions for "constructor" and "[[<-" - cbind(DDF, DF) returns what? cbind(DDF) calls DelayedDataFrame() constructor. rename the current cbind,DDF as something else? cbind,DDF calls DDF() and cbind,DF calls DF(), returns warning message:

In methods:::.selectDotsMethod(classes, .MTable, .AllMTable) :
  multiple direct matches: "DelayedDataFrame", "DataFrame"; using the first of these

summary of meeting:

  1. if rbind,DDF callNextMethod(), then it will incur realization automatically. Is it something we should expect DDF to do?
  2. cbind,DDF incur realization? use case: adding new columns (realize) / cbind(DDF,DDF).
  3. cbind(DDF,DF) returns what? need definition separately? because cbind functions are defined separately on DDF and DF.
  4. What methods to export? -- export all "setMethod()", "setReplaceMethod()". grep for "setAs()" to export. constructor to export. getters? NO. setters? NO. back-end function "lazyIndex<-" not to export to users. :::, str... explain the DDF working flow in manual, vignette, with examples.

as.list, names, extractROWS, coerce, [... OR as long as DDF is been depended by other package, there is no need to export methods? and they will be available automatically? What's the difference between export methods and export functions??

-- todo: 1. rename "c,LazyIndex" into "cbind,LazyIndex". "bindROWS,LazyIndex" -- done! 2. DataFrame: c -- cbind -- DataFrame() DDF: c -- cbind -- .cbind_DDF -- done! DDF() -- cbind -- done! 3. cbind, coercion all objects into DDF. Do not need to define coercion of DDF to DDF with no-op because as() does not do the correct thing. still need is(x,"DDF") to check if DDF. -- done! 4. Question: cbind(DDF, DF) would incur both "cbind,DDF" and "cbind,DF". How to distinguish? just to make sure the first argument to be of the class we desire the "cbind" method to work. call "DelayedDataFrame:::.cbind_DelayedDataFrame" inside "cbind,DataFrame" after checking if any argument is of DDF derivative. And "Depends/imports" DelayedDataFrame in DESCRIPTION.

  1. checkings?? cbind to make sure all DDF objects to have same nrows. otherwise, iterate atomic ...

S3 & S4: S3: function: class could be arbitrary; method could be defined directly without generic, even when class "x" does not exist. func.X <- function(x, ...); you can use S3 method on S4 class. S4: setClass(), setGeneric("method"), setMethod("method", signature="X")

think of SimpleList as rectangular!

bugs in pkg: methods

package attributes stripped with [[, and as.list()

further implementation in S4Vectors

acbind/arbind

future new features to implement

todo:

  1. understand the "bindROWS" functions in S4Vectors.
  2. implement "bindROWS" for "DelayedDataFrame".
  3. implement "bindROWS" for "LazyIndex".
  4. modify the old "concatenateObjects, LazyIndex" into "bindCOLS, LazyIndex".


Bioconductor/DelayedDataFrame documentation built on Nov. 2, 2024, 7:21 a.m.