R/Bimap.R

Defines functions createSimpleBimap makeFlatBimapUsingSelect .hasSingleProbes .hasMultiProbes .toggleFilter as.data.frame.Bimap .is.submap .Bimap.summary .key.summary mappedLkeysIsNotAvailable .GetLKeyLength .checkKeys .checkKeysAreWellFormed .GetUniqueLKeys .inslot.Rkeys .inslot.Lkeys .normalize.direction

Documented in as.data.frame.Bimap createSimpleBimap

### =========================================================================
### The bimap concept and the Bimap interface
### -----------------------------------------


### The bimap concept
### -----------------
###
### A bimap is made of:
###
###   - 2 sets of objects: the left objects and the right objects. All the
###     objects have a name and this name is unique in each set (i.e. in the
###     left set and in the right set). The names of the left (resp. right)
###     objects are called the left (resp. right) keys or the Lkeys (resp. the
###     Rkeys).
###     
###   - Any number of links (edges) between the left and right objects. Note
###     that the links can be tagged. In our model, for a given bimap, either
###     none or all the links are tagged.
###
### In other words, a bimap is a bipartite graph.
###
### Here are some examples:
###
###   1. bimap B1:
###
###      4 left objects (Lkeys): "a", "b", "c", "d"
###      3 objects on the right (Rkeys): "A", "B", "C"
###
###      Links (edges):
###        "a" <--> "A"
###        "a" <--> "B"
###        "b" <--> "A"
###        "d" <--> "C"
###
###      Note that:
###        - There can be any number of links starting from or ending at a
###          given object.
###        - The links in this example are untagged.
###
###   2. bimap B2:
###
###      4 left objects (Lkeys): "a", "b", "c", "d"
###      3 objects on the right (Rkeys): "A", "B", "C"
###
###      Tagged links (edges):
###        "a" <-"x"-> "A"
###        "a" <-"y"-> "B"
###        "b" <-"x"-> "A"
###        "d" <-"x"-> "C"
###        "d" <-"y"-> "C"
###
###      Note that there are 2 links between objects "d" and "C":
###      1 with tag "x" and 1 with tag "y".
###
###
### Flat representation of a bimap
### ------------------------------
###
### The flat representation of a bimap is a data frame. For example,
### for B1, it is:
###
###   left  right
###      a      A 
###      a      B
###      b      A
###      d      C
###
### If in addition the right objects have 1 multi-valued attribute, for
### example, a numeric vector:
###
###   A <-- c(1.2, 0.9)
###   B <-- character(0)
###   C <-- -1:1
###
### then the flat representation of B1 becomes:
###
###   left  right  Rattrib1
###      a      A       1.2
###      a      A       0.9
###      a      B        NA
###      b      A       1.2
###      b      A       0.9
###      d      C        -1
###      d      C         0
###      d      C         1
###
### Note that now the number of rows is greater than the number of links!
###
###
### The Bimap interface in AnnotationDbi
### ------------------------------------
###
### AnnDbBimap and FlatBimap objects:
###
###    A AnnDbBimap object is a bimap whose data are stored in a data base.
###    A FlatBimap object is a bimap whose data (left keys, right keys and
###    links) are stored in memory (in a data frame for the links).
###    Conceptually, AnnDbBimap and FlatBimap objects are the same (only
###    their internal representation differ) so it's natural to try to define
###    a set of methods that make sense for both (so they can be manipulated
###    in a similar way). This common interface is the Bimap interface.
###
### The "flatten" generic:
###
###      flatten(x) converts AnnDbBimap object x into FlatBimap object y
###      with no loss of information
###
###    Note that a FlatBimap object can't be converted into an AnnDbBimap
###    object (well, in theory maybe it could be, but for now the data bases
###    we use to store the data of the AnnDbBimap objects are treated as
###    read-only). This conversion from AnnDbBimap to FlatBimap is performed
###    by the "flatten" generic function (with methods for AnnDbBimap objects
###    only). 
###
### Property0:
###
###    The "flatten" generic plays a very useful role when we need to
###    understand or explain exactly what a given Bimap method f will do when
###    applied to an AnnDbBimap object. It's generally easier to explain what
###    it does on a FlatBimap object and then to just say "and it does the
###    same thing on a AnnDbBimap object". This is exactly what Property0
###    says:
###
###      for any AnnDbBimap object x, f(x) is expected to be indentical
###      to f(flatten(x))
###
###    Of course, this implies that the f method for AnnDbBimap objects
###    return the same type of object than the f method for FlatBimap objects.
###    In this sense, the "revmap" and "subset" Bimap methods are particular
###    because they are expected to return an object of the same class as
###    their argument x, so f(x) can't be identical to f(flatten(x)). For
###    these methods, Property0 says:
###
###      for any AnnDbBimap object x, flatten(f(x)) is expected to
###      be identical to f(flatten(x))
###
###    The checkProperty0() function (AnnDbPkg-checker.R file) checks that
###    Property0 is satisfied on all the AnnDbBimap objects defined in a given
###    package (FIXME: checkProperty0 is currently broken).
###
### Finally, note that both AnnDbBimap and FlatBimap objects have a read-only
### semantic: the user can subset them but cannot change their data.
###
### -------------------------------------------------------------------------


### KEEP THIS IN SYNC WITH THE STATE OF AFFAIRS! Only methods of the first and
### second group go here.
Bimap_methods <- c(
    ## GROUP 1: 15 methods that _must_ be defined for FlatBimap objects
    ## _and_ AnnDbBimap objects
    "colnames",
    "colmetanames",
    "Rattribnames<-",
    "direction",
    "direction<-",
    "Lkeys", "Rkeys",
    "Lkeys<-", "Rkeys<-",
    "subset",
    "mappedLkeys", "mappedRkeys",
    "nrow",
    "links",
    "toLList", "toRList",
    ## GROUP 2: Methods for which a default is provided (in this file) but
    ## some of them are redefined for AnnDbBimap objects to obtain better
    ## performance
    "Lkeyname", "Rkeyname", "tagname",
    "Rattribnames",
    "revmap",
    "Llength", "Rlength",
    "count.mappedLkeys", "count.mappedRkeys",
    "count.links",
    ## GROUP 3: Directed methods (i.e. what they return depends on the
    ## direction of the map). All what they do is to dispatch on the
    ## corresponding undirected method according to the value of direction(x)
    "keyname",
    "keys",
    "length",
    "mappedkeys",
    "count.mappedkeys",
    "toList",
    "isNA"
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "direction" and "direction<-" methods.
###

.DIRECTION_STR2INT <- c("L --> R"=1L, "L <-- R"=-1L, "undirected"=0L)

.normalize.direction <- function(direction)
{
    if ((!is.numeric(direction) && !is.character(direction))
     || length(direction) != 1 || is.na(direction))
        stop("'direction' must be a single (non-NA) integer or string")
    if (is.character(direction)) {
        direction <- match.arg(tolower(direction), names(.DIRECTION_STR2INT))
        return(do.call(switch, c(EXPR=direction, as.list(.DIRECTION_STR2INT))))
    }
    if (!(direction %in% .DIRECTION_STR2INT))
        stop("when a numeric value, 'direction' should be one of 1, -1 or 0")
    as.integer(direction)
}

setMethod("direction", "FlatBimap",
    function(x) x@direction)

setMethod("direction", "AnnDbBimap",
    function(x) x@direction)



setReplaceMethod("direction", "FlatBimap",
    function(x, value)
    {
        x@direction <- .normalize.direction(value)
        x
    }
)

setReplaceMethod("direction", "AnnDbBimap",
    function(x, value)
    {
        direction <- .normalize.direction(value)
        if (direction == 0)
            stop("undirected AnnDbBimap objects are not supported")
        if (direction != x@direction) {
            x@objName <- paste0("revmap(", x@objName, ")")
            x@direction <- direction
        }
        x
    }
)


setReplaceMethod("direction", "AnnDbMap",
    function(x, value)
    {
        stop("changing the direction of an \"", class(x), "\" object is not supported")
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "revmap" methods.
###
### Conceptual definition:
###     revmap(x) is the reverse of map x i.e. the map that provides the
###     reverse lookup (or mapping)
###
### I would have liked to use "reverse" instead of "revmap" but "reverse" is
### already defined as a generic in Biostrings and it seems that the second
### of the 2 packages to be loaded breaks the generic and attached methods
### defined in the first. Don't know how to deal with this situation :-/
###
### The "rev" generic defined in package:base can't be used neither because
### it doesn't allow passing additional arguments to or from methods (i.e. it
### has no '...' arg) and we want to be able to pass the 'objName' arg.
### Other generics defined in package:base where having a '...' arg could be
### useful: "unlist", "t" and "scale" (just in case someone feels brave enough
### to request this on R-devel).
###
### Note that "revmap" for "AnnDbBimap" objects does _not_ query the database!
###

setMethod("revmap", "Bimap",
    function(x) { direction(x) <- - direction(x); x }
)

setMethod("revmap", "AnnDbBimap",
    function(x, objName=NULL)
    {
        x <- callNextMethod(x) # calls "revmap" method for "Bimap" objects
        if (!is.null(objName))
            x@objName <- toString(objName)
        x
    }
)

setMethod("revmap", "environment",
    function(x) list2env(reverseSplit(as.list(x)))
)

setMethod("revmap", "list",
    function(x) reverseSplit(x)
)




### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "colnames" and "colmetanames" methods.
###

setMethod("colnames", "FlatBimap",
    function(x, do.NULL=TRUE, prefix="col")
        colnames(x@data)
)

setMethod("colnames", "AnnDbBimap",
    function(x, do.NULL=TRUE, prefix="col")
        L2Rchain.colnames(x@L2Rchain)
)

setMethod("colmetanames", "FlatBimap",
    function(x)
        x@colmetanames
)

setMethod("colmetanames", "AnnDbBimap",
    function(x)
        L2Rchain.colmetanames(x@L2Rchain)
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "Lkeyname", "Rkeyname" and "keyname" methods.
###

setMethod("Lkeyname", "Bimap",
    function(x)
    {
        colnames <- colnames(x)
        names(colnames) <- colmetanames(x)
        colnames["Lkeyname"]
    }
)

setMethod("Lkeyname", "AnnDbBimap",
    function(x) L2Rchain.Lkeyname(x@L2Rchain))

setMethod("Rkeyname", "Bimap",
    function(x)
    {
        colnames <- colnames(x)
        names(colnames) <- colmetanames(x)
        colnames["Rkeyname"]
    }
)

setMethod("Rkeyname", "AnnDbBimap",
    function(x) L2Rchain.Rkeyname(x@L2Rchain))

setMethod("keyname", "Bimap",
    function(x)
        switch(as.character(direction(x)),
                "1"=Lkeyname(x),
               "-1"=Rkeyname(x),
                    stop("keyname() is undefined for an undirected bimap"))
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "tagname" methods.
###

setMethod("tagname", "Bimap",
    function(x)
    {
        colnames <- colnames(x)
        names(colnames) <- colmetanames(x)
        colnames["tagname"]
    }
)

setMethod("tagname", "AnnDbBimap",
    function(x) L2Rchain.tagname(x@L2Rchain))


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "Rattribnames" and "Rattribnames<-" methods.
###

setMethod("Rattribnames", "Bimap",
    function(x)
    {
        colnames(x)[-seq_along(colmetanames(x))]
    }
)

setMethod("Rattribnames", "AnnDbBimap",
    function(x) L2Rchain.Rattribnames(x@L2Rchain))

setReplaceMethod("Rattribnames", "FlatBimap",
    function(x, value)
    {
        colnames0 <- colnames(x@data)
        if (!is.null(value) && !is.character(value))
            stop("Rattrib names must be a character vector or NULL")
        if (!all(value %in% Rattribnames(x)))
            stop("invalid Rattrib names")
        if (any(duplicated(value)))
            stop("can't assign duplicated Rattrib names")
        ii <- c(seq_along(colmetanames(x)), match(value, colnames0))
        x@data <- x@data[ii]
        if (length(ii) < length(colnames0))
            x@data <- unique(x@data)
        ## Needed because subsetting a data frame can change the names
        ## of its cols (for the duplicated names)
        colnames(x@data) <- colnames0[ii]
        x
    }
)

setReplaceMethod("Rattribnames", "AnnDbBimap",
    function(x, value)
    {
        Rattribnames0 <- Rattribnames(x)
        L2Rchain.Rattribnames(x@L2Rchain) <- value
        if (length(Rattribnames(x)) < length(Rattribnames0))
            x <- as(x, Class="AnnDbBimap", strict=TRUE)
        x
    }
)

setReplaceMethod("Rattribnames", "Go3AnnDbBimap",
    function(x, value)
    {
        stop("can't modify the Rattrib names of a ", class(x), " object")
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "Lkeys", "Rkeys" and "keys" methods.
###

setMethod("Lkeys", "FlatBimap",
    function(x)
    {
        if (length(x@Lkeys) == 1 && is.na(x@Lkeys))
            return(mappedLkeys(x))
        x@Lkeys
    }
)

.inslot.Lkeys <- function(x)
{
    length(x@Lkeys) != 1 || !is.na(x@Lkeys)
}

setMethod("Lkeys", "AnnDbBimap",
    function(x)
    {
        if (.inslot.Lkeys(x))
            return(x@Lkeys)
        dbUniqueVals(dbconn(x), Ltablename(x), Lkeyname(x),
                                Lfilter(x), x@datacache)
    }
)

setMethod("Rkeys", "FlatBimap",
    function(x)
    {
        if (length(x@Rkeys) == 1 && is.na(x@Rkeys))
            return(mappedRkeys(x))
        x@Rkeys
    }
)

.inslot.Rkeys <- function(x)
{
    length(x@Rkeys) != 1 || !is.na(x@Rkeys)
}

setMethod("Rkeys", "AnnDbBimap",
    function(x)
    {
        if (.inslot.Rkeys(x))
            return(x@Rkeys)
        dbUniqueVals(dbconn(x), Rtablename(x), Rkeyname(x),
                                Rfilter(x), x@datacache)
    }
)

setMethod("Rkeys", "Go3AnnDbBimap",
    function(x)
    {
        if (.inslot.Rkeys(x))
            return(x@Rkeys)
        getNames <- function(ontology)
        {
            tablename <- Rtablename(x)[ontology]
            dbUniqueVals(dbconn(x), tablename, "go_id",
                                    Rfilter(x), x@datacache)
        }
        ## Because a given go_id can only belong to 1 of the 3 ontologies...
        ## (if not, then apply unique to this result)
        c(getNames("BP"), getNames("CC"), getNames("MF"))
    }
)

setMethod("Rkeys", "AnnDbMap",
    function(x)
    {
        stop("Rkeys() is not supported for an \"", class(x), "\" object")
    }
)

setMethod("keys", "Bimap",
    function(x)
        switch(as.character(direction(x)),
                "1"=Lkeys(x),
               "-1"=Rkeys(x),
                    stop("keys() is undefined for an undirected bimap"))
)



## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
## For the Probe mappings, I need to be able to
## just gets the Lkeys() with NO FILTERING.  

.GetUniqueLKeys = function(x){
        if (.inslot.Lkeys(x))
            return(x@Lkeys)
        dbUniqueVals(dbconn(x), Ltablename(x), Lkeyname(x),
                                "1", x@datacache)
}

setMethod("Lkeys", "ProbeAnnDbBimap",function(x){.GetUniqueLKeys(x)})
setMethod("Lkeys", "ProbeAnnDbMap",function(x){.GetUniqueLKeys(x)})
setMethod("Lkeys", "ProbeIpiAnnDbMap",function(x){.GetUniqueLKeys(x)})
setMethod("Lkeys", "ProbeGo3AnnDbBimap",function(x){.GetUniqueLKeys(x)})




### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "Lkeys<-", "Rkeys<-" and "keys<-" replacement methods.
###

.checkKeysAreWellFormed <- function(keys)
    if (!is.null(keys) && (!is.character(keys) || any(is.na(keys))))
        stop("keys must be supplied in a character vector with no NAs")

.checkKeys <- function(keys, valid.keys, ifnotfound)
{
    .checkKeysAreWellFormed(keys)
    if (length(ifnotfound) == 0) {
        not_found <- which(!(keys %in% valid.keys))
        if (length(not_found) != 0)
            stop("value for \"", keys[not_found[1]], "\" not found")
    }
}

setReplaceMethod("Lkeys", "FlatBimap",
    function(x, value)
    {
        if (!is.null(value)) {
            .checkKeys(value, Lkeys(x), x@ifnotfound)
            if (!is.null(names(value)))
                names(value) <- NULL
            x@Lkeys <- value
            ii <- which(x@data[[1]] %in% value)
            cn <- colnames(x@data)
            x@data <- x@data[ii, ]
            colnames(x@data) <- cn
        }
        x
    }
)

setReplaceMethod("Lkeys", "AnnDbBimap",
    function(x, value)
    {
        if (!is.null(value)) {
            .checkKeys(value, Lkeys(x), x@ifnotfound)
            if (!is.null(names(value)))
                names(value) <- NULL
            x@Lkeys <- value
        }
        x
    }
)

setReplaceMethod("Rkeys", "FlatBimap",
    function(x, value)
    {
        if (!is.null(value)) {
            .checkKeys(value, Rkeys(x), x@ifnotfound)
            if (!is.null(names(value)))
                names(value) <- NULL
            x@Rkeys <- value
            ii <- which(x@data[[2]] %in% value)
            cn <- colnames(x@data)
            x@data <- x@data[ii, ]
            colnames(x@data) <- cn
        }
        x
    }
)

setReplaceMethod("Rkeys", "AnnDbBimap",
    function(x, value)
    {
        if (!is.null(value)) {
            .checkKeys(value, Rkeys(x), x@ifnotfound)
            if (!is.null(names(value)))
                names(value) <- NULL
            x@Rkeys <- value
        }
        x
    }
)

setReplaceMethod("keys", "Bimap",
    function(x, value)
    {
        switch(as.character(direction(x)),
                "1"=`Lkeys<-`(x, value),
               "-1"=`Rkeys<-`(x, value),
                    stop("keys<- is undefined for an undirected bimap"))
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "subset" methods.
###

setMethod("subset", "Bimap",
    function(x, Lkeys=NULL, Rkeys=NULL, drop.invalid.keys=FALSE)
    {
        if (drop.invalid.keys) {
            .checkKeysAreWellFormed(Lkeys)
            .checkKeysAreWellFormed(Rkeys)
            Lkeys <- Lkeys[Lkeys %in% Lkeys(x)]
            Rkeys <- Rkeys[Rkeys %in% Rkeys(x)]
        }
        Lkeys(x) <- Lkeys
        Rkeys(x) <- Rkeys
        x
    }
)

setMethod("subset", "AnnDbBimap",
    function(x, Lkeys=NULL, Rkeys=NULL, drop.invalid.keys=FALSE, objName=NULL)
    {
        ## Call "subset" method for "Bimap" objects
        x <- callNextMethod(x, Lkeys=Lkeys, Rkeys=Rkeys, drop.invalid.keys=drop.invalid.keys)
        if (!is.null(objName))
            x@objName <- toString(objName)
        x
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "[" methods.
###

### Supported 'i' types: character vector, numeric vector, logical vector,
### NULL and missing.
setMethod("[", "Bimap",
    function(x, i, j, ..., drop)
    {
        if (!missing(j) || length(list(...)) > 0)
            stop("invalid subsetting")
        if (missing(i))
            return(x)
        keys <- keys(x)
        if (is.character(i)) {
            keys(x) <- i
        } else {
            keys(x) <- keys[i]
        }
        x
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "Llength", "Rlength" and "length" methods.
###

setMethod("Llength", "Bimap",
    function(x) length(Lkeys(x)))

setMethod("Llength", "AnnDbBimap",
    function(x)
    {
        if (.inslot.Lkeys(x))
            return(length(x@Lkeys))
        dbCountUniqueVals(dbconn(x), Ltablename(x), Lkeyname(x),
                                     Lfilter(x), x@datacache)
    }
)

setMethod("Rlength", "Bimap",
    function(x) length(Rkeys(x)))

setMethod("Rlength", "AnnDbBimap",
    function(x)
    {
        if (.inslot.Rkeys(x))
            return(length(x@Rkeys))
        dbCountUniqueVals(dbconn(x), Rtablename(x), Rkeyname(x),
                                     Rfilter(x), x@datacache)
    }
)

setMethod("Rlength", "Go3AnnDbBimap",
    function(x)
    {
        if (.inslot.Rkeys(x))
            return(length(x@Rkeys))
        countNames <- function(ontology)
        {
            tablename <- Rtablename(x)[ontology]
            dbCountUniqueVals(dbconn(x), tablename, "go_id", Rfilter(x), x@datacache)
        }
        ## Because a given go_id can only belong to 1 of the 3 ontologies...
        countNames("BP") + countNames("CC") + countNames("MF")
    }
)

setMethod("Rlength", "AnnDbMap",
    function(x)
    {
        stop("Rlength() is not supported for an \"", class(x), "\" object")
    }
)

setMethod("length", "Bimap",
    function(x)
        switch(as.character(direction(x)),
                "1"=Llength(x),
               "-1"=Rlength(x),
                    stop("length() is undefined for an undirected bimap"))
)




## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
## For the Probe mappings, I need to be get length
## of the Lkeys() with NO FILTERING.  

.GetLKeyLength = function(x){
        if (.inslot.Lkeys(x))
            return(length(x@Lkeys))
        dbCountUniqueVals(dbconn(x), Ltablename(x), Lkeyname(x),
                                     "1", x@datacache)
}

setMethod("Llength", "ProbeAnnDbBimap",function(x){.GetLKeyLength(x)})
setMethod("Llength", "ProbeAnnDbMap",function(x){.GetLKeyLength(x)})
setMethod("Llength", "ProbeIpiAnnDbMap",function(x){.GetLKeyLength(x)})
setMethod("Llength", "ProbeGo3AnnDbBimap",function(x){.GetLKeyLength(x)})



##library(ALL);library(hgu95av2.db);library(genefilter);data(ALL);nsFilter(ALL)



### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "isNA" methods.
###

### Like "is.na", "isNA" returns a named logical vector that associates each
### key in the map with TRUE except for those keys that are actually mapped
### to something (other than an NA).
setMethod("isNA", "Bimap",
    function(x)
    {
        mapped_keys <- mappedkeys(x)
        keys <- keys(x)
        ans <- !(keys %in% mapped_keys)
        names(ans) <- keys
        ans
    }
)

### "is.na" on environments has a silly semantic but since it is sealed then
### it can't be redefined.
setMethod("isNA", "environment",
    function(x) is.na(as.list(x, all.names=TRUE))
)

### And for ANY other vector-like object for which an "is.na"
### method is defined (e.g. an environment or a list)
setMethod("isNA", "ANY", function(x) is.na(x))


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "mappedLkeys", "mappedRkeys" and "mappedkeys" methods.
###
### Note that for the maps in DB schemas like HUMANCHIP_DB, all the "right
### keys" are expected to be mapped to at least one "left key" hence
### mappedRkeys(x) should be the same as Rkeys(x) for those maps (maybe
### something worth checking in a test unit).
###

setMethod("mappedLkeys", "FlatBimap",
    function(x)
        unique(x@data[[match("Lkeyname", x@colmetanames)]])
)

setMethod("mappedLkeys", "AnnDbBimap",
    function(x)
    {
        dbUniqueMappedKeys(dbconn(x), x@L2Rchain, x@Lkeys, x@Rkeys,
                                      1, x@datacache)
    }
)

setMethod("mappedLkeys", "Go3AnnDbBimap",
    function(x)
    {
        getMappedKeys <- function(ontology)
        {
            tablename <- Rtablename(x)[ontology]
            L2Rchain <- makeGo3L2Rchain(x@L2Rchain, tablename, ontology)
            dbUniqueMappedKeys(dbconn(x), L2Rchain, x@Lkeys, x@Rkeys,
                                          1, x@datacache)
        }
        keys1 <- getMappedKeys("BP")
        keys2 <- getMappedKeys("CC")
        keys3 <- getMappedKeys("MF")
        unique(c(keys1, keys2, keys3))
    }
)

### For an AgiAnnDbMap object (like silly maps ACCNUM and MULTIHIT in
### ARABIDOPSISCHIP_DB), the "mappedLkeys" method for AnnDbBimap objects
### would ignore the x@replace.single and x@replace.multiple slots leading
### to a wrong result when one of those slots is NA.
### But who cares, those maps are silly anyway...
mappedLkeysIsNotAvailable <- function(x)
{
    (length(x@replace.single) == 1 && is.na(x@replace.single)) ||
      (length(x@replace.multiple) == 1 && is.na(x@replace.multiple))
}

setMethod("mappedLkeys", "AgiAnnDbMap",
    function(x)
    {
        if (mappedLkeysIsNotAvailable(x))
            stop("mappedLkeys() is not available for map ", x@objName)
        callNextMethod(x)
    }
)

setMethod("mappedRkeys", "FlatBimap",
    function(x)
        unique(x@data[[match("Rkeyname", x@colmetanames)]])
)

setMethod("mappedRkeys", "AnnDbBimap",
    function(x)
    {
        dbUniqueMappedKeys(dbconn(x), x@L2Rchain, x@Lkeys, x@Rkeys,
                                      -1, x@datacache)
    }
)

setMethod("mappedRkeys", "Go3AnnDbBimap",
    function(x)
    {
        getMappedKeys <- function(ontology)
        {
            tablename <- Rtablename(x)[ontology]
            L2Rchain <- x@L2Rchain
            L2Rchain[[length(L2Rchain)]]@tablename <- tablename
            dbUniqueMappedKeys(dbconn(x), L2Rchain, x@Lkeys, x@Rkeys,
                                          -1, x@datacache)
        }
        keys1 <- getMappedKeys("BP")
        keys2 <- getMappedKeys("CC")
        keys3 <- getMappedKeys("MF")
        ## Because a given go_id can only belong to 1 of the 3 ontologies...
        ## (if not, then apply unique to this result)
        c(keys1, keys2, keys3)
    }
)

setMethod("mappedRkeys", "AnnDbMap",
    function(x)
    {
        stop("mappedRkeys() is not supported for an \"", class(x), "\" object")
    }
)

setMethod("mappedkeys", "Bimap",
    function(x)
        switch(as.character(direction(x)),
                "1"=mappedLkeys(x),
               "-1"=mappedRkeys(x),
                    stop("mappedkeys() is undefined for an undirected bimap"))
)

setMethod("mappedkeys", "environment",
    function(x)
    {
        ## This is needed because the ! operator loses the "names" attributes
        ## when applied on a named vector of length 0
        if (length(x) == 0)
            return(character(0))
        notNA <- !isNA(x)
        names(notNA)[notNA]
    }
)

setMethod("mappedkeys", "vector",
    function(x)
    {
        if (is.null(names(x)))
            stop("mappedkeys() is not defined on an unnamed vector")
        ## This is needed because the ! operator loses the "names" attributes
        ## when applied on a named vector of length 0
        if (length(x) == 0)
            return(character(0))
        notNA <- !isNA(x)
        names(notNA)[notNA]
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "count.mappedLkeys", "count.mappedRkeys" and "count.mappedkeys" methods.
###

setMethod("count.mappedLkeys", "Bimap",
    function(x) length(mappedLkeys(x)))

setMethod("count.mappedLkeys", "AnnDbBimap",
    function(x)
    {
        dbCountUniqueMappedKeys(dbconn(x), x@L2Rchain, x@Lkeys, x@Rkeys,
                                           1, x@datacache)
    }
)

setMethod("count.mappedLkeys", "Go3AnnDbBimap",
    function(x) length(mappedLkeys(x))
)

setMethod("count.mappedLkeys", "AgiAnnDbMap",
    function(x)
    {
        if (mappedLkeysIsNotAvailable(x))
            stop("count.mappedLkeys() is not available for map ", x@objName)
        callNextMethod(x)
    }
)

setMethod("count.mappedRkeys", "Bimap",
    function(x) length(mappedRkeys(x)))

setMethod("count.mappedRkeys", "AnnDbBimap",
    function(x)
    {
        dbCountUniqueMappedKeys(dbconn(x), x@L2Rchain, x@Lkeys, x@Rkeys,
                                           -1, x@datacache)
    }
)

setMethod("count.mappedRkeys", "Go3AnnDbBimap",
    function(x)
    {
        countMappedNames <- function(ontology)
        {
            tablename <- Rtablename(x)[ontology]
            L2Rchain <- makeGo3L2Rchain(x@L2Rchain, tablename, ontology)
            dbCountUniqueMappedKeys(dbconn(x), L2Rchain, x@Lkeys, x@Rkeys,
                                               -1, x@datacache)
        }
        ## Because a given go_id can only belong to 1 of the 3 ontologies...
        countMappedNames("BP") + countMappedNames("CC") + countMappedNames("MF")
    }
)

setMethod("count.mappedRkeys", "AnnDbMap",
    function(x)
    {
        stop("count.mappedRkeys() is not supported for an \"", class(x), "\" object")
    }
)

setMethod("count.mappedkeys", "Bimap",
    function(x)
        switch(as.character(direction(x)),
                "1"=count.mappedLkeys(x),
               "-1"=count.mappedRkeys(x),
                    stop("count.mappedkeys() is undefined for an undirected bimap"))
)

setMethod("count.mappedkeys", "ANY", function(x) length(mappedkeys(x)))


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "show" and "summary" methods.
###

.key.summary <- function(keys, nmapped)
{
    len0 <- length(keys)
    if (len0 == 0)
        return("")
    if (len0 > 2)
        keys <- keys[1:2]
    string <- paste0("\"", keys, "\"", collapse=", ")
    if (len0 > 2)
        string <- paste0(string, ", ...")
    paste0(string, " (total=", len0, "/mapped=", nmapped, ")")
}

.Bimap.summary <- function(x)
{
    ## Left keys
    cat("| Lkeyname: ", Lkeyname(x), sep="")
    if (is(x, "AnnDbBimap"))
        cat(" (Ltablename: ", Ltablename(x), ")", sep="")
    cat("\n")
    tmp <- mappedLkeys(x) # just to put them in the cache
    cat("|    Lkeys: ", .key.summary(Lkeys(x), count.mappedLkeys(x)), "\n", sep="")
    cat("|\n")

    if (!is(x, "AnnDbMap")) {
        ## Right keys
        cat("| Rkeyname: ", Rkeyname(x), sep="")
        if (is(x, "AnnDbBimap"))
            cat(" (Rtablename: ", Rtablename(x), ")", sep="")
        cat("\n")
        tmp <- mappedRkeys(x) # just to put them in the cache
        cat("|    Rkeys: ", .key.summary(Rkeys(x), count.mappedRkeys(x)), "\n", sep="")
        cat("|\n")
    }

    ## Tag
    if (!is.na(tagname(x)))
        cat("| tagname: ", tagname(x), "\n|\n", sep="")

    ## direction
    direction <- names(.DIRECTION_STR2INT)[.DIRECTION_STR2INT == direction(x)]
    cat("| direction: ", direction, "\n", sep="")
}

.is.submap <- function(x)
{
    .inslot.Lkeys(x) || .inslot.Rkeys(x)
}

setMethod("show", "FlatBimap",
    function(object)
    {
        cat("\"", class(object), "\" object:\n", sep="")
        cat("|\n")
        .Bimap.summary(object)
        cat("\ndata:\n")
        if (nrow(object) <= 20) {
            show(object@data)
        } else {
            show(head(object, n=10))
            cat("...\n")
            cat("(", nrow(object), " rows)\n", sep="")
        }
    }
)

setMethod("show", "AnnDbBimap",
    function(object)
    {
        map <- "map"
        if (.is.submap(object))
            map <- "submap"
        cat(object@objName, " ", map, " for ", object@objTarget,
            " (object of class \"", class(object), "\")\n", sep="")
    }
)

setMethod("summary", "Bimap", function(object) show(object))

setMethod("summary", "AnnDbBimap",
    function(object)
    {
        show(object)
        cat("|\n")
        .Bimap.summary(object)
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "toTable" methods.
###

setMethod("toTable", "FlatBimap",
    function(x)
    {
        x@data
    }
)

setMethod("toTable", "Bimap",
    function(x)
    {
        toTable(flatten(x, fromKeys.only=TRUE))
    }
)

as.data.frame.Bimap <- function(x, row.names=NULL, optional=FALSE, ...)
{
    if (!is.null(row.names))
        warning(wmsg("the \"as.data.frame\" method for Bimap objects ",
                     "ignores the 'row.names' argument"))
    if (!identical(optional, FALSE))
        warning(wmsg("the \"as.data.frame\" method for Bimap objects ",
                     "ignores the 'optional' argument"))
    if (length(list(...)) != 0L)
        stop(wmsg("the \"as.data.frame\" method for Bimap objects ",
                  "doesn't take any extra argument"))
    toTable(x)
}

setMethod("as.data.frame", "Bimap", as.data.frame.Bimap)



### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "head" and "tail" methods.
###
### TODO: Define these methods to AnnDbBimap objects.
###

setMethod("head", "FlatBimap",
    function(x, ...)
    {
        c <- colnames(x)
        y <- head(x@data, ...)
        if (!identical(colnames(y), c))
            colnames(y) <- c
        y
    }
)

setMethod("tail", "FlatBimap",
    function(x, ...)
    {
        c <- colnames(x)
        y <- tail(x@data, ...)
        if (!identical(colnames(y), c))
            colnames(y) <- c
        y
    }
)





### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "nrow" methods.
###

setMethod("nrow", "Bimap", function(x) nrow(toTable(x)))

setMethod("nrow", "FlatBimap",
    function(x)
        nrow(x@data)
)

### CURRENTLY BROKEN!
setMethod("nrow", "AnnDbTable",
    function(x)
    {
        dbCountRawAnnDbMapRows(dbconn(x), Ltablename(x), Lkeyname(x), NULL, NULL, x@from)
    }
)

setMethod("nrow", "AnnDbBimap",
    function(x)
    {
        dbCountRowsFromL2Rchain(dbconn(x), x@L2Rchain, x@Lkeys, x@Rkeys)
    }
)

setMethod("nrow", "Go3AnnDbBimap",
    function(x)
    {
        countRows <- function(ontology)
        {
            tablename <- Rtablename(x)[ontology]
            L2Rchain <- makeGo3L2Rchain(x@L2Rchain, tablename, ontology)
            dbCountRowsFromL2Rchain(dbconn(x), L2Rchain, x@Lkeys, x@Rkeys)
        }
        countRows("BP") + countRows("CC") + countRows("MF")
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "links" methods.
###

setMethod("links", "FlatBimap",
    function(x)
    {
        Rattribnames(x) <- NULL
        x@data
    }
)

setMethod("links", "Bimap",
    function(x)
    {
        Rattribnames(x) <- NULL
        links(flatten(x, fromKeys.only=TRUE))
    }
)

setMethod("links", "Go3AnnDbBimap",
    function(x) links(flatten(x, fromKeys.only=TRUE)))


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "count.links" methods.
###

setMethod("count.links", "Bimap",
    function(x)
    {
        Rattribnames(x) <- NULL
        nrow(x)
    }
)

setMethod("count.links", "Go3AnnDbBimap",
    function(x) nrow(x))


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "nhit" methods.
###
### TODO: maybe it could be optimized for AnnDbBimap objects with a fast (and
### smart) SQL query (involving COUNT, HAVING or/and GROUP?) that retrieves
### only the strictly necessary stuff.
### Something that would be worth testing (e.g. with unit testing):
###    y <- nhit(x) # y is a named integer vector
###    names(y) should be identical to names(x)
###    sum(y) should be equal to count.links(x)
###

setMethod("nhit", "list",
    function(x)
    {
        sapply(x,
            function(xx)
                if (length(xx) == 1L && is.na(xx)) 0L else length(xx)
        )
    }
)

setMethod("nhit", "Bimap", function(x) nhit(as.list(x)))

setMethod("nhit", "environment",
    function(x)
    {
        #nhit(as.list(x, all.names=TRUE))
        ## Using eapply should be faster than the above (not tested, I'm just
        ## assuming).
        unlist(
            eapply(x,
                function(xx)
                    if (length(xx) == 1 && is.na(xx)) 0L else length(xx),
                all.names=TRUE
            ),
            recursive=FALSE
        )
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "ncol" and "dim" methods.
###

setMethod("ncol", "Bimap",
    function(x) length(colnames(x)))

setMethod("dim", "Bimap",
    function(x) c(nrow(x), ncol(x)))



### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### Method to remove the multiple mapping filter from AnnDbBimaps that have a probe mapping in them.
###

.toggleFilter = function(x, value = c("all","single","multiple")){
  type <- match.arg(value)
  ans <- x
  if(ans@L2Rchain[[1]]@tablename!="probes"){
    stop("This method can only be used on mappings where the Lkeys are probes.")
  }
  switch(type,
         "all" = ans@L2Rchain[[1]]@filter <- "{is_multiple} IN ('1','0')",
         "single"  = ans@L2Rchain[[1]]@filter <- "{is_multiple}='0'",
         "multiple" = ans@L2Rchain[[1]]@filter <- "{is_multiple}='1'"
         )
  return(ans)
}

setMethod("toggleProbes", "ProbeAnnDbBimap", function(x, value){.toggleFilter(x, value)})
setMethod("toggleProbes", "ProbeAnnDbMap", function(x, value){.toggleFilter(x, value)})
setMethod("toggleProbes", "ProbeIpiAnnDbMap", function(x, value){.toggleFilter(x, value)})
setMethod("toggleProbes", "ProbeGo3AnnDbBimap", function(x, value){.toggleFilter(x, value)})



### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### Methods to indicate the mask setting of the probes
###

.hasMultiProbes = function(x){
  f <- x@L2Rchain[[1]]@filter
  if((f=="{is_multiple}='1'" || f=="{is_multiple} IN ('1','0')") && x@L2Rchain[[1]]@tablename=="probes"){
    ans <- TRUE
  }else{
    ans <- FALSE
  }
  return(ans)
}

setMethod("hasMultiProbes", "ProbeAnnDbBimap",function(x){.hasMultiProbes(x)})
setMethod("hasMultiProbes", "ProbeAnnDbMap",function(x){.hasMultiProbes(x)})
setMethod("hasMultiProbes", "ProbeIpiAnnDbMap",function(x){.hasMultiProbes(x)})
setMethod("hasMultiProbes", "ProbeGo3AnnDbBimap",function(x){.hasMultiProbes(x)})


.hasSingleProbes = function(x){
  f <- x@L2Rchain[[1]]@filter
  if((f=="{is_multiple}='0'" || f=="{is_multiple} IN ('1','0')") && x@L2Rchain[[1]]@tablename=="probes"){
    ans <- TRUE
  }else{
    ans <- FALSE
  }
  ans
}

setMethod("hasSingleProbes", "ProbeAnnDbBimap",function(x){.hasSingleProbes(x)})
setMethod("hasSingleProbes", "ProbeAnnDbMap",function(x){.hasSingleProbes(x)})
setMethod("hasSingleProbes", "ProbeIpiAnnDbMap",function(x){.hasSingleProbes(x)})
setMethod("hasSingleProbes", "ProbeGo3AnnDbBimap",function(x){.hasSingleProbes(x)})


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The BimapFilter methods.
###

##usage: getBimapFilter(myBiMap)
##returns the filter that is on there for any Bimap

setMethod("getBimapFilters", "AnnDbBimap",
##      function(x){L2Rchain.bimapFilter(x@L2Rchain)})
    function(x) {
      L2Rchain <- x@L2Rchain
      filter <- sapply(L2Rchain, function(x){x@filter})
      filter <- filter[!is.na(filter)]
      return(filter)
    })






### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### Method for making a FlatBimap by using the select method.
###

## This method takes x, which is an AnnotationDb object, and also a SINGLE
## cols value.  It is not exported, and is used only for testing the backwards
## compatibility provided by getAnnMap() in the annotate package.


## I need to check with Hervé if my assumptions here are correct that I
## Don't want any NAs in @data OR in @Rkeys and @Lkeys???
## Maybe I don't want NAs in only one or the other?
## And maybe the answer is in how mappedLkeys and mappedRkeys work?


makeFlatBimapUsingSelect <- function(x, col){
  suppressWarnings(tab <- select(x, keys=keys(x),
                                 columns=col))
  ## tab cannot contain ANY NA values??? - seems like a dicey idea...
  ## what about keys that are unmapped but still valid?
  idx = apply(tab, MARGIN=1, function(x){!any(is.na(x))})
  tab <- tab[idx,]
  ## keys have to be unique as well
  lkys <- unique(tab[,1])
  rkys <- unique(tab[,2])
  new("FlatBimap", colmetanames=c("Lkeyname", "Rkeyname"),
      direction=1, data=tab, Lkeys=lkys, Rkeys=rkys)

}

## Hervé says that I in fact want to have no NAs (and be unique) in the Rkey and Lkeys, and that I ALSO do not want ANY NAs in either column of @data.  So I have modified this function accordingly








## OLD method for making bimaps (probably should deprecate this after the release)
##Makes a simpleBimap for tables that are added outside of standard AnnotationForge Schemas.
##This function requires that the bimap map from a single table in the DB.
createSimpleBimap <- function(tablename, Lcolname, Rcolname,
                              datacache,
                              objName=as.character(NA),
                              objTarget=as.character(NA))
{
    seed <- list(
                 objName=objName,
                 objTarget=objTarget,
                 Class="AnnDbBimap",
                 L2Rchain=list(
                   list(
                        tablename=tablename,
                        Lcolname=Lcolname,
                        Rcolname=Rcolname
                        )
                   ),
                 datacache=datacache
                 )
    createAnnDbBimap(seed, list())
}
jmacdon/AnnotationDbi documentation built on Oct. 22, 2021, 9:14 p.m.