knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library("AltWrapper")
There are numerous data sources in R. When data is not in a standard R format, R provide a variaty of methods to read it. One can write a function to convert the data into an R data structure, or implement an S3/S4 class and keep the data in its original format. However, this flexibility introduces a lot of troubles when the data is interacting with the other R packages. While one package may assume the data is an R vector, the other package may expect the data is an S4 object with [
and [<-
implemented. Without a standard, it will waste a lot of time to handle the data structure, instead of doing the data analysis.
ALTREP
is a set of new APIs in R provided since R 3.5. It is designed to represent a non-standard data structure to a standard R vector using the unified APIs. At R level, there is no difference between an R vector and an ALTREP
vector. This mechanism provides a possibility to wrap any data to a lightweight R vector without using S3/S4 class while still preserving it original data format. While ALTREP
has the potential to be an official way to read any non-standard data in R, the requirement of knowing C++ language and R's C interfact and lacking of official document impede its development. Even for those who is familar with C++ language, it will takes a lot of energy to learn ALTREP
and manage their C++ code. AltWrapper
is a package for developers to make the use of ALTREP
simpler in R. Developers are able to define all ALTREP
APIs using R functions, and to debug and replace an ALTREP
function at any time without doing compilation. While R code is generally slower compared to C++ code, a special consideration is token to minimize the overhead of calling an user-defined R function. With the help of AltWrapper
, Users are not required to know C++ language to use ALTREP
.
We will illustrate the package with a simple example. Suppose we want to represent a sequence vector in a compact format. The sequence only has a starting value and its increment as the data. We will create a compacted R vecotr by defining ALTREP
APIS.
#sequence 1:10 A_compact = list(start = 1L, by = 1L, length = 10L) #Define ALTREP APIs lengthFunc <- function(x) { return(x$length) } getElementFunc <- function(x, i) { return(x$start + x$by * (i - 1)) } ## Register ALTREP class and functions ## The type of the vector is integer setAltClass(className = "compactSeq", "integer") setAltMethod(className = "compactSeq", getLength = lengthFunc) setAltMethod(className = "compactSeq", getElement = getElementFunc) #Create altWrapper object A_vector = newAltrep(className = "compactSeq", x = A_compact) #Check the result is.vector(A_vector)
We claim that the altWrapper class is an integer class. Therefore, the return value of the function newAltrep
is an integer R vector. The value can be accessed via regular [
operator. An altWrapper object must have getLength
defined. For most operations, altWrapper API getDataptr
is required. Since there is no actual data for a compact sequence, we define the API getElement
instead. Please refer to ?setAltMethod
to see a full list of available APIs.
Please note that the variable A_vector
cannot be printed just by typing its variable name in the console for
ALTREP is still underdevelopment and some R functions do not catch up with it. At the time of this writting, only R 3.7 devel version can correctly print the object. Before R 3.7, the variable can be printed out either by using [
operator to force R to use get_Elt
method, or by using printAltWrapper
function provided by AltWrapper package.
## Use [ operator A_vector[1L:10L] ## Use print method printAltWrapper(A_vector)
Alternatively, there is an S3/S4 print method defined for the altWrapper class, you can create an S3 object by calling the function newAltrep
with the argument S3Class = TRUE
.
#Create an S3 version of altWrapper object A_vector_s3 = newAltrep(className = "compactSeq", x = A_compact, S3Class = TRUE) A_vector_s3 #Check class type class(A_vector) class(A_vector_s3)
The definition of an API of an altWrapper class is ad hoc. You can change the API even after an object is created. For example, we can define the sum function of the class compactSeq
given that we have created an object A_vector
for the class.
sumFunc <- function(x, na.rm) { message("Sum function") a1 = x$start n = x$length d = x$by return(n * a1 + n * (n - 1) * d / 2) } setAltMethod(className = "compactSeq", sum = sumFunc) sum(A_vector)
The variable A_vector
has been made before we define the sum function for its class, but we still see the message from our customized sum function. We can also redefine the sum function at any moment.
sumFunc <- function(x, na.rm) { message("Sum function V2") a1 = x$start n = x$length d = x$by return(n * a1 + n * (n - 1) * d / 2) } setAltMethod(className = "compactSeq", sum = sumFunc) sum(A_vector)
A warning will be shown to indicate the sum function has been defined previously. It can be suppressed by calling setAltWrapperOptions(redefineWarning=FALSE)
. A function can be deleted by setting its value to NULL
. For example, The sum function can be removed from the class compactSeq
via setAltMethod(className = "compactSeq", sum = NULL)
.
you can get the altWrapper data from an altWrapper object by calling getAltWrapperData
function on A_vector
x = getAltWrapperData(A_vector) x
Naturally, the altWrapper data can be set via setAltWrapperData
x$start = 2 A_vector = setAltWrapperData(A_vector, x) printAltWrapper(A_vector)
By default, the function setAltWrapperData
will duplicate the object. If you only want to memorize some data(e.g. caching on-disk data), you can call setAltWrapperData
with argument duplicate = FALSE
. Please do not set duplicate = FALSE
if your new x
will change the value of A_vector
, because one R object can have multiple variable names, changing the value of an immutable object will change the value of all associated variables.
The following example shows how to retrieve a function from an altWrapper class.
sumFunc = getAltMethod(className = "compactSeq", methodName = "sum") sumFunc
A NULL value will be returned if the function is not found. You can also get a summary report of an altWrapper class, the report shows the type of the ALTREP class and status of all ALTREP APIs.
showAltClass(className = "compactSeq")
Because the altWrapper class is built upon ALTREP, the package exports a few ALTREP APIs. Please note that it is not recommended to call an ALTREP function directly to get/set the value of an altWrapper object since the data structure of an altWrapper object is subjected to change. These functions are served as development tools only. Here we simply list the ALTREP APIs, please call ?function
in R to see their documents.
is.altrep(A_vector) .getAltData1(A_vector) .getAltData2(A_vector) .setAltData1(A_vector, .getAltData1(A_vector)) .setAltData2(A_vector, .getAltData2(A_vector))
Although there is no real limitation on how an altWrapper function is constructed, there are certain expectations of the altWrapper function and it is a good practice to follow them:
The garbage collector will be suspended when calling getDataptr
function, so it will be problematic if the function allocates too many object without GC. Especially calling other functions in getDataptr
function should be better avoided since R's copy-on-write feature can cause unexpected allocations.
An altWrapper function should not run transfer control(e.g. LONGJMP
).
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.