SlicedData-class: Class 'SlicedData' for storing large matrices


Description

This class is designed for fast, memory-efficient manipulation of large datasets stored in matrix form. It is used to load, store, and manipulate large datasets such as genotype and gene expression matrices. When a dataset is loaded, it is split into slices of 1,000 rows (the default slice size). This allows imputation, standardization, and other operations to be performed on the data with minimal memory overhead.
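The slicing idea can be illustrated in base R without the package. The sketch below is not the class's implementation: the helper slice_rows() is hypothetical, and it only mimics cutting a matrix into consecutive blocks of at most 1,000 rows and then applying a row-wise operation one block at a time.

```r
# Hypothetical helper, NOT part of the package: cut a matrix into
# consecutive blocks of at most sliceSize rows.
slice_rows <- function(mat, sliceSize = 1000L) {
  n <- nrow(mat)
  if (n == 0L) return(list())
  idx <- ceiling(seq_len(n) / sliceSize)   # slice number of each row
  unname(lapply(split(seq_len(n), idx),
                function(r) mat[r, , drop = FALSE]))
}

set.seed(1)
big <- matrix(rnorm(2500 * 20), nrow = 2500, ncol = 20)
slices <- slice_rows(big)          # default slice size of 1,000 rows
sapply(slices, nrow)               # 1000 1000 500

# Row-wise work (here, centering each row) needs only one slice in memory
# at a time and gives the same result as working on the whole matrix.
centered <- do.call(rbind, lapply(slices, function(s) s - rowMeans(s)))
all.equal(centered, big - rowMeans(big))   # TRUE
```

Because each row is processed independently, the per-slice results can simply be stacked back together.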

Usage

# x[[i]] indexing allows easy access to individual slices.
# It is equivalent to x$getSlice(i) and x$setSlice(i, value).
## S4 method for signature 'SlicedData'
x[[i]]
## S4 replacement method for signature 'SlicedData'
x[[i]] <- value

# The following commands work as if x were an ordinary matrix object
## S4 method for signature 'SlicedData'
nrow(x)
## S4 method for signature 'SlicedData'
ncol(x)
## S4 method for signature 'SlicedData'
dim(x)
## S4 method for signature 'SlicedData'
rownames(x)
## S4 method for signature 'SlicedData'
colnames(x)
## S4 replacement method for signature 'SlicedData'
rownames(x) <- value
## S4 replacement method for signature 'SlicedData'
colnames(x) <- value

# A SlicedData object can easily be converted into a matrix,
# preserving row and column names
## S4 method for signature 'SlicedData'
as.matrix(x)

# length(x) can be used in place of x$nSlices()
# to get the number of slices in the object
## S4 method for signature 'SlicedData'
length(x)
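A short usage sketch of the methods above. It assumes the Matrix eQTL package (available from CRAN as MatrixEQTL, see References) is installed and provides the SlicedData class documented here; the matrix and its names are made up for illustration.

```r
library(MatrixEQTL)   # assumption: provides the SlicedData class

m <- matrix(as.numeric(1:12), nrow = 3, ncol = 4,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3", "c4")))
x <- SlicedData$new()
x$CreateFromMatrix(m)
x$ResliceCombined(sliceSize = 2L)   # slices of at most 2 rows: r1-r2, r3

length(x)       # 2, same as x$nSlices()
dim(x)          # 3 4
rownames(x)     # "r1" "r2" "r3"
x[[2]]          # the second slice (row r3), same as x$getSlice(2)

x[[2]] <- x[[2]] * 10   # replace a slice, same as x$setSlice(2, value)
full <- as.matrix(x)    # whole matrix with row and column names
full
```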

Arguments

x

SlicedData object.

i

Number of a slice.

value

New content for the slice / new row or column names.
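A sketch of the replacement forms for row and column names, again assuming the MatrixEQTL package provides this class. In this sketch value supplies one name per row (or column) of the whole matrix, not of a single slice.

```r
library(MatrixEQTL)   # assumption: provides the SlicedData class

x <- SlicedData$new()
x$CreateFromMatrix(matrix(as.numeric(1:6), nrow = 3))
x$ResliceCombined(sliceSize = 2L)   # two slices: rows 1-2 and row 3

# One name per row (or column) of the whole matrix; the row names are
# distributed over the slices by the replacement method.
rownames(x) <- c("a", "b", "c")
colnames(x) <- c("left", "right")

rownames(x)        # "a" "b" "c"
x$rowNameSlices    # per-slice row names: c("a", "b") and "c"
colnames(x)        # "left" "right"
```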

Extends

SlicedData is a reference class (it extends envRefClass). Its methods can change the values of the object's fields in place.
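Because SlicedData is a reference class, plain assignment does not copy the object. The sketch below (assuming the MatrixEQTL package provides the class) contrasts assignment with Clone(); the variable names are illustrative.

```r
library(MatrixEQTL)   # assumption: provides the SlicedData class

a <- SlicedData$new()
a$CreateFromMatrix(matrix(c(1, 2, 3, 4), nrow = 2))

alias_obj <- a           # no copy: both names refer to one object
copy_obj  <- a$Clone()   # independent copy

a$ColumnSubsample(1L)    # modifies the object in place (keeps column 1)

ncol(alias_obj)   # 1: the change made through `a` is visible here
ncol(copy_obj)    # 2: the clone is unaffected
```

Use Clone() whenever a method call should not affect the original data.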

Fields

dataEnv:

environment. Stores the slices of the data matrix. The slices should be accessed via the getSlice() and setSlice() methods.

nSlices1:

numeric. Number of slices. For internal use. The value should be accessed via the nSlices() method.

rowNameSlices:

list. Slices of row names.

columnNames:

character. Column names.

fileDelimiter:

character. Delimiter separating values in the input file.

fileSkipColumns:

numeric. Number of columns with row labels in the input file.

fileSkipRows:

numeric. Number of rows with column labels in the input file.

fileSliceSize:

numeric. Maximum number of rows in a slice.

fileOmitCharacters:

character. String that represents a missing value (NaN) in the input file.
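A sketch of setting the file* fields before calling LoadFile(), assuming the MatrixEQTL package provides the class. The small tab-delimited file is written to a temporary location purely for illustration.

```r
library(MatrixEQTL)   # assumption: provides the SlicedData class

# One header row with column labels, one column of row labels,
# and "NA" marking a missing value.
fn <- tempfile(fileext = ".txt")
writeLines(c("id\ts1\ts2\ts3",
             "g1\t1\t2\t3",
             "g2\t4\tNA\t6",
             "g3\t7\t8\t9"), fn)

x <- SlicedData$new()
x$fileDelimiter      <- "\t"   # values separated by tabs
x$fileOmitCharacters <- "NA"   # missing values are written as NA
x$fileSkipRows       <- 1      # one row of column labels
x$fileSkipColumns    <- 1      # one column of row labels
x$fileSliceSize      <- 2      # at most 2 rows per slice
x$LoadFile(fn)

x$nSlices()   # 2
x$nRows()     # 3
colnames(x)   # "s1" "s2" "s3"
```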

Methods

initialize(mat):

Create the object from a matrix.

nSlices():

Returns the number of slices.

nCols():

Returns the number of columns in the matrix.

nRows():

Returns the number of rows in the matrix.

Clear():

Clears the object. Removes the data slices and row and column names.

Clone():

Makes a copy of the object. Changes to the copy do not affect the source object.

CreateFromMatrix(mat):

Creates SlicedData object from a matrix.

LoadFile(filename, skipRows = NULL, skipColumns = NULL,
sliceSize = NULL, omitCharacters = NULL, delimiter = NULL, rowNamesColumn = 1):

Loads a data matrix from a file. filename should be a character string. The remaining parameters specify the file format and have the same meaning as the corresponding file* fields. The additional rowNamesColumn parameter specifies which of the row-label columns to use as row names.

SaveFile(filename):

Saves the data to a file. filename should be a character string.

getSlice(sl):

Retrieves the sl-th slice of the matrix.

setSlice(sl, value):

Sets the sl-th slice of the matrix.

ColumnSubsample(subset):

Reorders/subsets the columns according to subset.
Acts as M = M[ ,subset] for a matrix M.

RowReorder(ordr):

Reorders rows according to ordr.
Acts as M = M[ordr, ] for a matrix M.

RowMatrixMultiply(multiplier):

Multiplies each row of the matrix (as a row vector) by the matrix multiplier.
Acts as M = M %*% multiplier for a matrix M.

CombineInOneSlice():

Combines all slices into one. The whole matrix can then be obtained via $getSlice(1).

IsCombined():

Returns TRUE if the number of slices is 1 or 0.

ResliceCombined(sliceSize = -1):

Cuts the data into slices of sliceSize rows. If sliceSize is not specified (the default is -1), the value of the fileSliceSize field is used.

GetAllRowNames():

Returns all row names in one vector.

RowStandardizeCentered():

Sets the mean of each row to zero and the sum of squares of each row to one.

SetNanRowMean():

Imputes missing values in each row with the row mean. Rows consisting entirely of NaN values are filled with zeros.

RowRemoveZeroEps():

Removes rows that consist entirely of zeros or of values that are nearly zero.

FindRow(rowname):

Finds a row by name. Returns a pair: the slice number and the row number within that slice. If no row is found, the function returns NULL.

rowMeans(x, na.rm = FALSE, dims = 1L):

Returns a vector of row means. Works like rowMeans but requires dims to equal 1L.

rowSums(x, na.rm = FALSE, dims = 1L):

Returns a vector of row sums. Works like rowSums but requires dims to equal 1L.

colMeans(x, na.rm = FALSE, dims = 1L):

Returns a vector of column means. Works like colMeans but requires dims to equal 1L.

colSums(x, na.rm = FALSE, dims = 1L):

Returns a vector of column sums. Works like colSums but requires dims to equal 1L.
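A sketch chaining several of the methods above on a small matrix, assuming the MatrixEQTL package provides the class. The comments give the equivalent plain-matrix operations stated in the method descriptions; the data are made up.

```r
library(MatrixEQTL)   # assumption: provides the SlicedData class

m <- matrix(c(1, NA, 3,
              4, 5, 6,
              NA, NA, NA,
              7, 8, 9), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("g", 1:4), c("s1", "s2", "s3")))
x <- SlicedData$new()
x$CreateFromMatrix(m)
x$ResliceCombined(sliceSize = 2L)   # two slices of two rows each

x$SetNanRowMean()                   # g1: NA -> 2; g3 (all NA) -> 0
x$ColumnSubsample(c(3L, 1L))        # like M <- M[, c(3, 1)]
x$RowReorder(c(4L, 1L, 2L, 3L))     # like M <- M[c(4, 1, 2, 3), ]

x$FindRow("g1")       # slice and row position of "g1"
x$FindRow("absent")   # NULL
rowMeans(x)           # per-row means: 8 2 5 0

x$CombineInOneSlice()
x$IsCombined()        # TRUE
full <- as.matrix(x)  # rows g4, g1, g2, g3; columns s3, s1
full
```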

Author(s)

Andrey Shabalin ashabalin@vcu.edu

References

The package website: http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/

Examples

# Create a SlicedData variable

panhongNTU/GEM documentation built on May 24, 2019, 6:14 p.m.