IWTomicsData-class: Class '"IWTomicsData"'
In IWTomics: Interval-Wise Testing for Omics Data

Description Details Constructors Accessors Subsetting Combining Other methods Author(s) See Also Examples

The class "IWTomicsData" defines a container for storing a collection of aligned genomic region datasets, and their associated feature measurements, to be used as input for the Interval-Wise Testing of "Omics" data.

An object of class "IWTomicsData" is a list of genomic locations organized in different region datasets and aligned (e.g. around their center). Multiple genomic feature measurements are associated with each location. In particular, each feature is measured in windows of fixed size inside each location. As a consequence, a vector of measurements is associated with each location-feature pair. This information is stored in the following slots:

metadata:

list with region_datasets and feature_datasets components. The component region_datasets is a data frame with names, file names and size of each region dataset. The component feature_datasets is a data frame with names, file names and resolution of each feature.

regions:

"GRangesList" object containing the genomic locations of each region dataset.

alignment:

string indicating the region alignment type. Can be "left", "right", "center" or "scale".

features:

list of matrix lists, with columns of aligned feature measurements corresponding to each feature in each region dataset.

length_features:

list of vector lists, with the number of measurements corresponding to each feature in each region dataset.

test:

(optional) list with input and result, containing test input and results. In particular, input is a list with components:

id_region1: identifier(s) of the region dataset(s) tested.
id_region2: identifier(s) of the region dataset(s) tested for two sample test.
id_features_subset: vector with the identifiers of the features tested.
mu: the center of symmetry under the null hypothesis in one sample test, or the difference between the two populations in two sample test.
statistics: test statistics used in the test.
probs: probabilities corresponding to the quantiles in test statistics "quantile".
max_scale: the maximum interval length used for the p-value adjustment.
paired: if TRUE, the test was paired.
B: number of permutations used in the test.

Each element of the list result is a list of test results for the features tested. Each test result is a list with components:

test: string vector indicating the type of test performed, "1pop" or "2pop" for one sample and two sample tests, respectively.
mu: the center of symmetry under the null hypothesis in one sample test, or the difference between the two populations in two sample test, for the particular test considered.
max_scale: the maximum interval length used for the p-value adjustment.
T0_plot: value of the test statistics without squaring (used by plotSummary to draw the summary plot).
adjusted_pval: adjusted p-value curve, i.e. adjusted p-values for each point of the curves. The adjustment is done considering max_scale as length.
adjusted_pval_matrix: square matrix, with as many rows and columns as there are points in the curves (p), containing the adjusted p-value curves for each possible scale up to max_scale. Row i of the matrix contains the adjusted p-value curve with correction done up to scale p-i+1 (the matrix contains NA for scales greater than max_scale).
unadjusted_pval: p-value curve, i.e. raw p-values for each point of the curves.
pval_matrix: matrix of size the number of points in the curves with the raw p-values of the multivariate tests. The element (i,j) of the matrix contains the p-value of the joint NPC test of the components j,j+1,...,j+(p-i) (the matrix contains NA for scale greater than max_scale).
exact: logical value indicating whether the exact p-values have been computed.
notNA: vector of logical vectors indicating the points of the curves where the test was performed (used by plotSummary to draw the summary plot).
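The relation between rows of adjusted_pval_matrix and scales can be sketched in base R. This only illustrates the indexing described above (row i corresponds to scale p-i+1, and scales above max_scale are NA); it is not code from the package, and the helper name rows_within_max_scale and the values of p and max_scale are made up for the example.

```r
# Hypothetical helper (not part of IWTomics): for curves with p points,
# row i of adjusted_pval_matrix corresponds to scale p - i + 1.
rows_within_max_scale <- function(p, max_scale) {
  i <- seq_len(p)
  scale <- p - i + 1
  # 'filled' marks rows whose scale does not exceed max_scale;
  # the remaining rows would contain NA.
  data.frame(row = i, scale = scale, filled = scale <= max_scale)
}

# Illustrative values only.
tab <- rows_within_max_scale(p = 5, max_scale = 3)
print(tab)
```

With these values, rows 1 and 2 (scales 5 and 4) exceed max_scale and would hold NA, while rows 3 to 5 would hold adjusted p-value curves.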

Objects of class "IWTomicsData" can be initialized from BED or Table files, or they can be directly created by supplying a "GRangesList" of genomic region datasets and a "list" with the aligned feature measurements (see Constructors). The optional slot test is filled by the function IWTomicsTest that performs the Interval-Wise Testing.
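As a rough picture of what a "list of matrix lists" with aligned measurements might look like, the base R sketch below builds one by hand. It rests on several assumptions that are not stated on this page: that the outer list is indexed by feature and the inner list by region dataset, that shorter locations are padded with NA, and that center alignment lines up the middle windows. The identifiers ftr1 and elem1, the helper center_align and all numbers are placeholders.

```r
# Illustrative values only: three locations measured in 3, 5 and 1 windows.
measurements <- list(c(0.2, 0.4, 0.1),
                     c(1.0, 0.8, 0.9, 0.7, 0.6),
                     c(0.5))

# Hypothetical helper: pad a vector with NA on both sides so that the
# central windows line up (assumes all lengths have the same parity).
center_align <- function(v, n_rows) {
  pad <- (n_rows - length(v)) / 2
  c(rep(NA_real_, pad), v, rep(NA_real_, pad))
}

n_rows <- max(lengths(measurements))
# One column per location, one row per aligned window.
mat <- sapply(measurements, center_align, n_rows = n_rows)

# Assumed nesting: feature first, then region dataset.
features <- list(ftr1 = list(elem1 = mat))
length_features <- list(ftr1 = list(elem1 = lengths(measurements)))
print(features$ftr1$elem1)
```

In this toy layout the third row of the matrix holds the central window of every location, and length_features records how many non-padded measurements each column contains.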

IWTomicsData(x, y, alignment='center', id_regions=NULL, name_regions=NULL, id_features=NULL, name_features=NULL, path=NULL, start.are.0based=TRUE, header=FALSE, ...): creates a "IWTomicsData" object from BED or Table files.

IWTomicsData(x, y, alignment='center', id_regions=NULL, name_regions=NULL, id_features=NULL, name_features=NULL, length_features=NULL): creates a "IWTomicsData" object from genomic regions datasets and feature measurements.

x

vector with the names of the region files containing the region datasets to be loaded. BED and Table formats currently supported. Alternative constructor: "GRangesList" object with genomic locations of each region dataset.

y

vector with the names of the feature files, or data frame with columns of feature file names corresponding to the different region datasets. Each feature must be measured in windows of a fixed size inside all the regions. BED and Table formats currently supported: either a row (with 4 columns chr, start, end, measure) for each window, or a row for each region (with columns chr, start, end, value1, ..., valueN). Note that all files must be sorted. Alternative constructor: list of matrix lists, with columns of aligned feature measurements corresponding to each feature in each region dataset.

alignment

region alignment. Possible types are:

"left" for alignment of the starting positions,
"right" for alignment of the ending positions,
"center" for alignment of the central positions (default),
"scale" for scaling all regions to the same length.

id_regions

vector with the identifiers of the region datasets. If NULL, file_regions or names(regions) are used.

name_regions

vector with the names of the region datasets to be used in the output. If NULL, the identifiers id_regions are used.

id_features

vector with the identifiers of the features. If NULL, file_features or names(features) are used.

name_features

vector with the names of the features to be used in the output plots. If NULL, the identifiers id_features are used.

path

the directory that contains the files. If NULL, the current working directory is used.

start.are.0based

if TRUE (default), the start positions in the region files are considered to be 0-based, and are converted to 1-based in the output "IWTomicsData" object.

header

TRUE or FALSE (default) indicating if the files contain the names of the variables as their first lines.

length_features

list of vector lists, with the number of measurements corresponding to each feature in each region dataset.

...

additional parameters in input to read.delim.
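The effect of start.are.0based=TRUE can be illustrated with a small base R sketch. It assumes the usual BED convention of 0-based, half-open intervals, and it is not the package's own conversion code; the table bed and its values are invented for the example.

```r
# Toy BED-like table (illustrative values): 0-based, half-open intervals.
bed <- data.frame(chr = c("chr1", "chr1"),
                  start = c(0L, 100L),
                  end = c(100L, 250L),
                  stringsAsFactors = FALSE)

# Convert to 1-based closed coordinates: shift start by one, keep end.
one_based <- transform(bed, start = start + 1L)

# Interval widths are unchanged by the conversion.
width_bed <- bed$end - bed$start
width_one_based <- one_based$end - one_based$start + 1L
print(one_based)
```

Only the start column changes; the end column and the interval widths stay the same under this convention.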

In the following code, x is a "IWTomicsData" object.

nRegions(x): get the number of region datasets.
nFeatures(x): get the number of features.
dim(x): get the dimension of the object (number of region datasets, number of features).
lengthRegions(x): get the number of locations in each region dataset.
lengthFeatures(x): get a list of vector lists, with the number of measurements corresponding to each feature in each region dataset.
resolution(x): get the measurement resolution for each feature.
metadata(x): get the metadata associated with the object, i.e. a list with region_datasets and feature_datasets components.
regions(x): get the "GRangesList" object containing the genomic locations of each region dataset.
features(x): get a list of matrix lists, with columns of aligned feature measurements corresponding to each feature in each region dataset.
idRegions(x): get the identifiers of the region datasets.
idFeatures(x): get the identifiers of the features.
nameRegions(x): get the names of the region datasets.
nameFeatures(x): get the names of the features.
alignment(x): get the region alignment.
testInput(x): get the test input (if present).
nTests(x): get the number of tests present.
idRegionsTest(x), idRegionsTest(x,test): get the identifiers of the region datasets in the different tests. The (optional) argument test indicates the indices of the tests to be considered.
idFeaturesTest(x): get the identifiers of the features tested.
adjusted_pval(x), adjusted_pval(x,test,id_features_subset,scale_threshold): get the adjusted p-values of the different tests. The (optional) argument test indicates the indices of the tests to be considered. The (optional) argument id_features_subset is a vector with the identifiers of the features to be considered. The (optional) argument scale_threshold is the threshold on the test scale (maximum interval length for the p-value adjustment) for the adjusted p-value computation. It can be a scalar (the same length for all features), a vector (a length for each feature) or a list of vectors (a vector for each test). See IWTomicsTest for more details.

In the following code, x is a "IWTomicsData" object. The optional slot test, if present in x, is deleted when using subsetting methods.

x[i,j]: extract region dataset i and feature j in a new "IWTomicsData" object. Both i and j can be logical vectors, numeric vectors, character vectors (with region dataset and feature identifiers, respectively), or missing.

In the following code, x is a "IWTomicsData" object. The optional slot test, if present in x, is deleted when using combining methods.

c(x,...) and merge(x,...): create a new "IWTomicsData" object combining x with the "IWTomicsData" objects in .... Any object in ... must have the same region alignment as x, and region datasets and features present in multiple objects must coincide.
rbind(x,...): create a new "IWTomicsData" object combining the region datasets in x with the region datasets in the "IWTomicsData" objects .... Features in x and any object in ... must coincide and the "IWTomicsData" objects must have the same region alignment.
cbind(x,...): create a new "IWTomicsData" object combining the features in x with the features in the "IWTomicsData" objects .... Region datasets in x and any object in ... must coincide and have the same region alignment.

show(x): The show method prints the number of region datasets, their alignment type and the number of features in the "IWTomicsData" object. It also displays the names and sizes of the region datasets, and the names and resolutions of the features. If the slot test is present in x, the show method also prints the comparisons present.

Marzia A Cremona, Alessia Pini, Francesca Chiaromonte, Simone Vantini

plot method to plot "IWTomicsData" objects; smooth method to smooth curves in "IWTomicsData" objects; IWTomicsTest for the Interval-Wise Testing.

examples_path <- system.file("extdata",package="IWTomics")
datasets=read.table(file.path(examples_path,"datasets.txt"),
                    sep="\t",header=TRUE,stringsAsFactors=FALSE)
features_datasetsBED=read.table(file.path(examples_path,"features_datasetsBED.txt"),
                                sep="\t",header=TRUE,stringsAsFactors=FALSE)
features_datasetsTable=read.table(file.path(examples_path,"features_datasetsTable.txt"),
                                  sep="\t",header=TRUE,stringsAsFactors=FALSE)
data(regions_example)
data(features_example)

## -------------------------------------------------------------------------------------------
## CONSTRUCTION
## -------------------------------------------------------------------------------------------
## Get genomic regions for four region datasets, 
## and two features for each region dataset

## From BED files (check for consistency, time consuming) 
regionsFeaturesBED=IWTomicsData(datasets$regionFile,features_datasetsBED[,3:6],
                                'center',datasets$id,datasets$name,
                                features_datasetsBED$id,features_datasetsBED$name,
                                path=file.path(examples_path,'files'))
regionsFeaturesBED

## From Table files (less checks for consistency, more efficient)
regionsFeaturesTable=IWTomicsData(datasets$regionFile,features_datasetsTable[,3:6],
                                  'center',datasets$id,datasets$name,
                                  features_datasetsTable$id,features_datasetsTable$name,
                                  path=file.path(examples_path,'files'))
regionsFeaturesTable

## From genomic regions datasets and feature measurements.
regionsFeatures=IWTomicsData(regions_example,features_example,alignment='center')
regionsFeatures

## -------------------------------------------------------------------------------------------
## SUBSETTING 
## -------------------------------------------------------------------------------------------
## Extract a subset of region datasets and/or of features

## Get the first region dataset and the second feature
regionsFeaturesBED[1,2]

## Get the first region dataset and the second feature, using identifiers
regionsFeaturesBED['elem1','ftr2']

## Get the first two region datasets for all the features
regionsFeaturesBED[1:2,]

## Get all region datasets for the first feature
regionsFeaturesBED[,1]

## -------------------------------------------------------------------------------------------
## COMBINING 
## -------------------------------------------------------------------------------------------
data1=regionsFeaturesBED[1:2,1]
data2=regionsFeaturesBED[1:2,2]
data3=regionsFeaturesBED[2:3,]
data4=regionsFeaturesBED[4,]

## Merge different objects
data1
data2
c(data1,data2)
merge(data1,data2)

## Combine different features
data1
data2
cbind(data1,data2)

## Combine different regions
data3
data4
rbind(data3,data4)

## Combine methods together
rbind(cbind(data1,data2),data3,data4)