Function | Description
------------ | -------------
`reduce_dimensions` | Perform dimensionality reduction (PCA, MDS, tSNE)
`rotate_dimensions` | Rotate two dimensions by a given angle (in degrees)
`cluster_elements` | Label elements with cluster identity
`remove_redundancy` | Filter out elements with highly correlated features
`fill_missing` | Fill values of missing element/feature pairs
`impute_missing` | Impute values of missing element/feature pairs
`permute_nest` | From one column, build two permuted columns with nested information
`combine_nest` | From one column, build two combination columns with nested information
`keep_variable` | Keep top variable features
`lower_triangular` | Keep rows corresponding to a lower triangular matrix

Utilities | Description
------------ | -------------
`as_matrix` | Robustly convert a tibble to matrix
`subset` | Select columns with information relative to a column of interest

element | feature | value
------------ | ------------- | -------------
`chr` or `fctr` | `chr` or `fctr` | `numeric`

element | feature | value | new information
------------ | ------------- | ------------- | -------------
`chr` or `fctr` | `chr` or `fctr` | `numeric` | ...

library(knitr)
knitr::opts_chunk$set(cache = TRUE, warning = FALSE, message = FALSE, cache.lazy = FALSE)

library(dplyr)
library(tidyr)
library(ggplot2)
library(purrr)
library(magrittr)
library(nanny)

my_theme =
  theme_bw() +
  theme(
    panel.border = element_blank(),
    axis.line = element_line(),
    panel.grid.major = element_line(size = 0.2),
    panel.grid.minor = element_line(size = 0.1),
    text = element_text(size=12),
    legend.position="bottom",
    aspect.ratio=1,
    strip.background = element_blank(),
    axis.title.x = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10)),
    axis.title.y = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10))
  )

devtools::install_github("stemangiola/nanny")
nanny is a collection of wrapper functions for high level data analysis and manipulation following the tidy paradigm.
mtcars_tidy =
  mtcars %>%
  as_tibble(rownames="car_model") %>%
  mutate_at(vars(-car_model,- hp, -vs), scale) %>%
  gather(feature, value, -car_model, -hp, -vs)

mtcars_tidy
reduce_dimensions
We may want to reduce the dimensions of our data, for example using the PCA, MDS or tSNE algorithms. `reduce_dimensions` takes a tibble, column names (as symbols; for `element`, `feature` and `value`) and a method (e.g., MDS, PCA or tSNE) as arguments, and returns a tibble with additional columns for the reduced dimensions.
MDS
mtcars_tidy_MDS = mtcars_tidy %>% reduce_dimensions(car_model, feature, value, method="MDS", .dims = 3)
On the x and y axes we have the reduced dimensions 1 to 3; data is coloured by the `vs` column.
mtcars_tidy_MDS %>% subset(car_model) %>% select(contains("Dim"), everything())

mtcars_tidy_MDS %>%
  subset(car_model) %>%
  GGally::ggpairs(columns = 4:6, ggplot2::aes(colour=factor(vs)))
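
To make the idea of MDS concrete, here is a minimal base R sketch that computes three MDS dimensions for the same cars with `stats::cmdscale`. It is an illustration under assumptions (Euclidean distances on the scaled features, classical MDS), not a description of how `reduce_dimensions` is implemented; nanny may use a different distance or algorithm, so the coordinates need not match.

```r
# Wide numeric matrix: one row per car model (element), one column per feature.
# hp and vs are left out, mirroring the tidy data set built above.
wide <- scale(as.matrix(mtcars[, setdiff(colnames(mtcars), c("hp", "vs"))]))

# Classical MDS on Euclidean distances between elements, keeping 3 dimensions.
mds <- cmdscale(dist(wide), k = 3)
colnames(mds) <- paste0("Dim", 1:3)

# One row per element, one column per reduced dimension.
mds_df <- data.frame(car_model = rownames(mds), mds, row.names = NULL)
head(mds_df)
```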
PCA
mtcars_tidy_PCA = mtcars_tidy %>% reduce_dimensions(car_model, feature, value, method="PCA", .dims = 3)
On the x and y axes we have the reduced dimensions 1 to 3; data is coloured by the `vs` column.
mtcars_tidy_PCA %>% subset(car_model) %>% select(contains("PC"), everything())

mtcars_tidy_PCA %>%
  subset(car_model) %>%
  GGally::ggpairs(columns = 4:6, ggplot2::aes(colour=factor(vs)))
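
For comparison, the same kind of result can be sketched in base R with `stats::prcomp`. This assumes elements are in rows and features in columns; whether `reduce_dimensions` orients or scales the matrix the same way is not stated here, so treat the numbers as illustrative only.

```r
# Elements (car models) in rows, scaled features in columns.
wide <- scale(as.matrix(mtcars[, setdiff(colnames(mtcars), c("hp", "vs"))]))

# Principal component analysis; pca$x holds the element coordinates (scores).
pca <- prcomp(wide)
pcs <- data.frame(car_model = rownames(wide), pca$x[, 1:3], row.names = NULL)

# Proportion of total variance captured by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained[1:3], 2)
```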
tSNE
mtcars_tidy_tSNE = mtcars_tidy %>% reduce_dimensions(car_model, feature, value, method = "tSNE")
Plot
mtcars_tidy_tSNE %>% subset(car_model) %>% select(contains("tSNE"), everything())

mtcars_tidy_tSNE %>%
  subset(car_model) %>%
  ggplot(aes(x = `tSNE1`, y = `tSNE2`, color=factor(vs))) +
  geom_point() +
  my_theme
rotate_dimensions
We may want to rotate the reduced dimensions (or any two numeric columns) of our data by a set angle. `rotate_dimensions` takes a tibble, the names of the two dimension columns (as symbols), the element column and an angle in degrees as arguments, and returns a tibble with additional columns for the rotated dimensions. By default the rotated dimensions are added to the original data set as `<NAME OF DIMENSION> rotated <ANGLE>`; other names can be specified in the input arguments.
mtcars_tidy_MDS.rotated = mtcars_tidy_MDS %>% rotate_dimensions(`Dim1`, `Dim2`, .element = car_model, rotation_degrees = 45, action="get")
Original: on the x and y axes we have the first two reduced dimensions; data is coloured by the `vs` column.
mtcars_tidy_MDS.rotated %>% ggplot(aes(x=`Dim1`, y=`Dim2`, color=factor(vs) )) + geom_point() + my_theme
Rotated: on the x and y axes we have the first two reduced dimensions rotated by 45 degrees; data is coloured by the `vs` column.
mtcars_tidy_MDS.rotated %>% ggplot(aes(x=`Dim1 rotated 45`, y=`Dim2 rotated 45`, color=factor(vs) )) + geom_point() + my_theme
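
The rotation itself is a plain 2D rotation matrix applied to each point. The sketch below shows that arithmetic in base R; `rotate_2d` is a hypothetical helper written for this example, and it assumes the standard counter-clockwise convention, which may or may not be the direction `rotate_dimensions` uses.

```r
# Hypothetical helper: rotate points (x, y) counter-clockwise by `degrees`.
rotate_2d <- function(x, y, degrees) {
  theta <- degrees * pi / 180

  # Standard rotation matrix [[cos, -sin], [sin, cos]] (filled column-wise).
  rotation <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)

  # Points are stored as rows, so multiply by the transpose.
  rotated <- cbind(x, y) %*% t(rotation)
  data.frame(x_rotated = rotated[, 1], y_rotated = rotated[, 2])
}

# The point (1, 0) rotated by 90 degrees lands on (0, 1), up to rounding error.
quarter_turn <- rotate_2d(1, 0, 90)
quarter_turn
```

A rotation preserves distances from the origin, so the plot keeps its shape and only changes orientation.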
cluster_elements
We may want to cluster our data (e.g., using k-means element-wise). `cluster_elements` takes as arguments a tibble, column names (as symbols; for `element`, `feature` and `value`) and a clustering method, and returns a tibble with an additional column for the cluster annotation. The examples below use k-means and SNN clustering; the plan is to introduce more clustering methods.
k-means
mtcars_tidy_cluster = mtcars_tidy_MDS %>% cluster_elements(car_model, feature, value, method="kmeans", centers = 2, action="get" )
We can add the cluster annotation to the MDS dimension-reduced data set and plot it.
mtcars_tidy_cluster %>% ggplot(aes(x=`Dim1`, y=`Dim2`, color=cluster_kmeans)) + geom_point() + my_theme
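
As a point of reference, element-wise k-means can be sketched directly with `stats::kmeans` on the wide matrix. The seed, `nstart` value and output column name are choices made for this example; the cluster labels nanny returns may be numbered differently.

```r
set.seed(42)  # k-means starts from random centres, so fix the seed

# Elements (car models) in rows, scaled features in columns.
wide <- scale(as.matrix(mtcars[, setdiff(colnames(mtcars), c("hp", "vs"))]))

# Two clusters; several random starts make the result more stable.
km <- kmeans(wide, centers = 2, nstart = 10)

# One cluster label per element.
clusters <- data.frame(
  car_model = rownames(wide),
  cluster_kmeans = factor(km$cluster),
  row.names = NULL
)
table(clusters$cluster_kmeans)
```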
mtcars_tidy_SNN = mtcars_tidy_tSNE %>% cluster_elements(car_model, feature, value, method = "SNN")
mtcars_tidy_SNN %>% subset(car_model) %>% select(contains("tSNE"), everything())

mtcars_tidy_SNN %>%
  subset(car_model) %>%
  ggplot(aes(x = `tSNE1`, y = `tSNE2`, color=cluster_SNN)) +
  geom_point() +
  my_theme
remove_redundancy
We may want to remove redundant elements from the original data set, for example if we want to define signatures with low element redundancy. `remove_redundancy` takes as arguments a tibble and column names (as symbols; for `element`, `feature` and `value`) and returns a tibble with the redundant elements dropped. Two redundancy estimation approaches are supported:
- removal of highly correlated clusters of elements (keeping a representative) with `method="correlation"`
- removal of redundant elements based on two reduced dimensions with `method="reduced_dimensions"`
mtcars_tidy_non_redundant = mtcars_tidy_MDS %>% remove_redundancy(car_model, feature, value)
We can visualise how the data set looks in the reduced dimensions after removing redundancy.
mtcars_tidy_non_redundant %>% subset(car_model) %>% ggplot(aes(x=`Dim1`, y=`Dim2`, color=factor(vs))) + geom_point() + my_theme
mtcars_tidy_non_redundant = mtcars_tidy_MDS %>% remove_redundancy( car_model, feature, value, method = "reduced_dimensions", Dim_a_column = `Dim1`, Dim_b_column = `Dim2` )
mtcars_tidy_non_redundant %>% subset(car_model) %>% ggplot(aes(x=`Dim1`, y=`Dim2`, color=factor(vs))) + geom_point() + my_theme
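
The correlation-based idea can be sketched in a few lines of base R: compute element-by-element correlations and, for each kept element, drop later elements that correlate with it above a threshold. `drop_correlated` and its `threshold` argument are hypothetical names for this illustration; the way `remove_redundancy` groups elements and picks a representative may well differ.

```r
# Hypothetical helper: keep one representative of each highly correlated group.
# mat: numeric matrix with elements in rows and features in columns.
drop_correlated <- function(mat, threshold = 0.9) {
  cors <- cor(t(mat))  # element-by-element correlation matrix
  keep <- rep(TRUE, nrow(mat))

  for (i in seq_len(nrow(mat))) {
    if (!keep[i]) next
    # Later elements too similar to the kept element i are marked redundant.
    redundant <- which(cors[i, ] > threshold & seq_len(nrow(mat)) > i)
    keep[redundant] <- FALSE
  }

  mat[keep, , drop = FALSE]
}

# Toy data: "b" is almost a multiple of "a", while "c" is unrelated.
toy <- rbind(
  a = c(1, 2, 3, 4),
  b = c(2, 4, 6, 8.1),
  c = c(4, 1, 3, 2)
)
non_redundant <- drop_correlated(toy)
rownames(non_redundant)
```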
fill_missing
This function produces a rectangular underlying data structure, where every element has a value for every feature, by filling missing element/feature pairs with a value of choice (e.g., 0).
We create a non-rectangular data frame
mtcars_tidy_non_rectangular = mtcars_tidy %>% slice(-1)
We fill the missing value with 0
mtcars_tidy_non_rectangular %>% fill_missing(car_model, feature, value, fill_with = 0)
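
What "rectangular" means here can be shown on a toy data frame in base R: build every element/feature pair, join the observed values onto that grid, and replace the resulting `NA` with the fill value. This is a sketch of the concept on made-up data, not nanny's implementation.

```r
# Toy long-format data: element "b" has no value for feature "f2".
long <- data.frame(
  element = c("a", "a", "b"),
  feature = c("f1", "f2", "f1"),
  value = c(1.5, 2.0, 3.0)
)

# All element/feature pairs that a rectangular data set must contain.
full_grid <- expand.grid(
  element = unique(long$element),
  feature = unique(long$feature),
  stringsAsFactors = FALSE
)

# Left join onto the grid: missing pairs appear as NA, then get the fill value.
filled <- merge(full_grid, long, by = c("element", "feature"), all.x = TRUE)
filled$value[is.na(filled$value)] <- 0
filled
```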
impute_missing
This function produces a rectangular underlying data structure, where every element has a value for every feature, by imputing missing element/feature pairs with a function of choice (e.g., median).
We impute the missing value with a summary value (median by default) according to a grouping
mtcars_tidy_non_rectangular %>%
  mutate(vs = factor(vs)) %>%
  impute_missing(car_model, feature, value, ~ vs) %>%

  # Print imputed first
  arrange(car_model != "Mazda RX4" | feature != "mpg")
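
Group-wise imputation can be illustrated for a single feature with base R's `ave`, which applies a function within each group and returns a vector aligned with the input. The toy data and column names are made up for this sketch; `impute_missing` additionally works per feature across the whole tidy data set.

```r
# Toy data for one feature: element "c" has a missing value in group "g1".
long <- data.frame(
  element = c("a", "b", "c", "d"),
  group = c("g1", "g1", "g1", "g2"),
  value = c(1, 3, NA, 10)
)

# Median of the observed values within each group, repeated for every row.
group_median <- ave(long$value, long$group, FUN = function(v) median(v, na.rm = TRUE))

# Replace only the missing entries with their group median.
long$value <- ifelse(is.na(long$value), group_median, long$value)
long
```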
permute_nest
From one column, build two permuted columns with nested information
mtcars_tidy_permuted =
  mtcars_tidy %>%
  permute_nest(car_model, c(feature,value))

mtcars_tidy_permuted
combine_nest
From one column, build two combination columns with nested information
mtcars_tidy %>% combine_nest(car_model, value)
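
The difference between the two functions is the difference between ordered and unordered pairs. The base R sketch below builds both for three toy elements, assuming self-pairs are excluded (an assumption about `permute_nest`, not something stated above); the nested data columns are left out to keep the example short.

```r
elements <- c("a", "b", "c")

# Permutations: ordered pairs, so (a, b) and (b, a) are both present.
perms <- expand.grid(element_1 = elements, element_2 = elements, stringsAsFactors = FALSE)
perms <- perms[perms$element_1 != perms$element_2, ]

# Combinations: unordered pairs, so each pair appears once.
combs <- as.data.frame(t(combn(elements, 2)), stringsAsFactors = FALSE)
colnames(combs) <- c("element_1", "element_2")

nrow(perms)  # 6 ordered pairs
nrow(combs)  # 3 unordered pairs
```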
lower_triangular
Keep the rows corresponding to the lower triangle of the element-by-element matrix
mtcars_tidy_permuted %>%

  # Summarise mpg
  mutate(data = map(data, ~ .x %>% filter(feature == "mpg") %>% summarise(mean(value)))) %>%
  unnest(data) %>%

  # Lower triangular
  lower_triangular(car_model_1, car_model_2, `mean(value)`)
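
In long format, "lower triangular" means keeping the pairs whose row index is larger than their column index in the equivalent square matrix. The sketch below checks that correspondence on toy data with base R's `lower.tri`; it assumes the diagonal is excluded, which may not match `lower_triangular` exactly.

```r
elements <- c("a", "b", "c")

# Long format: one row per (element_1, element_2) pair with a value.
pairs <- expand.grid(element_1 = elements, element_2 = elements, stringsAsFactors = FALSE)
pairs$value <- seq_len(nrow(pairs))

# Equivalent square matrix: element_1 in rows, element_2 in columns.
m <- matrix(pairs$value, nrow = 3, dimnames = list(elements, elements))

# Keep long-format rows that fall strictly below the diagonal.
row_idx <- match(pairs$element_1, elements)
col_idx <- match(pairs$element_2, elements)
lower <- pairs[row_idx > col_idx, ]

# Same values as the lower triangle of the matrix.
all(lower$value == m[lower.tri(m)])
```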
keep_variable
Keep top variable features
mtcars_tidy %>% keep_variable(car_model, feature, value, top=10)
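
Selecting the most variable features amounts to ranking features by their variance across elements. The base R sketch below uses the unscaled `mtcars` columns, because after scaling every feature has variance 1 and the ranking would be uninformative; the choice of plain variance as the ranking statistic is an assumption of this example.

```r
# Elements in rows, unscaled features in columns.
wide <- as.matrix(mtcars[, setdiff(colnames(mtcars), c("hp", "vs"))])

# Variance of each feature across elements.
feature_variance <- apply(wide, 2, var)

# Names of the `top` most variable features, most variable first.
top <- 3
top_features <- names(sort(feature_variance, decreasing = TRUE))[seq_len(top)]

# Keep only those columns.
wide_top <- wide[, top_features, drop = FALSE]
top_features
```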
as_matrix
Robustly convert a tibble to matrix
mtcars_tidy %>%
  select(car_model, feature, value) %>%
  spread(feature, value) %>%
  as_matrix(rownames = car_model) %>%
  head()
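
One reason a dedicated conversion helps: calling `as.matrix` on a data frame that still contains a character identifier column turns the whole matrix into character. The base R sketch below shows the manual route of moving the identifier into the row names first; it illustrates the pitfall and is not a description of what `as_matrix` does internally.

```r
# Wide data frame with the identifier stored as an ordinary column.
wide_df <- data.frame(
  car_model = rownames(mtcars),
  mtcars[, c("mpg", "cyl")],
  row.names = NULL
)

# Converting everything at once coerces the numbers to character.
is.character(as.matrix(wide_df))

# Drop the identifier before converting, then restore it as row names.
m <- as.matrix(wide_df[, -1])
rownames(m) <- wide_df$car_model
head(m)
```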
subset
Select columns with information relative to a column of interest
mtcars_tidy %>% subset(car_model)
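
A column carries "information relative to" an element when it takes a single value per element. The base R sketch below detects such columns on toy data; `element_wise_columns` is a hypothetical helper for this illustration, and the rule nanny applies may be more refined.

```r
# Hypothetical helper: names of columns that are constant within each element.
element_wise_columns <- function(df, element_col) {
  is_element_wise <- vapply(
    setdiff(colnames(df), element_col),
    function(col) {
      # Number of distinct values of this column within each element.
      n_per_element <- tapply(df[[col]], df[[element_col]], function(v) length(unique(v)))
      all(n_per_element == 1)
    },
    logical(1)
  )
  c(element_col, names(is_element_wise)[is_element_wise])
}

long <- data.frame(
  element = c("a", "a", "b", "b"),
  feature = c("f1", "f2", "f1", "f2"),
  group = c("g1", "g1", "g2", "g2"),
  value = c(1, 2, 3, 4)
)

# One row per element, keeping only the element-wise columns.
cols <- element_wise_columns(long, "element")
unique(long[, cols])
```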
nest_subset
Nest a data frame based on the columns with information relative to the column provided to nest
mtcars_tidy %>% nest_subset(data = -car_model)
ADD versus GET versus ONLY modes

Every function takes a tidy data frame (`element`, `feature`, `value`) as input, and (i) with `action="add"` outputs the new information joined to the original input data frame (default), (ii) with `action="get"` outputs the new information together with the element- or feature-wise columns, depending on what the analysis is about, or (iii) with `action="only"` outputs just the new information. For example, from this data set
mtcars_tidy
`action="add"` (default): we can add the MDS dimensions to the original data set
mtcars_tidy %>% reduce_dimensions( car_model, feature, value, method="MDS" , .dims = 3, action="add" )
`action="get"`: we can get the MDS dimensions together with just the element-wise columns
mtcars_tidy %>% reduce_dimensions( car_model, feature, value, method="MDS" , .dims = 3, action="get" )
`action="only"`: we can get just the MDS dimensions relative to each element
mtcars_tidy %>% reduce_dimensions( car_model, feature, value, method="MDS" , .dims = 3, action="only" )
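
The three modes differ only in what the new columns are joined to. The base R sketch below mimics them on toy data, with a made-up `Dim1` column standing in for the result of an analysis; it shows the shape of each output, not nanny's code.

```r
# Toy input: two elements, two features, one element-wise column (group).
long <- data.frame(
  element = c("a", "a", "b", "b"),
  feature = c("f1", "f2", "f1", "f2"),
  group = c("g1", "g1", "g2", "g2"),
  value = c(1, 2, 3, 4)
)

# Stand-in for new element-wise information (values are made up).
new_info <- data.frame(element = c("a", "b"), Dim1 = c(-0.5, 0.5))

# "add": new information joined to the full input, one row per element/feature pair.
added <- merge(long, new_info, by = "element")

# "get": element-wise columns plus the new information, one row per element.
element_wise <- unique(long[, c("element", "group")])
got <- merge(element_wise, new_info, by = "element")

# "only": just the new information.
only <- new_info

c(add = nrow(added), get = nrow(got), only = nrow(only))
```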