knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
The SummarizedExperiment (se) class offers a useful way to store multiple row and column
data along with the values from an experiment, and is widely used in computational biology.
Although se's can be subset with base R notation (i.e. using []), they lack methods
for dplyr functions such as filter and select, which makes se's hard to use in pipes.
This package provides methods for these dplyr functions and automatically dispatches
your se to the relevant function.
Because se's contain a data frame for both the rowData and the colData, the major difference when using these functions in cleanse is that we need to specify whether a function applies to the rows (row) or the columns (col) of the se. cleanse then takes care of updating the se.
The package contains an example se called seq_se. The example contains
dummy expression and copy number values for 10 genes and 48 different conditions.
To get an overview of the different options available in the se, use print_options:
library(cleanse)
data(seq_se)
print_options(seq_se)
As you can see, seq_se contains expression and copy_number data for samples taken from 3 different sites, each treated with either treatment A or B for 0 or 4 hours, for 4 different patients. The genes that were sequenced are from 3 different gene groups: IL, NOTCH, and TLR. Each of the rowData / colData columns (e.g. patient, site, gene_group) is referred to as a variable from here on.
These functions either return a subset of the se or a rearranged se. For each of them, the underlying assay values are updated accordingly, so everything is kept in sync.
The filter function subsets rows or columns of the se by conditional filtering
on a variable contained in the rowData or colData, respectively.
The matching experiment values in the assays are dropped together with the rowData/colData entries.
genes_subset_se <- seq_se %>%
  filter(row, gene_group == "IL")
print_options(genes_subset_se) # note the change in available gene_groups
dim(seq_se)
dim(genes_subset_se)

sample_subset_se <- seq_se %>%
  filter(col, treatment == "B", site %in% c("brain", "skin"))
print_options(sample_subset_se) # note the change in available treatments and sites
dim(seq_se)
dim(sample_subset_se)
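For comparison, the same two subsets can be sketched with base R notation on a plain SummarizedExperiment. This is only an illustration, not how cleanse implements filter: it assumes the Bioconductor SummarizedExperiment package is installed, and toy_se is a small made-up object standing in for seq_se.

```r
library(SummarizedExperiment)

# Hypothetical toy data (not the package's seq_se): 4 genes x 6 samples
expr <- matrix(seq_len(24), nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("sample", 1:6)))
toy_se <- SummarizedExperiment(
  assays  = list(expression = expr),
  rowData = DataFrame(gene_group = c("IL", "IL", "NOTCH", "TLR"),
                      row.names = rownames(expr)),
  colData = DataFrame(treatment = rep(c("A", "B"), times = 3),
                      site = rep(c("brain", "skin", "liver"), each = 2),
                      row.names = colnames(expr))
)

# Row subset: keep genes whose rowData gene_group is "IL"
genes_subset <- toy_se[rowData(toy_se)$gene_group == "IL", ]

# Column subset: keep samples with treatment "B" taken from brain or skin
keep_cols <- toy_se$treatment == "B" & toy_se$site %in% c("brain", "skin")
sample_subset <- toy_se[, keep_cols]

dim(genes_subset)   # 2 genes, 6 samples
dim(sample_subset)  # 4 genes, 2 samples
```

Both the metadata and the assay values are subset in one step here as well; what the base notation lacks is the pipe-friendly, dplyr-style interface.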
To subset a se by position, slice can be used:
seq_se %>% cleanse::slice(col, 1:10) #select the first 10 columns
arrange sorts the rows or columns of a se by one or more variables. In this case,
we arrange the rows, first by gene_name and next by gene_group.
seq_se_reordered <- seq_se %>% arrange(row, gene_name, gene_group)
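To see what "kept in sync" means for a reordering, here is a minimal base R sketch on a made-up object. It assumes the Bioconductor SummarizedExperiment package is installed; toy_se and its values are hypothetical, and the code illustrates the idea rather than the cleanse implementation.

```r
library(SummarizedExperiment)

# Hypothetical toy data: 4 genes x 2 samples, rows named by gene
genes <- c("TLR4", "IL6", "NOTCH1", "IL2")
expr <- matrix(seq_len(8), nrow = 4,
               dimnames = list(genes, c("sample1", "sample2")))
toy_se <- SummarizedExperiment(
  assays  = list(expression = expr),
  rowData = DataFrame(gene_name = genes,
                      gene_group = c("TLR", "IL", "NOTCH", "IL"),
                      row.names = genes)
)

# order() sorts by gene_name first and breaks ties with gene_group
row_order <- order(rowData(toy_se)$gene_name, rowData(toy_se)$gene_group)
toy_reordered <- toy_se[row_order, ]

rownames(toy_reordered)  # genes in alphabetical order
assay(toy_reordered)     # assay rows moved together with the rowData
```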
The slice_sample() function behaves similarly to its dplyr equivalent, selecting random rows or cols:
slice_sample(seq_se, col, n = 3) # note the change in dim
slice_sample(seq_se, row, prop = .2) # note the change in dim
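A rough base R analogue of random subsetting is shown below. It assumes the Bioconductor SummarizedExperiment package is installed and uses a hypothetical toy_se. Rounding the proportion down is an assumption made for this sketch, not a statement about how cleanse computes it.

```r
library(SummarizedExperiment)

set.seed(42)  # only to make the random draws reproducible

# Hypothetical toy data: 10 genes x 6 samples
expr <- matrix(rnorm(60), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("sample", 1:6)))
toy_se <- SummarizedExperiment(assays = list(expression = expr))

# Analogue of n = 3 on columns: draw 3 column positions without replacement
col_sample <- toy_se[, sample(ncol(toy_se), size = 3)]

# Analogue of prop = .2 on rows: 20% of 10 rows, rounded down (assumption)
n_rows <- floor(0.2 * nrow(toy_se))
row_sample <- toy_se[sample(nrow(toy_se), size = n_rows), ]

dim(col_sample)  # 10 genes, 3 samples
dim(row_sample)  # 2 genes, 6 samples
```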
These functions return a se with updated rowData or colData. The dimensions and assay values of the se will not be changed by these functions.
select can be used to select variables, and rename will rename these variables.
# remove the time variable after filtering for time == 0
seq_se_min_time <- seq_se %>%
  cleanse::filter(col, time == 0) %>%
  cleanse::select(col, -time)
print_options(seq_se_min_time) # note the time variable has disappeared from the colData

# rename the time variable after changing it to minutes
seq_se_ren_time <- seq_se %>%
  cleanse::mutate(col, time = (time * 60)) %>%
  cleanse::rename(col, time_mins = time)
print_options(seq_se_ren_time) # note the time variable is now called time_mins
mutate adds or changes variables. For instance, to change the time for
the samples from hours to minutes, we can do:
seq_se_mins <- seq_se %>%
  mutate(col, time = (time * 60))
seq_se$time
seq_se_mins$time
Or, to combine the gene groups and gene names into one new variable in the rowData:
seq_se_gene_comb <- seq_se %>% mutate(row, group_and_name = paste(gene_group, gene_name, sep = "_"))
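The metadata-only operations in this section (mutate, rename, select) can be approximated with the standard SummarizedExperiment accessors. The sketch below assumes the Bioconductor SummarizedExperiment package is installed; toy_se is a hypothetical object, and the code is a base R analogue, not the cleanse implementation.

```r
library(SummarizedExperiment)

# Hypothetical toy data: 2 genes x 4 samples, time recorded in hours
expr <- matrix(seq_len(8), nrow = 2,
               dimnames = list(c("IL2", "IL6"), paste0("sample", 1:4)))
toy_se <- SummarizedExperiment(
  assays  = list(expression = expr),
  rowData = DataFrame(gene_group = c("IL", "IL"),
                      gene_name = c("IL2", "IL6"),
                      row.names = rownames(expr)),
  colData = DataFrame(time = c(0, 0, 4, 4),
                      treatment = c("A", "B", "A", "B"),
                      row.names = colnames(expr))
)

# mutate analogue on columns: `$` on a se reads and writes colData variables
toy_se$time <- toy_se$time * 60

# mutate analogue on rows: add a combined variable to the rowData
rowData(toy_se)$group_and_name <- paste(rowData(toy_se)$gene_group,
                                        rowData(toy_se)$gene_name, sep = "_")

# rename analogue: change the column name in a copy of colData, then put it back
cd <- colData(toy_se)
colnames(cd)[colnames(cd) == "time"] <- "time_mins"
colData(toy_se) <- cd

# select analogue: keep only the chosen colData variable
colData(toy_se) <- colData(toy_se)[, "time_mins", drop = FALSE]

dim(toy_se)  # still 2 genes, 4 samples: only the metadata changed
```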
A non-dplyr function that changes the metadata is drop_metadata. This function
drops all rowData and colData variables that have only 1 value. It is typically used
after subsetting:
seq_se_dropped <- seq_se %>%
  filter(col, time == 4) %>%
  drop_metadata()
print_options(seq_se_dropped) # note the time variable from colData is dropped as all values were 4
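The idea of dropping single-valued variables can be sketched in base R as follows. The helper drop_constant_vars is hypothetical and written only for this illustration; it is not part of cleanse and may differ from what drop_metadata does internally. The sketch assumes the Bioconductor SummarizedExperiment package is installed and uses a made-up toy_se.

```r
library(SummarizedExperiment)

# Hypothetical helper (not part of cleanse): keep variables with > 1 unique value
drop_constant_vars <- function(df) {
  varies <- vapply(colnames(df),
                   function(nm) length(unique(df[[nm]])) > 1,
                   logical(1))
  df[, colnames(df)[varies], drop = FALSE]
}

# Hypothetical toy data: 2 genes x 4 samples
expr <- matrix(seq_len(8), nrow = 2,
               dimnames = list(c("IL2", "TLR4"), paste0("sample", 1:4)))
toy_se <- SummarizedExperiment(
  assays  = list(expression = expr),
  rowData = DataFrame(gene_group = c("IL", "TLR"), row.names = rownames(expr)),
  colData = DataFrame(time = c(0, 4, 4, 4),
                      patient = c("p1", "p1", "p2", "p3"),
                      row.names = colnames(expr))
)

# Subset to time == 4, after which every remaining sample has the same time
toy_sub <- toy_se[, toy_se$time == 4]

# Drop single-valued variables from both metadata tables
colData(toy_sub) <- drop_constant_vars(colData(toy_sub))
rowData(toy_sub) <- drop_constant_vars(rowData(toy_sub))

colnames(colData(toy_sub))  # "patient"
colnames(rowData(toy_sub))  # "gene_group"
```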
sessionInfo()