knitr::opts_chunk$set( collapse = TRUE, dpi=60 )
chip - tee - snee
r if(FALSE){Githubpkg("jrboyd/chiptsne")}
This package is in active development.
Future updates will be focused on decreasing RAM usage.
ChIPtSNE
- create the primary object used to run t-SNE and controls configuration.ChIPtSNE.runTSNE
- runs t-SNE using ChIPtSNE and prepares data for analysis and plots.ct*
- All functions to create plots are prefixed with ct.dir.create("~/test_chipstne_install") .libPaths("~/test_chipstne_install", include.site = FALSE) if(!require("ssvQC")) devtools::install_github("FrietzeLabUVM/ssvQC") if(!require("chiptsne")) devtools::install_github("FrietzeLabUVM/chiptsne")
library(chiptsne)
theme_set(theme_classic())
These parameters will determine how multiple functions behave
options("mc.cores" = 20) color_mapping = c("H3K4me3" = "forestgreen", "H3K27me3" = "firebrick1") QcConf options("mc.cores" = 2) perplexity = 15
The two critical inputs are:
# input_dirs = c( # dir("/slipstream/galaxy/uploads/working/qc_framework/output/", # pattern = "K4.+_pooled$", full.names = TRUE), # dir("/slipstream/galaxy/uploads/working/qc_framework/output/", # pattern = "K27.+_pooled$", full.names = TRUE) # ) # #assemble peak calls and sample # peaks = dir(input_dirs, pattern = "pooled_peaks.narrowPeak", full.names = TRUE) # olaps = ssvOverlapIntervalSets(easyLoad_narrowPeak(peaks), ext = 100) # olaps = olaps[rowSums(as.data.frame(mcols(olaps))) > 3] # length(olaps) # set.seed(0) # # query_gr = sample(olaps, 100) # query_gr = olaps # query_gr = resize(query_gr, 5000, fix = "center") # query_gr$id = names(query_gr) # #create config from bws # bws = dir(input_dirs, pattern = "pooled_FE.bw", full.names = TRUE) data("query_gr") bw_files = dir(system.file('extdata', package = "chiptsne"), full.names = TRUE, pattern = ".bw$") cfg_dt = data.table(file = bw_files) cfg_dt[, c("cell", "mark") := tstrsplit(basename(file), "_", keep = 1:2)]
t-sne requires a wide-matrix view where each row is a point (observation) and each column is a dimension (attribute) of the data.
tsne_input = stsFetchTsneInput(cfg_dt, query_gr)
There's a bit of art to running t-sne, see this hands-on explanation to get oriented.
Be sure to set mc.cores
using options()
or provide the n_cores
parameter to speed up processing time.
tsne_res = stsRunTsne(tsne_input$bw_dt, perplexity = perplexity)
A basic plot facetted by the 3 cell lines.
ggplot(tsne_res, aes(x = tx, y = ty, color = tall_var)) + geom_point() + facet_wrap("tall_var")
Interpreting the t-sne landscape can be difficult but it is critical for effectiveness of the visualization. We can plot representative profiles in the t-sne landscape.
The resolution of these images is controlled by n_points
. Use xrng
and
yrng
to specify a region to view in detail. A strength of t-sne is that
it maps both broad trends and finer relationships effectively.
summary_dt = prep_summary(tsne_input$bw_dt, tsne_res, x_points = n_points) img = prep_images(summary_dt, x_points = n_points, line_color_mapping = color_mapping) img_res = plot_summary_raster(img$image_dt, x_points = n_points, min_size = 0) img_res
The same idea as before but now facetted by cell.
summary_dt_byCell = prep_summary(tsne_input$bw_dt, tsne_res, x_points = n_points, facet_by = "tall_var") img_byCell = prep_images(summary_dt_byCell, x_points = n_points, line_color_mapping = color_mapping, facet_by = "tall_var") img_bytall_var_res = plot_summary_raster_byCell(img_byCell$image_dt, x_points = n_points) img_bytall_var_res
Arrows identity how regions behave between two groups. Each arrow is drawn
from the postion in tall_var_a
to position in tall_var_b
with color mapped to
angle to aid in pattern detection.
vel_dt = prep_velocity(tsne_res, tall_var_a = "HUES48", tall_var_b = "HUES64") vel_res = plot_velocity_arrows(vel_dt) vel_res
Individual arrows can be overwhelming, we can also employ a binning strategy
to summarize the average destination in t-sne space for groups of points.
Each arrow here is drawn to the average desination (tall_var_b
) for all nearby
points (tall_var_a
).
plot_regional_velocity(tsne_dt, tall_var_a = "HUES48", tall_var_b = "HUES64", x_points = n_points)
We can also layer the binned arrows on top of the binned profiles.
plot_regional_velocity(tsne_dt, tall_var_a = "HUES48", tall_var_b = "HUES64", x_points = n_points, p = img_res)#, p = img_res$plot)
Individual profiles can also be viewed.
plot_profiles_selected(tsne_input$bw_dt, qtall_vars = c("HUES48", "HUES64"), id_to_plot = query_gr$id[1:5], color_mapping = color_mapping)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.