library(knitr)
opts_chunk$set(tidy = TRUE, results = 'hide', comment = ">>", cache = FALSE, fig.height = 4, fig.width = 4, collapse = TRUE, fig.align = 'center')
The `sights` package provides numerous normalization methods that correct the three types of bias that affect High-Throughput Screening (HTS) measurements: overall plate bias, within-plate spatial bias, and across-plate bias. Commonly used normalization methods such as Z-scores (or methods such as percent inhibition/activation, which use within-plate controls to normalize) correct only overall plate bias. The methods included in this package attempt to correct all three sources of bias and typically give better results.
Two statistical tests are also provided: the standard one-sample t-test and the recommended one-sample Random Variance Model (RVM) t-test, which has greater statistical power for the typically small number of replicates in HTS. Correction for the multiple statistical testing of the large number of constructs in HTS data is provided by False Discovery Rate (FDR) correction. The FDR can be described as the proportion of false positives among the statistical tests called significant.
Included graphical and statistical methods provide the means for evaluating data analysis choices for HTS assays on a screen-by-screen basis. These graphs can be used to check fundamental assumptions of both raw and normalized data at every step of the analysis process.
Citing Methods
Please cite the `sights` package and specific methods as appropriate.
References for the methods can be found in this vignette, on their specific help pages, and in the manual. They can also be accessed by `help(sights_method_name)` in R. For example:
help(normSPAWN)  # Help page of SPAWN with its references
The package citation can be accessed in R by:
citation("sights")
1. Install the package from Bioconductor and load it:

``` {r installation, eval=FALSE}
# Note: biocLite() is the legacy Bioconductor installer; current Bioconductor
# releases use BiocManager::install("sights") instead.
source("http://bioconductor.org/biocLite.R")
biocLite("sights")
library("sights")
```

2. This should also install and load the packages that SIGHTS imports: ggplot2 [@wickham2009ggplot2], reshape2 [@wickham2007reshaping], qvalue [@dabney2010qvalue], MASS [@venables2002mass], and lattice [@sarkar2008lattice]. Otherwise, you can install/update these packages manually.

``` {r dependencies, eval=FALSE}
# Installing packages
biocLite(c("ggplot2", "reshape2", "lattice", "MASS", "qvalue"))
# Updating packages
biocLite("BiocUpgrade")
biocLite()
```
All SIGHTS normalization functions require that the data be arranged such that each plate is a column and each row is a well. The arrangement within each plate should be by-row first, then by-column. For more details and an example, see `help("ex_dataMatrix")`.
This required arrangement can be done in Microsoft Excel before importing the data into R, although advanced users may prefer to do so in R as needed.
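For users who prefer to arrange the data in R, the following is a minimal sketch of the required layout. It assumes that each plate is already held as a matrix whose rows and columns match the physical plate, and that "by-row first" means all wells of the first plate row, then all wells of the second, and so on. The names `plate_list` and `to_sights_matrix` are hypothetical and not part of SIGHTS.

```r
# Hypothetical example: two small 2 x 3 "plates" stored as matrices,
# with matrix rows/columns matching physical plate rows/columns.
plate_list <- list(
  Plate1 = matrix(1:6,   nrow = 2, ncol = 3, byrow = TRUE),
  Plate2 = matrix(11:16, nrow = 2, ncol = 3, byrow = TRUE)
)

# Flatten each plate by row (A1, A2, A3, B1, ...) and bind plates as columns.
# t() is needed because as.vector() reads a matrix column by column.
to_sights_matrix <- function(plates) {
  sapply(plates, function(p) as.vector(t(p)))
}

data_matrix <- to_sights_matrix(plate_list)
data_matrix  # one column per plate, one row per well
```

For real data, it is worth checking a few wells by hand against `help("ex_dataMatrix")` before normalizing, since a by-column arrangement would silently scramble the spatial layout.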
data("ex_dataMatrix")
help("ex_dataMatrix")  ## Required data arrangement (by-row first) is explained.
data("inglese")
Your own data can be imported by giving the path of your file:
If it is a .csv or .txt file, run
read.csv("~/yourfile.csv", header = TRUE, sep = ",")
## '~' stands for the folder location of your file 'yourfile.csv' (in R, a literal '~' refers to your home directory).
## Use header=TRUE if you have column headers (recommended); otherwise, use header=FALSE.
## N.B. Be sure to use a forward slash ('/') to separate folder names.
If it is a Microsoft Excel file, run
install.packages("xlsx")  ## This installs the xlsx package, which enables import/export of Excel files.
library("xlsx")
read.xlsx("~/yourfile.xlsx", sheetIndex = 1)
# or
read.xlsx("~/yourfile.xlsx", sheetName = "one")
## sheetIndex is the sheet number where your data is stored in 'yourfile.xlsx'; sheetName is the name of that sheet.
Results can be exported as follows:
write.csv(object_name, "~/yourresults.csv")  ## As a .csv file
write.xlsx(object_name, "~/yourresults.xlsx")  ## As a Microsoft Excel file (requires the "xlsx" package)
- Example data matrix, see `help("ex_dataMatrix")`
- Inglese et al. data [@inglese2006quantitative], see `help("inglese")`
Some basic information about data (including your own data after importing) can be accessed by various functions. For example, information about the Inglese et al. data set can be obtained as follows:
View(inglese)     ## View the entire dataset
edit(inglese)     ## Edit the dataset
head(inglese)     ## View the top few rows of the dataset
str(inglese)      ## Get information on the structure of the dataset
summary(inglese)  ## Get a summary of variables in the dataset
names(inglese)    ## Get the variable names of the dataset
See `help("normSights")`, `help("statSights")`, `help("plotSights")`, and the help pages of individual methods for more information.
ls("package:sights")       ## Lists all the functions and datasets available in the package
lsf.str("package:sights")  ## Lists all the functions and their usage
args(plotSights)           ## View the usage of a specific function
example(topic = plotSights, package = "sights")  ## View examples of a specific function
- Normalization: All normalization functions are accessible either via `normSights()` or by their individual function names (e.g. `normSPAWN()`).
- Statistical tests: All statistical testing functions are accessible either via `statSights()` or by their individual function names (e.g. `statRVM()`).
- Plots: All plotting functions are accessible either via `plotSights()` or by their individual function names (e.g. `plotAutoco()`).
The results of these functions can be saved as objects and called by their assigned names. For example:
``` {r example, fig.show='hide', message=FALSE, warning=FALSE}
library(sights)
data("inglese")

# Normalize using the individual function
spawn_results <- normSPAWN(dataMatrix = inglese, plateRows = 32, plateCols = 40, dataRows = NULL, dataCols = 3:44, trimFactor = 0.2, wellCorrection = TRUE, biasMatrix = NULL, biasCols = 1:18)
# Or, equivalently, via the wrapper function
spawn_results <- normSights(normMethod = "SPAWN", dataMatrix = inglese, plateRows = 32, plateCols = 40, dataRows = NULL, dataCols = 3:44, trimFactor = 0.2, wellCorrection = TRUE, biasMatrix = NULL, biasCols = 1:18)
# Access the saved results
summary(spawn_results)

# Apply a statistical test using the individual function
rvm_results <- statRVM(normMatrix = spawn_results, repIndex = rep(1:3, each = 3), normRows = NULL, normCols = 1:9, testSide = "two.sided")
# Or via the wrapper function
rvm_results <- statSights(statMethod = "RVM", normMatrix = spawn_results, repIndex = c(1,1,1,2,2,2,3,3,3), normRows = NULL, normCols = 1:9, ctrlMethod = NULL, testSide = "two.sided")
# Access the saved results
head(rvm_results)

# Plot using the individual function
autoco_results <- plotAutoco(plotMatrix = spawn_results, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 1:9, plotName = "SPAWN_Inglese", plotSep = TRUE)
# Or via the wrapper function
autoco_results <- plotSights(plotMethod = "Autoco", plotMatrix = spawn_results, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = c(1,2,3,4,5,6,7,8,9), plotName = "SPAWN_Inglese", plotSep = TRUE)
# Display all plots, or only the first one (plotSep = TRUE returns a list of plots)
autoco_results
autoco_results[[1]]
```
# Navigating through SIGHTS

We recommend the following workflow:

- Visualize the raw data to identify bias, if any:

Types of bias | Expectation, in absence of bias | Identification, in presence of bias
---------- | ------------------------ | -----------------------
Plate bias | Replicate plates have similar overall distributions. | Boxplots show different medians and/or variability among replicate plates.
Within-plate spatial bias | Data within a plate is not affected by well position. | Heatmaps and 3-D plots show row and/or column effects. Auto-correlation plots show non-zero correlations at various lags; typical patterns include cyclical and/or decreasing correlation values.
Across-plate bias | Assuming few true 'hits' within the screen, the majority of data points should be uncorrelated across replicate plates. Only the hits should be correlated. | Scatter plots of replicate plates show strong correlation.

- Try different normalization methods and visualize the results, comparing them to raw data
- Normalize the raw data using the method that best minimizes bias
- Conduct statistical tests on the normalized data and visualize the p-value distribution
- Apply FDR correction

We will use the Inglese *et al.* dataset [@inglese2006quantitative] to demonstrate application of SIGHTS and interpretation of results.

```r
library("sights")
data("inglese")
```
We will analyze data from two of the concentrations separately:
Lowest Concentration (Three Replicate Plates: Exp1R1-Exp1R3)
For these lowest concentration plates, the concentration is so low that even active molecules (as determined by a titration series) do not show activity. We use these data to show what normalized null data should look like.
9^th^ Concentration (Three Replicate Plates: Exp9R1-Exp9R3)
For these higher concentration plates, some compounds show activity levels. We use these data to illustrate what data for a typical experiment might look like.
SIGHTS has three graphical methods for visually detecting spatial bias within plates: standard "Heatmap" and "3d" plots, and autocorrelation plots ("Autoco") [@murie2015improving].
sights::plotSights(plotMethod = "Box", plotMatrix = inglese, plotCols = 3:5, plotName = "Raw Exp1")
sights::plotSights(plotMethod = "Heatmap", plotMatrix = inglese, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 3, plotName = "Raw Exp1")
sights::plotSights(plotMethod = "3d", plotMatrix = inglese, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 3, plotName = "Raw Exp1")
sights::plotSights(plotMethod = "Autoco", plotMatrix = inglese, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 3:5, plotSep = FALSE, plotName = "Raw Exp1")
Heatmap, 3d plot, and auto-correlation plots show that there is non-trivial within-plate spatial bias, strongly indicated by the high auto-correlations and the cyclical patterns in the auto-correlation plots.
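To see why spatial bias produces high and cyclical auto-correlations, the following sketch simulates a single 32 x 40 plate with an additive column effect and computes its auto-correlation with base R's `acf()`. This is only an illustration on simulated data; how `plotAutoco` computes its values internally is not shown here, and all object names are hypothetical.

```r
set.seed(1)
plate_rows <- 32
plate_cols <- 40

# Simulated plate: random noise plus an additive column effect (spatial bias).
noise      <- matrix(rnorm(plate_rows * plate_cols), plate_rows, plate_cols)
col_effect <- matrix(rep(seq(-2, 2, length.out = plate_cols), each = plate_rows),
                     plate_rows, plate_cols)
biased <- noise + col_effect

# Flatten by row, matching the data arrangement SIGHTS expects.
by_row <- function(p) as.vector(t(p))

acf_noise  <- acf(by_row(noise),  lag.max = plate_cols, plot = FALSE)
acf_biased <- acf(by_row(biased), lag.max = plate_cols, plot = FALSE)

# Autocorrelation at lag = plate_cols compares each well with the well
# directly below it in the next plate row (same column).
acf_noise$acf[plate_cols + 1]   # close to zero for unbiased data
acf_biased$acf[plate_cols + 1]  # clearly positive when a column effect is present
```

Because the data are flattened by row, a column effect repeats every `plate_cols` wells, which is what gives the auto-correlation plot its cyclical shape.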
sights::plotSights(plotMethod = "Scatter", plotMatrix = inglese, repIndex = c(1,1), plotRows = NULL, plotCols = 3:4, plotName = "Raw Exp1", alpha=0.2)
The preferred normalization method is usually the one that minimizes all three types of bias: plate bias, within-plate spatial bias, and across-plate bias.
Some of the available methods within `sights` are demonstrated below for illustration purposes. Normally, you may wish to examine numerous methods to see which one is preferred for your dataset.
Z.norm.inglese.01 <- sights::normSights(normMethod = "Z", dataMatrix = inglese, dataRows = NULL, dataCols = 3:5)
sights::plotSights(plotMethod = "Box", plotMatrix = Z.norm.inglese.01, plotCols = 1:3, plotName = "Z Exp1")
sights::plotSights(plotMethod = "Heatmap", plotMatrix = Z.norm.inglese.01, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 1, plotName = "Z Exp1")
sights::plotSights(plotMethod = "3d", plotMatrix = Z.norm.inglese.01, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 1, plotName = "Z Exp1")
sights::plotSights(plotMethod = "Autoco", plotMatrix = Z.norm.inglese.01, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 1:3, plotSep = FALSE, plotName = "Z Exp1")
Heatmap, 3d plot, and auto-correlation plots show that within-plate spatial bias is unchanged from the raw data, because Z-scores are simply a linear transformation of the raw data.
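The point that a linear transformation cannot remove spatial structure can be checked directly on simulated data. The sketch below assumes the usual plate-wise Z-score definition (subtract the plate mean, divide by the plate standard deviation) and uses only base R; the simulated plate and its names are hypothetical.

```r
set.seed(2)
plate_rows <- 8
plate_cols <- 12

# Simulated raw plate with a row effect, stored flattened by row.
row_effect <- rep(seq(0, 3, length.out = plate_rows), each = plate_cols)
raw <- 100 + 10 * (row_effect + rnorm(plate_rows * plate_cols))

# Plate-wise Z-score: subtract the plate mean, divide by the plate SD.
z <- (raw - mean(raw)) / sd(raw)

# The overall level and scale change...
c(mean = mean(z), sd = sd(z))

# ...but the spatial pattern does not: Z-scores are perfectly correlated
# with the raw values, and the autocorrelation function is identical.
cor(raw, z)
all.equal(acf(raw, plot = FALSE)$acf, acf(z, plot = FALSE)$acf)
```

This is why Z-scores can equalize plates with different medians or spreads while leaving row and column effects untouched.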
sights::plotSights(plotMethod = "Scatter", plotMatrix = Z.norm.inglese.01, repIndex = c(1,1), plotRows = NULL, plotCols = 1:2, plotName = "Z Exp1", alpha=0.2)
SPAWN.norm.inglese.01 <- sights::normSights(normMethod = "SPAWN", dataMatrix = inglese, plateRows = 32, plateCols = 40, dataRows = NULL, dataCols = 3:44, trimFactor = 0.2, wellCorrection = TRUE, biasCols = 1:18)
sights::plotSights(plotMethod = "Box", plotMatrix = SPAWN.norm.inglese.01, plotCols = 1:3, plotName = "SPAWN Exp1")
sights::plotSights(plotMethod = "Heatmap", plotMatrix = SPAWN.norm.inglese.01, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 1, plotName = "SPAWN Exp1")
sights::plotSights(plotMethod = "3d", plotMatrix = SPAWN.norm.inglese.01, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 1, plotName = "SPAWN Exp1")
sights::plotSights(plotMethod = "Autoco", plotMatrix = SPAWN.norm.inglese.01, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 1:3, plotSep = FALSE, plotName = "SPAWN Exp1")
Heatmap, 3d plot, and auto-correlation plots show that SPAWN has removed within-plate spatial bias, as indicated by the near-zero correlations at each lag.
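For intuition about how additive row and column effects can be removed from a plate, the sketch below applies Tukey's median polish (`stats::medpolish`) to a simulated plate. This is not the SPAWN algorithm, which is described in @murie2015improving; it is a generic polish technique from base R used here purely as an illustration, with hypothetical object names.

```r
set.seed(3)
plate_rows <- 8
plate_cols <- 12

# Simulated plate: noise plus additive row and column effects.
row_eff <- seq(-1, 1, length.out = plate_rows)
col_eff <- seq(-2, 2, length.out = plate_cols)
plate <- outer(row_eff, col_eff, "+") +
  matrix(rnorm(plate_rows * plate_cols, sd = 0.3), plate_rows, plate_cols)

# Median polish iteratively subtracts row and column medians;
# the residuals are the values with those additive effects removed.
fit <- medpolish(plate, trace.iter = FALSE)
polished <- fit$residuals

# Spread of the column means before and after polishing.
range(colMeans(plate))
range(colMeans(polished))
```

After polishing, the column means collapse toward zero, which is the same qualitative change the heatmaps and auto-correlation plots show after spatial normalization.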
sights::plotSights(plotMethod = "Scatter", plotMatrix = SPAWN.norm.inglese.01, repIndex = c(1,1), plotRows = NULL, plotCols = 1:2, plotName = "SPAWN Exp1", alpha=0.2)
sights::plotSights(plotMethod = "Box", plotMatrix = inglese, plotCols = 27:29, plotName = "Raw Exp9")
sights::plotSights(plotMethod = "Heatmap", plotMatrix = inglese, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 27, plotName = "Raw Exp9")
sights::plotSights(plotMethod = "3d", plotMatrix = inglese, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 27, plotName = "Raw Exp9")
sights::plotSights(plotMethod = "Autoco", plotMatrix = inglese, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 27:29, plotSep = FALSE, plotName = "Raw Exp9")
Heatmap, 3d plot, and auto-correlation plots show that there is non-trivial within-plate spatial bias, strongly indicated by the high auto-correlations and the cyclical patterns in the auto-correlation plots.
sights::plotSights(plotMethod = "Scatter", plotMatrix = inglese, repIndex = c(1,1), plotRows = NULL, plotCols = 27:28, plotName = "Raw Exp9", alpha=0.2)
Z-score normalization
Because Z-score normalization does not correct spatial bias, it is not recommended for these data. We demonstrate SPAWN, one of the recommended normalization methods, which has been shown to perform the best among the available methods for these data. See @murie2015improving for the comparisons and for additional analyses which examined the various normalization methods separately for active and inactive molecules.
SPAWN normalization
SPAWN scores correct all three types of bias in these data.
SPAWN.norm.inglese.09 <- sights::normSights(normMethod = "SPAWN", dataMatrix = inglese, plateRows = 32, plateCols = 40, dataRows = NULL, dataCols = 3:44, trimFactor = 0.2, wellCorrection = TRUE, biasCols = 1:18)[,25:27]
sights::plotSights(plotMethod = "Box", plotMatrix = SPAWN.norm.inglese.09, plotCols = 1:3, plotName = "SPAWN Exp9")
sights::plotSights(plotMethod = "Heatmap", plotMatrix = SPAWN.norm.inglese.09, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 1, plotName = "SPAWN Exp9")
sights::plotSights(plotMethod = "3d", plotMatrix = SPAWN.norm.inglese.09, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 1, plotName = "SPAWN Exp9")
sights::plotSights(plotMethod = "Autoco", plotMatrix = SPAWN.norm.inglese.09, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 1:3, plotSep = FALSE, plotName = "SPAWN Exp9")
Heatmap, 3d plot, and auto-correlation plots show that SPAWN has removed within-plate spatial bias, as indicated by the near-zero correlations at each lag.
sights::plotSights(plotMethod = "Scatter", plotMatrix = SPAWN.norm.inglese.09, repIndex = c(1,1), plotRows = NULL, plotCols = 1:2, plotName = "SPAWN Exp9", alpha=0.2)
Both the standard one-sample t-test and the Random Variance Model (RVM) one-sample t-test [@wright2003random; @malo2006statistical] are available. Because the standard t-test tends to perform poorly with few replicates, the RVM test is generally recommended [@murie2009comparison; @murie2015improving].
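To make the small-replicate setting concrete, the sketch below runs a standard one-sample t-test on each well of a simulated normalized matrix with three replicates, using base R's `t.test()`. It is not the SIGHTS implementation and does not implement RVM; the matrix `norm_mat` and the effect size are hypothetical.

```r
set.seed(4)
n_wells <- 200

# Simulated normalized scores: 3 replicate plates (columns), mostly null,
# with the first 10 wells shifted to mimic active compounds.
norm_mat <- matrix(rnorm(n_wells * 3), nrow = n_wells, ncol = 3)
norm_mat[1:10, ] <- norm_mat[1:10, ] + 4

# Standard one-sample t-test of each well's replicates against zero.
p_values <- apply(norm_mat, 1, function(x) t.test(x, mu = 0)$p.value)

# With only 3 replicates, each test has 2 degrees of freedom, so the
# per-well variance estimate is noisy. This is the motivation for
# approaches that share variance information across wells, such as RVM.
summary(p_values)
```

Even with a large simulated shift, many of the active wells will not reach very small p-values here, which illustrates the limited power of the standard test with few replicates.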
False Discovery Rate (FDR) methods correct for multiple testing. The method available in SIGHTS is Storey's q-value method [@storey2002direct]. Please see the qvalue package [@dabney2010qvalue] documentation for more information on FDR correction.
SPAWN.norm.inglese.09.t <- sights::statSights(statMethod = "T", normMatrix = SPAWN.norm.inglese.09, repIndex = c(1,1,1), normRows = NULL, ctrlMethod = NULL, testSide = "two.sided")
summary(SPAWN.norm.inglese.09.t)
## The 5th column of the result matrix holds the p-values, so it is selected for the histogram below.
sights::plotSights(plotMethod = "Hist", plotMatrix = SPAWN.norm.inglese.09.t, plotRows = NULL, plotCols = 5, plotAll = FALSE, plotSep = TRUE, colNames = "Exp9", plotName = "t-test")
With FDR: When corrected for multiple testing, "q-values" are generated. Often, in screening contexts, q-values of 0.20 or smaller might be appropriate for follow-up.
SPAWN.norm.inglese.09.t.fdr <- sights::statFDR(SPAWN.norm.inglese.09.t, ctrlMethod = "smoother")
summary(SPAWN.norm.inglese.09.t.fdr)
SPAWN.norm.inglese.09.rvm <- sights::statSights(statMethod = "RVM", normMatrix = SPAWN.norm.inglese.09, repIndex = c(1,1,1), normRows = NULL, ctrlMethod = NULL, testSide = "two.sided")
summary(SPAWN.norm.inglese.09.rvm)
sights::plotSights(plotMethod = "IGFit", plotMatrix = SPAWN.norm.inglese.09, repIndex = c(1,1,1))
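The idea of selecting tests at an FDR threshold such as 0.20 can be illustrated on simulated p-values. SIGHTS uses Storey's q-value method via the qvalue package; the sketch below instead uses the Benjamini-Hochberg adjustment from base R's `p.adjust()` as a stand-in, so its numbers will differ from q-values. The simulated mixture of null and non-null tests is an assumption made for illustration.

```r
set.seed(5)

# Simulated p-values: 950 null tests (uniform) and 50 true effects (small p).
p_null <- runif(950)
p_alt  <- rbeta(50, shape1 = 0.5, shape2 = 50)
p_values <- c(p_alt, p_null)

# Benjamini-Hochberg adjusted p-values (a stand-in for q-values here).
p_bh <- p.adjust(p_values, method = "BH")

# Tests that would be followed up at an FDR threshold of 0.20.
selected <- which(p_bh <= 0.20)
length(selected)

# Proportion of selected tests that are actually null in this simulation
# (indices 51 onward are the simulated nulls).
mean(selected > 50)
```

The last value is the realized proportion of false positives among the selected tests, which is the quantity that FDR correction aims to control on average.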
Without FDR:
sights::plotSights(plotMethod = "Hist", plotMatrix = SPAWN.norm.inglese.09.rvm, plotRows = NULL, plotCols = 5, colNames = "Exp9", plotName = "RVM test")
With FDR: When corrected for multiple testing, "q-values" are generated. Often, in screening contexts, q-values of 0.20 or smaller might be appropriate for follow-up.
SPAWN.norm.inglese.09.rvm.fdr <- sights::statFDR(SPAWN.norm.inglese.09.rvm, ctrlMethod = "smoother")
summary(SPAWN.norm.inglese.09.rvm.fdr)
All SIGHTS plotting functions that use the ggplot2 package [@wickham2009ggplot2] (i.e., all except `plot3d`, which uses lattice graphics) have an ellipsis argument (`...`) that passes additional parameters on to the specific ggplot2 geom used in that function. For example, the default plot title and the bar colors of the histogram can be modified as follows:
sights::plotHist(plotMatrix = SPAWN.norm.inglese.09.rvm, plotCols = 5, plotAll = TRUE, binwidth = 0.02, fill = 'pink', color = 'black', plotName = "RVM test Exp9")
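The ellipsis mechanism itself can be sketched with a small wrapper around ggplot2. The function `plot_hist_sketch` below is hypothetical and is not how `plotHist` is implemented; it only shows how arguments supplied through `...` can be forwarded unchanged to a geom.

```r
library(ggplot2)

# Hypothetical wrapper (not a SIGHTS function): arguments supplied via
# `...` are forwarded unchanged to geom_histogram().
plot_hist_sketch <- function(values, plot_name = "Histogram", ...) {
  ggplot(data.frame(value = values), aes(x = value)) +
    geom_histogram(...) +
    ggtitle(plot_name)
}

set.seed(6)
p <- plot_hist_sketch(runif(500), plot_name = "Simulated p-values",
                      binwidth = 0.05, fill = "pink", color = "black")
p
```

Any argument that `geom_histogram()` accepts (such as `binwidth`, `fill`, or `color`) can be passed this way without the wrapper having to name it explicitly.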
All SIGHTS plotting functions that use ggplot2 produce ggplot objects, which can be modified further.
Other packages which provide more plotting options can be installed as well: ggthemes [@arnold2015package], gridExtra [@auguie2015package].
install.packages("ggthemes")  ## This installs the ggthemes package, which has various themes that can be used with ggplot objects.
library("ggthemes")
install.packages("gridExtra")  ## This installs the gridExtra package, which enables arrangement of plot objects.
library("gridExtra")
Below are some examples of the plotting modifications that can be achieved using ggplot2/ggthemes/gridExtra [@wickham2009ggplot2; @arnold2015package; @auguie2015package] functions:
b <- sights::plotBox(plotMatrix = inglese, plotCols = 33:35)
b + ggplot2::geom_boxplot(fill = c('rosybrown', 'pink', 'thistle')) + ggthemes::theme_igray() + ggplot2::labs(x = "Sample_11 Replicates", y = "Raw Values")
Note: When plotSep = TRUE, a list of plot objects is produced, which can be called individually and modified, as in the example below.
s <- sights::plotScatter(plotMatrix = SPAWN.norm.inglese.09, repIndex = c(1,1,1))
s[[2]] + ggplot2::labs(title = "Original Scatter Plot")
s[[2]] + ggplot2::lims(x = c(-5,5), y = c(-5,5)) + ggplot2::labs(title = "Constrained Scatter Plot")
s[[2]] + ggplot2::coord_cartesian(xlim = c(-5,5), ylim = c(-5,5)) + ggplot2::labs(title = "Zoomed-in Scatter Plot")
box <- sights::plotSights(plotMethod = "Box", plotMatrix = SPAWN.norm.inglese.09, plotCols = 1:3) + ggplot2::theme(plot.title = ggplot2::element_text(size = 12))
autoco <- sights::plotSights(plotMethod = "Autoco", plotMatrix = SPAWN.norm.inglese.09, plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 1:3, plotSep = FALSE) + ggplot2::theme(plot.title = ggplot2::element_text(size = 12))
scatter <- sights::plotSights(plotMethod = "Scatter", plotMatrix = SPAWN.norm.inglese.09, repIndex = c(1,1,1), plotRows = NULL, plotCols = 1:3)
sc1 <- scatter[[1]] + ggplot2::theme(plot.title = ggplot2::element_text(size = 12))
sc2 <- scatter[[2]] + ggplot2::theme(plot.title = ggplot2::element_text(size = 12))
sc3 <- scatter[[3]] + ggplot2::theme(plot.title = ggplot2::element_text(size = 12))
sc <- gridExtra::grid.arrange(sc1, sc2, sc3, ncol = 3)
ab <- gridExtra::grid.arrange(box, autoco, ncol = 2)
Arrangement: Multiple plots can be custom-arranged in one window by using the gridExtra package [@auguie2015package].
gridExtra::grid.arrange(ab, sc, nrow = 2, top = "SPAWN Normalized Exp9")