Contact:
David Quigley
david.quigley@ucsf.edu
University of California, San Francisco
Helen Diller Family Comprehensive Cancer Center
Department of Epidemiology and Biostatistics
The HTDoseResponseCurve library makes it straightforward to analyze time course drug exposure experiments performed by high-throughput analysis of cells in multiwell plates.
HTDoseResponseCurve does not perform image analysis.
HTDoseResponseCurve fits curves using the drc library, available on CRAN.
The minimal information required for a Dose Response Curve is a set of observations with labels for sample_type, treatment, concentration, and value. A file called sample_data_1.txt, with the appropriate format is included with this package. The example data have control (vehicle) observations labeled "DMSO".
The following commands are explained in detail in the next section, "Loading data without plate information".
library(knitr, quietly = TRUE)
library(HTDoseResponseCurve) library(drc, quietly = TRUE) # 1) Load from file with columns sample_type, treatment, concentration, value # 2) Fit dose response curve, estimate upper/lower asymptote, IC50, and slope # 3) Plot dose response curve fn = system.file("extdata", "sample_data_1.txt", package="HTDoseResponseCurve") ds = read_dataset( fn, negative_control = "DMSO" ) fit_1 = fit_DRC( ds, sample_types=c("line1", "line2"), treatments="drug", fct=drc::LL.4() ) plot( fit_1, xlab="nM drug", ylab="surviving fraction", xlim=c(0, 1e5), ylim=c(0, 1.1) )
A dose response curve models the response of an organism to varying concentrations of a treatment. A typical example is the survival of cells in a dish after being dosed with increasing concentrations of a toxic drug. The response of living things to drugs is typically non-linear, and frequently takes a sigmoidal (S-shaped) shape. This curve can be modeled by fitting a non-linear model that has parameters for the upper and lower asymptotes, the slope of the middle of the curve, and the point at which the curve reaches the mid-way point between the upper and lower asymptotes. This midway point has various names with slightly different meanings, depending on the specific application, including the $IC_{50}$, $EC_{50}$, or $LD_{50}$.
The dose response curve for a drug effect E can be defined by the equation:
$E = \frac{ E_{max} ( \frac{D}{ IC_{50} })^m } {1 + (\frac{D}{IC_{50}})^m}$
where D is a particular drug dose, m is the slope of the curve, $E_{max}$ is the maximum possible drug effect, $IC_{50}$ is the value of D that reduces E to 50% of the distance between $E_{max}$ and $E_{min}$.
In the equation above, $E_{min}$ is assumed to be zero. If $E_{min}$ is not zero, we can model it explicity. For comparison, the four-parameter DRC equation used by the drc library is:
$E = c + \frac{d-c}{1+\exp(b(\log(x)-\log(e)))}$
where b is the slope (m), c is $E_{min}$, d is $E_{max}$, and e is the $IC_{50}$.
A primary motivation for development of this package was to make it easier to process high-throughput experiments run in multiwell plates, where the same plate would be observed repeatedly over time. Every observation in HTDoseResponseCurve therefore has a timepoint value associated with it; this value is conventionally called "hour" but could be days, minutes, or any other unit of time starting at 0. If you don't specify the hour, this time point is set automatically to zero. The examples in this section don't specify an hour.
Data are first loaded and then optionally normalized against a negative control (also called a vehicle control, named after the vehicle substance in which a drug is dissolved to put it into solution). Many times it is desirable to reduce plate-specific or cell-line-specific effects by normalization of data against a control well that contains no perturbation other than the treatment vehicle. Normalization transforms each value into a percentage of the negative control. Normalized values can be greater than one. You can specify the negative control in four ways:
If you specify the existence of negative controls, HTDoseResponseCurve will expect to find them for all of the sample types and treatments on the plate.
If your data are already in R, you can create a dataset without loading files by the
create_dataset() function. The following example experiment tested the effect
of four concentrations of a drug (200, 500, 1000, and 2000 nM) on the viability
of two cell lines, normalized against measurements labeled "DMSO".
sample_types = rep( c(rep("line1",3), rep("line2",3)), 5) treatments = c(rep("DMSO",6), rep("drug",24)) concentrations = c( rep(0,6),rep(200,6), rep(500,6),rep(1000,6),rep(5000,6)) values=c(100,99,100,90,91,92,99,97,99,89,87,88, 86,89,88,56,59,58,66,65,67,25,23,24,42,43,46,4,5,9) ds = create_dataset( sample_types = sample_types, treatments = treatments, concentrations = concentrations, values = values, negative_control = "DMSO")
If your experiment data are in a text file, HTDoseResponseCurve can read
directly from that file with the read_dataset() function. Each row should
contain a single observation, e.g. a single well in a multi-well plate. The
text file must have the following column headers in the first row:
sample_type
treatment
concentration
value
These headers may be in any order, and additional columns may be present. If your text file has the optional column hour, data from this column will be loaded into the dataset.
The following example experiment tested the effect of four concentrations of a drug (200, 500, 1000, and 2000 nM) on the viabilitity of two cell lines, normalized against measurements labeled "DMSO".
fn = system.file("extdata", "sample_data_1.txt", package="HTDoseResponseCurve") ds = read_dataset( fn, negative_control = "DMSO", sep = "\t" )
The original measurements are stored unchanged in a column called value. Normalized measurements are stored in a column called value_normalized.
head(ds)
The fit_DRC() function attempts to fit a dose response curve. The fit_DRC() function returns a HT_fit object. You can view essential information about the fit that was obtained using the R core function summary():
fit_1 = fit_DRC( ds, sample_types=c("line1", "line2"), treatments="drug", fct=drc::LL.4() ) summary( fit_1 )
The summary() function returns the coefficients estimated by the fit. If there is more than one unique condition in the fit, summary() reports the result of an ANOVA test comparing a model lacking parameters for treatment or sample type to a model with treatment/sample type as a covariate. The F-statistic and p-value values reported in the summary of fit_1 are derived from the ANOVA. Extremely low P values may be rounded to zero by the summary() function.
The summary also reports an estimated Area Under the Curve (AUC), an estimated $EC_{50}$ (Effective Concentration, an estimate of the concentration of treatment that reduces the measured value to halfway between the highest and lowest amounts), an estimated $SF_{50}$ (Surviving Fraction, an estimate of the concentration of treatment that reduces the measured value to 50%), and the fitted values at the lowest and highest concentrations of drug.
If the data are not at all sigmoidal, the fit might not converge. This will be reported in the summary.
To plot a dose response curve, pass the HT_fit object generated by the fit_DRC() function to the plot() function.
fit_1 = fit_DRC( ds, fct=drc::LL.4() ) plot( fit_1 )
Note that in this summary for line 1 in the previous section, the SF50 value for line 1 is 1919.97 while the EC50 value is 855.24. This curve, where line 1 is plotted in black, should make clear the distinction between the SF50 and EC50: the EC50 is the concentration required to lower the fitted value to the 50th percentile between the upper and lower asymptote, while the SF50 is the concentration required to lower the fitted value to 0.5, or 50%. Since the lowest value on the black curve is only slightly lower than 50%, its SF50 is much larger than its EC50.
The plot() function understands the parameters for R's built-in plot() function, so it is straightforward to alter its appearance:
plot(fit_1, xlim=c(0, 1e5), main="Drug dose response curve", xlab="concentration nM", ylab="surviving fraction", lwd=3, col=c("black", "cornflowerblue") ) legend( 2, 0.4, c("Cell Line 1", "Cell Line 2"), fill=c("black", "cornflowerblue"), bty="n")
The fit_DRC() function fits and plots curves using non-linear curve fitting methods exposed by functions in the drc package. It is important to understand that this model has several coefficients, and it is possible to set some of these coefficients to fixed values rather than estimating them from the data. Estimating values is neither always correct nor always wrong; it requires judgement to decide how to fit the curve. Christian Ritz, Florent Baty, Jens C. Streibig, and Daniel Gerhard have published a very useful guide that provides a gentle introduction to this topic: Dose-Response Analysis Using R Ritz PLoS ONE 2015. Reading that well-written tutorial is strongly recommended for anyone who will use either the drc or the HTDoseResponseCurve package.
As a reminder, the four parameter model is:
$E = c + \frac{d-c}{1+\exp(b(\log(x)-\log(e)))}$
where b is the slope (m), c is $E_{min}$, d is $E_{max}$, and e is the $IC_{50}$.
To estimate the dose response slope, upper asymptote, lower asymptote, and $EC_{50}$, pass drc::LL.4() to the fct parameter of fit_DRC(). To fix the lower asymptote to zero, pass drc::LL.3() into fit_DRC(). To fix the lower and upper asymptotes to zero and 1 respectively, pass drc::LL.2().
If you wish to display the $EC_{50}$ estimates on the curve with dotted vertical lines, pass show_EC50=TRUE. The $EC_{50}$ concentrations may be different from the 50% surviving Fraction concentration ($SF_{50}$) if the lower asymptote is not zero.
# estimate EC50, slope, upper asymptote, lower asymptote fit_1 = fit_DRC(ds, sample_types = c("line1", "line2"), treatments = "drug", fct=LL.4()) # estimate EC50, slope, upper asymptote. Lower asymptote set to 0 fit_1_lower0 = fit_DRC(ds, sample_types = c("line1", "line2"), treatments = "drug", fct=LL.3()) # estimate EC50, slope. Upper, lower asymptotes set to 1, 0. fit_1_both = fit_DRC(ds, sample_types = c("line1", "line2"), treatments = "drug", fct=LL.2())
The fit_DRC() function will sometimes report warnings or errors generated by the drc::drm() function during curve fitting. Errors indicate that the model was not able to converge to a fit. This can occur if the data do not have a sufficiently sigmoidal shape.
We can plot curves fitted with the drc functions LL.4(), LL.3(), and LL.2() to see the results of different choices for which parameters to estimate:
layout(matrix(1:3,1,3)) par(mar=c(3,3,3,1)) plot( fit_1, main="LL.4()", xlim=c(0, 1e6), show_EC50=TRUE ) plot( fit_1_lower0, main="LL.3(), lower=0", xlim=c(0, 1e6), show_EC50=TRUE ) plot( fit_1_both, main="LL.2(), upper=1, lower=0", xlim=c(0, 1e6), show_EC50=TRUE )
Note that in the LL.3() and LL.2() calls, the curves asymptotically approach zero because we specified that the lower asympotote should be zero. Theose two plots have higher $EC_{50}$ estimates than LL.4() on the left, because the $EC_{50}$ is measured relative to the upper and lower asymptotes for the curve, and the curves have different lower asymptotes.
We can emphasize the difference between the EC50 and SF50 for this curve fitting by plotting both values for the same curve:
layout(matrix(1:3,1,3)) par(mar=c(3,3,3,1)) plot( fit_1, main="LL.4(), SF50", xlim=c(0, 1e6), show_SF50=TRUE ) abline(0.5, 0) plot( fit_1, main="LL.4(), EC50", xlim=c(0, 1e6), show_EC50=TRUE ) abline(0.5, 0)
Note that drug one (black line) reaches its EC50 at a lower concentration than its SF50.
You can fit and plot more than one treatment or sample type if there are multiple values in your data set by passing them to the treatments or sample_type parameters as a vector.
HTDoseResponseCurve can also read data from 6-, 12-, 24-, 96-, and 384-well plates described using a Microsoft Excel spreadsheet. The contents of each well and the treatments applied to those wells are described in a plate map. The minimal elements required to describe the cells in a well are sample type, treatment, and concentration. The sample type indicates what kind of cell is in the well (e.g., MCF7, PC3, WT). The treatment indicates the perturbation on those cells, often a drug (e.g. Olaparib, DMSO, si-TP53, Vehicle). The concentration indicates the concentration of the treatment applied to the well.
Although you may build a plate map manually by combining individual matrixes into a list object, HTDoseResponseCurve contains functions to load an externally defined plate map either from an XML file or a Microsoft Excel file. HTDoseResponseCurve can natively consume the XML-formatted files platemap file exported by the Incucyte Cell Analysis System. This instrument resides in an incubator and captures microscopic images of either white light or fluorescent light.
It is usually easier to write out a plate map in a text file or Excel spreadsheet, but you can also create and manipulate plate maps directly in R. The plate map is stored as a list of data frames. This example will demonstrate a 96 well plate.
First, we'll create an empty plate map with create_empty_plate_map():
plate_map = create_empty_plate_map( number_of_wells = 96 )
The object returned by this call is a list of five 96-element matrixes named treatment, concentration, sample_type, density, and passage. The density and passage matrixes are optional; the treatment, concentration, and sample_type values must be populated with relevant information.
Next, we'll populate the plate map with information about the treatment, drug concentration, and cell lines in each well where there was a measurement. These commands simulate two cell lines tested with a single drug at four concentrations (200, 500, 1000, and 2000 nM), including a negative vehicle control of DMSO. Data will be placed into wells C through G, columns 2 through 7.
plate_map$treatment[3,2:7] = "DMSO" plate_map$treatment[4:7,2:7] = "drug" plate_map$concentration[3,2:7] = 0 plate_map$concentration[4,2:7] = 200 plate_map$concentration[5,2:7] = 500 plate_map$concentration[6,2:7] = 1000 plate_map$concentration[7,2:7] = 2000 plate_map$sample_type[3:7, 2:4] = "line1" plate_map$sample_type[3:7, 5:7] = "line2"
Now that we have specified a plate map, we'll specify the values we observed in those wells. We'll use the create_empty_plate() function to create an empty data plate object and manually populate that object with simulated measurements collected in triplicate:
plate_1 = create_empty_plate( number_of_wells=96, plate_id="test" ) plate_1[3,2:7] = c(100,99,100, 90, 91, 92) plate_1[4,2:7] = c(99, 97, 99, 89, 87, 88) plate_1[5,2:7] = c(86, 89, 88, 56, 59, 58) plate_1[6,2:7] = c(66, 65, 67, 25, 23, 24) plate_1[7,2:7] = c(42, 43, 46, 4, 5, 9)
The object returned by this plate is a data frame. We can confirm the plate looks the way we expect it to by plotting the raw data on a plate with the plot_values_by_plate() function. The darker red the well, the higher the value in that well.
plot_values_by_plate(plate_1)
It is common to re-use the same plate map information for data from many timepoints. To analyze the data we will combine the measurements (in the plate variable) and metadata (in the plate_map variable) into a single dataset.
When combining the measurements and metadata, you must specify which wells (if any) correspond to negative controls, also known as vehicle-treated wells. Negative controls are described above in the section Specifying how to normalize data; please review that section if you skipped it.
In this example, there are r sum(plate_map$treatment=="DMSO")
negative control wells with the treatment label "DMSO".
ds = combine_data_and_map( plate_1, plate_map, negative_control = "DMSO" )
The data frame returned by combine_data_and_map() is a dataset combining the well metadata and the measured values.
We can extract sample types or treatments from the dataset object using get_sample_types() and get_treatments()
get_sample_types(ds) get_treatments(ds)
The dataset object returned by combine_data_and_map is the same regardless of whether it was created by the combine_data_and_map() function or by a call to create_dataset() as described in the previous section.
If you are using the Incucyte system to generate data, you can export plate readings to Microsoft Excel and then read them in using the function read_plates_from_Incucyte_export().
If you have exported a plate map from the Incucyte software, it can be read into HTDoseResponseCurve using the read_platemap_from_Incucyte_XML() function. You can also load a plate map from an Excel file using the read_platemap_from_excel() function, or you can construct a plate map manually in R as was shown above.
As described above, the steps are
pkg = "HTDoseResponseCurve" fn_map_XML = system.file("extdata", "sample_data_384_platemap.txt",package=pkg) fn_data_Excel = system.file("extdata", "sample_data_384.xlsx", package = pkg) plate_map = read_platemap_from_Incucyte_XML( path_to_file = fn_map_XML ) plate_data = read_plates_from_Incucyte_export( path_to_file = fn_data_Excel, plate_id = "p1", number_of_wells=384) plate_data$hours = round(plate_data$hours) ds = combine_data_and_map( raw_plate = plate_data, plate_map = plate_map, negative_control = "DMSO" )
Example: sample data from an Incycyte experiment
This experiment tested the effect of r length(get_treatments(ds[!ds$is_negative_control,]))
different drugs on two cell lines, imaged every 12 hours over five days.
In this example, there are r sum(!is.na(plate_map$treatment) & plate_map$treatment=="DMSO")
negative control (vehicle) wells on each plate with the treatment label "DMSO".
The expected concentration unit for these plots will be nanomoles of drug; since the experimental data for this example were recorded in micromoles, we will multiply the concentration by 1000 to get nanomoles.
To illustrate how the plate readings changed over time, the plot_values_by_plate() function is useful:
layout(matrix(1:4,2,2,byrow=TRUE)) par(mar=c(3,3,2,1)) plot_values_by_plate(plate = plate_data, hour = 48, main="48 hours") plot_values_by_plate(plate = plate_data, hour = 96, main="96 hours") plot_values_by_plate(plate = plate_data, hour = 192, main="192 hours") plot_values_by_plate(plate = plate_data, hour = 228, main="228 hours")
The data set we loaded in the previous section contains observations from more than one time point. We therefore have to specify the hour to plot when fitting and plotting a dose response curve:
fit_inc = fit_DRC( ds, sample_types=c("line_1","line_2"), treatments = "drug13", hour=168, fct=drc::LL.4() ) plot( fit_inc, xlim=c(0, 1e5), lwd=3, main=paste("Drug 13: 168 hours"), ylab="surviving fraction", xlab="nM")
If you forget to specify a particular hour in a dataset with multiple time points, the fit_DRC() function will generate a warning:
fit_inc = fit_DRC( ds, sample_types=c("line_1","line_2"), treatments = "drug13", fct=drc::LL.4() ) plot( fit_inc, xlim=c(0, 1e5), lwd=3, main=paste("Drug 13: all hours (don't do this)"), ylab="surviving fraction", xlab="nM")
To show the growth of untreated cells over time, we can plot the raw value at these timepoints.
par(mar=c(5,7,3,2)) x=plot_timecourse( ds, sample_types=c("line_1","line_2"), treatments="DMSO", concentrations=0, plot_raw = TRUE, main=paste("DMSO (vehicle)"), ylab="Confluence", ylim=c(0,100) ) legend(0, 100, c("L1 DMSO", "L2 DMSO"), pch=c(19,22), bty="n" )
The dose response curve for Line 1 is drawn with closed points, while line 2 is drawn with open squares. Line 1 (L1) grows faster than L2; by 180 hours, the DMSO-treated wells for that line have become confluent.
To examine un-normalized values for wells treated with a particular drug across the two cell lines, we can look at the raw confluence score at these timepoints.
x=plot_timecourse( ds, sample_types=c("line_1","line_2"), treatments=c("DMSO","drug13"), concentrations=c(0,5000), plot_raw=TRUE, main=paste("Drug 13"), ylab="Confluence", ylim=c(0,100) ) legend(0, 100, c("L1 DMSO","L2 DMSO", "L1 5000 nM", "L2 5000 nM"), pch=c(19,19,22,22), col=c("black", "red","black", "red"), bty="n" )
Here we can directly examine the un-normalized vehicle treated samples (black) and 5000 nM samples (red). Drug 13 only modestly affects cell line L1 but has a dramatic impact on cell line L2.
Note that the raw data for this plot and the color and "pch" value indicating what shape was plotted for each condition is returned in a data frame by plot_timecourse(). To isolate these values one can inspect the data frame manually or use the unique() function:
legend_vals = unique(x[,c("sample_type", "concentration", "color", "pch")] ) legend_vals
If we are curious about particular timepoints, we can fit and plot dose response curves individually. Here we plot dose response curves for drug 13 at four different time points:
layout(matrix(1:4,2,2,byrow=TRUE)) par(mar=c(4,4,3,1)) hours_to_plot=c(48, 96, 120, 168) for(h in 1:4){ plot( fit_DRC( ds, sample_types=c("line_1","line_2"), treatments = "drug13", hour=hours_to_plot[h], fct=drc::LL.3() ), xlim=c(0, 1e6), lwd=3, main=paste("Drug 13:", hours_to_plot[h], "hours"), ylab="surviving fraction", xlab="nM") }
It would be convenient to calculate fits for all of the treatments at all timepoints. The fit_statistics() and plot_fit_statistic() functions are useful for this purpose. The fit_statistics() function attempts to fit dose response curves at all time points for all treatments and then calculates summary values of the AUC and EC50. These values are returned as a data frame with one row for each unique condition. The plot_fit_statistic() function plots either EC50 or AUC across all of the timepoints in the fit statistics.
First, we'll use fit_statistics() calculate fits for four drugs. We could calculate fits for all drugs just as easily but it would take more processing time.
# select four drugs to test, to save time in curve fitting ds_test = ds[ds$treatment %in% c("DMSO", "drug01", "drug13", "drug20", "drug21"),] # calculate fit statistics at all time points fits = fit_statistics(ds_test, fct = drc::LL.4() )
We can summarize the results for a given drug in a table:
library(knitr) LINE1 = fits$treatment=="drug13" & fits$sample_type=="line_1" LINE2 = fits$treatment=="drug13" & fits$sample_type=="line_2" kable( data.frame( hour=fits$hour[LINE1], AUC_line1 = round(fits$AUC[LINE1], 2), AUC_line2 = round(fits$AUC[LINE2], 2), EC50_line1 = fits$coef_EC50[LINE1], EC50_line2 = fits$coef_EC50[LINE2], obs_min_line1 = round(fits$obs_min[LINE1], 2), obs_min_line2 = round(fits$obs_min[LINE1], 2), F_stat = round( fits$ANOVA_F_test[LINE1], 2), P_value = round( fits$ANOVA_P_value[LINE1], 4) ) )
We can summarize these results for a single treatment (drug 13) across all time points at once using the plot_treatment_summary() function. This function will plot dose response curves from up to four time points.
#pdf("/notebook/HTDoseResponseCurve_paper/reproduce/treatment_summary.pdf", # height=8, width=8) plot_treatment_summary(ds_test, fits, treatment = "drug13", times_DRC = c(48, 96, 120, 168)) #dev.off()
If we want a more detailed plot for AUC across all time points, we can use the function plot_fit_statistic(). Observations where the ANOVA_P_value exceeds an alpha-level cut-off set by the user are drawn as open circles and not connected by lines. Here the alpha-level cut-off is 0.0025, chosen to correct an alpha = 0.05 for 20 tests by simple Bonferroni adjustment. The choice of whether to set a P value threshold, and how that value might be selected, is entirely up to you.
ps = plot_fit_statistic( fits[fits$treatment=="drug13",], statistic = "AUC", alpha = 0.0025, ylim=c(0, 1.5), main="Area Under the Dose Response Curve, drug 13") legend(x=0, y=0.3, legend=unique(ps$sample_type), col=unique(ps$color), pch=19, bty="n" ) legend(x=36, y=0.3, legend=c("P>alpha", "P<=alpha"), pch=c(13,19), bty="n", col="darkgrey" )
This plot shows the AUC for line 1 exceeds that of line 2 at later time points, and that the difference between the dose response curves is significant at the P < 0.0025 level at and after 132 hours. In these plots, the open vs. closed circles reflects the $P_{ANOVA}$ value for the test for a difference in dose response curves between the two lines. Here a filled-in circle means $P_{ANOVA}$ <= 0.0025.
It is somewhat surprising that drug 13 also apparently shows a $P_{ANOVA}$ <= 0.0025 at 12 hours, so we can take a look at the dose response curve for that time point for more information:
plot( fit_DRC( ds, sample_types=c("line_1","line_2"), treatments = "drug13", hour=12, fct=drc::LL.3() ), xlim=c(0, 1e5), lwd=3, main=paste("Drug 13: 12 hours"), ylab="surviving fraction", xlab="nM", ylim=c(0, 1.5)) legend(x=2, y=0.4, legend=c("line 1", "line 2"), fill=c("black", "red"), bty="n" )
The fit converged here, and the values were significantly different from each other in the ANOVA test, but it's clear that there is not a sensible dose response effect and neither drug acheived a $SF_{50}$. We can screen out these values by passing the obs_min parameter value:
ps = plot_fit_statistic( fits[fits$treatment=="drug13",], statistic = "AUC", alpha = 0.0025, obs_min = 0.5, ylim=c(0,1.5), main="Area Under the Dose Response Curve, drug 13", sub="Requiring SF50 for at least one sample type") legend(x=0, y=0.3, legend=unique(ps$sample_type), fill=unique(ps$color), bty="n" ) legend(x=36, y=0.3, legend=c("P>alpha", "P<=alpha, min>obs_min", "P<=alpha, min<=min_obs"), pch=c(13,18,19), bty="n", col="darkgrey" )
When there are many drugs in a screen, it may be convenient to generate a summary value such as $\large{ log2( \frac{ AUC_{line1} }{ AUC_{line2} } ) }$ and plot a heat map view across time points. For this example, I will color in the heat map if P < 0.0008 for the ANOVA testing for a cell line effect.
I'll plot the values using plot_color_grid(), a general function for plotting any matrix of numbers:
# convert long data frame to individual matrixes fit_mat = fit_statistics_matrixes( fits ) # mark as NA values that exceed a P value threshold rel_AUC = convert_to_foldchange( fit_mat$line_1$AUC, fit_mat$line_2$AUC ) ALPHA_LEVEL = 0.008 invalid_alpha = is.na( fit_mat$line_1$ANOVA_P_value ) | fit_mat$line_1$ANOVA_P_value > ALPHA_LEVEL rel_AUC[ invalid_alpha ] = NA M=plot_color_grid( rel_AUC, color_bounds = c(-3,3) )
We customize the plot using parameters of plot_color_grid():
M=plot_color_grid( rel_AUC, color_bounds = c(-3,3), block.height = 5, block.width=4, space.X = 2, space.Y=2, cex.y = 1, cex.x = 0.75, color_palatte = c("#91cf60","#ffffbf","#fc8d59") )
Here we can see that drug 13 and drug 21 both had strong effects, with opposite direction, that became strongest after 108 or 144 hours respectively.
Reporting the same values as a table is straightforward:
library(knitr) kable( t(round( rel_AUC, 2)) )
The standard Dose Response Curve is calculated at a single time point. To compare the relative Area under the Curve for single drug concentration across multiple timepoints, you can call the timecourse_AUC_ratio() function. The AUC for a single sample type/treatment condition across the whole timecourse is calculated using raw data normalized by the vehicle control specified by the user when the dataset was created. AUC is calculated by connecting the normalized data points and then using the trapezoids method for AUC implemented in the trapz() function of the pracma library.
For each treatment, in each concentration, the timecourse_AUC_ratio() function will calculate:
$AUC_{sample1} = \frac{ AUC( sample1_{raw} ) }{ AUC_{vehicle} }$
$AUC_{sample2} = \frac{ AUC( sample2_{raw} ) }{ AUC_{vehicle} }$
$AUC_{ratio} = \frac{ AUC_{sample1} } {AUC_{sample2} }$
pkg = "HTDoseResponseCurve" fn_map_XML = system.file("extdata", "sample_data_384_platemap.txt",package=pkg) fn_data_Excel = system.file("extdata", "sample_data_384.xlsx", package = pkg) plate_map = read_platemap_from_Incucyte_XML( fn_map_XML ) plate_data = read_plates_from_Incucyte_export( fn_data_Excel, "p1", number_of_wells=384) plate_data$hours = round(plate_data$hours) ds = combine_data_and_map( plate_data, plate_map, negative_control = "DMSO" ) ds$hours=round(ds$hours) tc=timecourse_AUC_ratio(ds, treatments=c("drug13"), sample_types=c("line_1", "line_2"), concentrations=c(200, 1000, 5000), summary_method="mean") # create a data frame from individual list components kable( data.frame( sample_type = tc$sample_types, concentration = tc$concentrations, round( t(tc$AUC),2), row.names=1:length(tc$sample_types), stringsAsFactors=FALSE), row.names = FALSE)
The timecourse Area under the curve for drug 13 at 5000 nM in line_1 is more than twice the value for line_2, while the timecourse AUC for the same comparison at 200 nM is nearly identical in both lines. This suggests drug_13 is more selective at larger doses.
The timecourse AUC can be calculated quickly for a large number of treatments; here it is calculated for all 20 drugs in this screen, with the results plotted using boxplot_label_outliers(), a wrapper around boxplot() that draws labels on outlier values:
tc=timecourse_AUC_ratio(ds, treatments=unique(ds$treatment), sample_types=c("line_1", "line_2"), concentrations=c(200, 1000, 5000), summary_method="mean") tc_200=tc$AUC[,tc$concentrations==200 & tc$sample_types=="ratio"] tc_1000=tc$AUC[,tc$concentrations==1000 & tc$sample_types=="ratio"] tc_5000=tc$AUC[,tc$concentrations==5000 & tc$sample_types=="ratio"] layout(matrix(1:3,1,3)) b=boxplot_label_outliers(tc_200, ylim=c(0, 2.5), las=1, main="timecourse AUC, 200 nM", cex=2, pch=19) b=boxplot_label_outliers(tc_1000, ylim=c(0, 2.5), las=1, main="timecourse AUC, 1000 nM", cex=2, pch=19) b=boxplot_label_outliers(tc_5000, ylim=c(0, 2.5), las=1, main="timecourse AUC, 5000 nM", cex=2, pch=19)
It is often important to determine whether the combined activity of two different drugs exceeds that which would be expected from their individual application. Such "extra" activity is commonly called "synergy".
The null hypothesis for a synergy analysis is that the effect of combining two or more drugs is additive. Loewe additivity is a commonly used definition of additivity; it essentially says that the effect of two drugs is additive if their combined effect is equivalent to either of the drugs being combined with itself at the same final dosage.
To describe the effect and doses in each condition we can write:
$f_a$, $f_u$: fraction affected, unaffected (by drug)
$(D)1, (D)_2, (D){1,2}$: Dose of drug 1, drug 2, or mixture of drugs 1 and 2.
$(D_m)1, (D_m)_2, (D_m){1,2}$: $IC_{50}$ dose of drug 1, drug 2, mixture of drugs 1, 2.
$D_m$ (the $IC_{50}$) is related to relationship between the drug effect and dose by the equation:
$\frac{f_a}{f_u} = [ \frac{D}{D_m}]^m$
This equation says that, at a given dose D, the relationship between the dose and $D_m$ is proportional to the log of the relationship between the proportion of cells killed and the proportion not yet killed. This relationship can be plotted as a median effect plot, which is useful for quality control. The median effect plot shows the relationship between dose and relative effect on a log-log scale; it restates the dose response curve. The X axis is log( dose ) in uM. The Y axis is the relative proportion of affected samples:
$y = log(\frac{f_a}{f_u})$
$x = log(D)$
The slope m of the median effect plot is determined by the shape of the graph; slope = 1 means a hyperbolic DRC, while m > 1 is sigmoid (“higher order”). A strong linear fit of the median effect plot is associated with a well-behaved sineusoidal DRC.
The line fitting a median effect plot intersects the X axis where $y = log(\frac{f_a}{f_u}) = 0$; since the sum of $f_a$ and $f_u$ is 1, that means the line intersects the X axis when $f_a = f_u = 0.5$; that's in princple the $IC_{50}$. You can therefore-- in princple-- get the $IC_{50}$ for the dose response curve by taking the antilog of D when Y = 0.
The Combination Index (CI) is a measure of synergy. To calculate CI, one needs the $IC_{50}$ values for the individual drugs and a series of two-drug exposures that
The equation for CI is:
$CI = \frac{(D)_1}{(D_m)_1} + \frac{(D)_2}{(D_m)_2}$
Where $D_1$ and $D_2$ are the doses of drugs 1 and 2 when combined that together produce effect m, and $D_{m,1}$ and $D_{m,2}$ are the doses that produce effect m when given one at a time.
Looking for synergy, we can calculate the Combination Index (CI) at various levels of lethality. A CI value <1, =1, and >1 indicate synergism, additive effect, and antagonism.
Values less than one are evidence for synergy; values greater than one are evidence for antagonism.
J. Jack Lee and Maiying Kong published a method to calculate calculate confidence intervals for the interaction index (Lee and Kong Statistics in Biopharmaceutical Research 2012). Lee and Kong published R code that will perform this calculation, and I have incorporated this code into HTDoseResponseCurve; all credit goes to Lee and Kong.
The best approach to assessing drug synergy is not clear. The combination index has been subject to criticism from statisticians who advocate non-linear model fitting as superior to the CI method. For an introduction to the terminology and issues, the following references may be helpful:
Articles explaining CI methods and advocating their use:
Articles critical of CI methods
HTDoseResponseCurve can calculate and plot synergy values if you have followed a "ray" experimental design, the standard approach recommended for combination index experiments. In this design, one exposes the sample to
Importantly, the two drugs do not have to be given in the same concentration.
You can load the data from an experiment of this design into HTDoseResponseCurve by creating a dataset that includes two vectors of treatments and two vectors of concentrations. To calculate the Combination Index, you must specify the ratio at which each treatment is combined. The concentration value for the combination treatment will equal the the sum of each drug's individual concentrations.
Data published in Lee and Kong Statistics in Biopharmaceutical Research 2012 demonstrate the calculation of the interaction between two drugs called SCH66336 and 4-HPR. Their data are reproduced here, along with code that reproduces their Figure 4b.
This figure below plots the interaction index at varying effect strengths for the drug combination (solid dots were observed), with upper and lower 95% confidence intervals plotted with thin dots.
From these plots, Lee and Kong conclude that combinations that produce an effect less than about 0.5 are synergistic at a 95% confidence interval, while there is not evidence that combination concentrations that produce less of an effect are not additive. Note that the figure is plotted with the Y axis on the log scale.
dose_SCH=c(0.1, 0.5, 1, 2, 4) eff_SCH = c(0.6701, 0.6289, 0.5577, 0.4550, 0.3755) dose_4HPR = c(0.1, 0.5, 1, 2) eff_4HPR = c(0.7666, 0.5833, 0.5706, 0.4934) eff_comb = c(0.6539, 0.4919, 0.3551, 0.2341) syn = data.frame( treatment_1 = rep("SCH66336", 13), conc_1 = c( dose_SCH, rep(0, 4), dose_SCH[1:4]), treatment_2 = rep("4-HPR", 13), conc_2 = c( rep(0, 5), dose_4HPR, dose_4HPR ), values = c(eff_SCH, eff_4HPR, eff_comb ), stringsAsFactors=FALSE )
Raw data
kable(syn)
# Create dataset using create_synergy_dataset ds_lk = create_synergy_dataset( sample_types = rep("sample_1", 13), treatments_1 = syn$treatment_1, treatments_2 = syn$treatment_2, concentrations_1 = syn$conc_1, concentrations_2 = syn$conc_2, values = syn$values, negative_control = NA)
layout(matrix(1:2,1,2)) # Calculate median effect CS_lk = chou_synergy( ds_lk, sample_type = "sample_1", hour=0, treatment_1 = "SCH66336", treatment_2="4-HPR", fct=LL.2(), summary_method="mean" ) me=plot_synergy_median_effect( CS_lk, main="Median Effect" ) legend( -3, 2, c( "SCH66336", "4-HPR", "combined"), col=c("black", "red", "cornflowerblue"), pch=19, cex=0.75) # Calculate interaction index ii=plot_synergy_interaction_index(ds=ds_lk, sample_type = "sample_1", treatment_1 = "SCH66336", treatment_2="4-HPR", hour=0, xlab="Effect", ylab="Interaction Index" , main="Interaction Index") abline(log(1), 0) legend( 0, 10, c("interaction", "95% C.I."), lty=c(1, 2), col=c("black", "cornflowerblue"), cex=0.75)
If you prefer, you can plot the same values on the linear scale:
layout(matrix(1:2,1,2)) ii=plot_synergy_interaction_index(ds=ds_lk, sample_type = "sample_1", treatment_1 = "SCH66336", treatment_2="4-HPR", hour=0, log = "", xlab="Effect", ylab="Interaction Index", ylim=c(0,3), main="Interaction Index") abline(1, 0) legend( 0, 3, c("interaction", "95% C.I."), lty=c(1, 2), col=c("black", "cornflowerblue"), cex=0.75)
Bliss synergy is calculated using a different null model, one that is based on probabilistic independence. Bliss synergy is reported as deviation (difference) of the observed values for drug combinations from the values predicted by Bliss independence, which takes the form:
${Effect}_{null} = {Effect}_A + {Effect}_B - ({Effect}_A \times {Effect}_B)$
This result is sometimes called the "excess over Bliss independence".
In this example, two drugs are tested at four concentrations (including 0 nM, the vehicle condition). In one scenario, the dataset (DS_add), they have merely additive effects when combined. In an alternate scenario, using with different measured values (DS_syn), there is evidence for synergistic interactions when combined. The Excess over Bliss score is plotted as a heat map using the plot_color_grid() function.
samples = rep("cell_line_1", 16) t1 = rep( c("DMSO", "drug1", "drug1", "drug1"), 4) t2 = c( rep( "DMSO", 4), rep("drug2", 12) ) c1 = rep( c(0, 50, 100, 200), 4) c2 = c(0,0,0,0, 50,50,50,50, 100,100,100,100, 200,200,200,200) value_ind=c(1,0.8,0.7,0.6,0.8,0.7,0.6,0.5,0.7,0.6,0.5,0.4,0.6,0.5,0.4,0.3) value_syn=c(1,0.8,0.7,0.6,0.8,0.8,0.5,0.3,0.7,0.3,0.2,0.1,0.6,0.3,0.1,0.05) DS_add=create_synergy_dataset( sample_types=samples, treatments_1=t1, treatments_2=t2, concentrations=c1, concentrations_2=c2, values=value_ind, negative_control="DMSO") DS_syn=create_synergy_dataset( sample_types=samples, treatments_1=t1, treatments_2=t2, concentrations=c1, concentrations_2=c2, values=value_syn, negative_control="DMSO") b_add = synergy_bliss(DS_add, "drug1", "drug2", sample_type="cell_line_1") b_syn = synergy_bliss(DS_syn, "drug1", "drug2", sample_type="cell_line_1") layout(matrix(1:2,1,2)) par(mar=c(4,4,3,1)) gi = plot_color_grid(b_add$excess*100, color_bounds = c(-100,100), main="additive", xlab="nM drug 1", ylab="nM drug 2", plot_values = TRUE) gs = plot_color_grid(b_syn$excess*100, color_bounds = c(-100,100), main="synergistic", xlab="nM drug 1", ylab="nM drug 2", plot_values = TRUE)
To view excess over bliss synergy more clearly, it may be helpful to hold one drug constant and plot the effect of increasing the concentration of the second drug. Here we hold drug "drug2" constant at either 50, 100, or 200 nM and show individual dose responses as "drug1" is varied. The evidence for synergy is represented by the distance between the combined effect (solid circle) vs. the expected effects of the individual drugs by themselves.
layout(matrix(1:3,3,1)) par(mar=c(4,4,3,1)) concs=c(50, 100, 200) for( i in 1:3 ){ plot_additive_vs_synergy_effect( DS_syn, treatment_for_DRC="drug1", concs_for_DRC=c(50, 100, 200), treatment_subgroup="drug2", conc_subgroup=concs[i], hour=0, sample_type="cell_line_1", cex=2, xlab=paste("drug1 effect when drug2=",concs[i],"nM", sep="")) legend( 1, 0.95, c("drug1", "drug2", "both"), pch=c(1, 2, 19), bty="n", cex=1.5 ) box() }
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.