dSplsda | R Documentation |
This function is used to compare groups of individuals from whom comparable
cytometry or other complex data has been generated. It is superior to just
running a Wilcoxon analysis in that it does not consider each cluster
individually. Instead, it uses a sparse partial least squares discriminant
analysis (sPLS-DA) to first identify the vector through the multidimensional
data cloud, created by the cluster-donor matrix, that optimally separates the
groups. As the algorithm is sparse, it applies a penalty to exclude the
clusters that are orthogonal, or almost orthogonal, to the discriminant
vector, i.e. that do not contribute to separating the groups. This is in
large part a wrapper for the splsda function from the mixOmics package.
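To make the term "cluster-donor matrix" concrete, the following is a minimal base R sketch with made-up toy vectors. It assumes that the matrix holds, for each donor, the fraction of that donor's observations falling in each cluster. The exact normalization used inside dSplsda may differ, so treat this as an illustration only.

```r
# Toy vectors (hypothetical values): one entry per observation.
idsVector <- c("d1", "d1", "d1", "d1", "d2", "d2", "d2", "d2")
clusterVector <- c(1, 1, 2, 3, 1, 2, 2, 2)

# Count observations per cluster (rows) and donor (columns), then
# convert each donor's column to fractions that sum to 1.
clusterDonor <- prop.table(table(clusterVector, idsVector), margin = 2)

print(clusterDonor)
```

Each column of this matrix is one point in the multidimensional data cloud through which the discriminant vector is sought.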
dSplsda(
xYData,
idsVector,
groupVector,
clusterVector,
displayVector,
testSampleRows,
paired = FALSE,
densContour = TRUE,
plotName = "default",
groupName1 = unique(groupVector)[1],
groupName2 = unique(groupVector)[2],
thresholdMisclassRate = 0.05,
title = FALSE,
plotDir = ".",
bandColor = "black",
dotSize = 500/sqrt(nrow(xYData)),
createOutput = TRUE
)
xYData |
A dataframe or matrix with two columns. Each row contains information about the x and y position in the field for that observation. |
idsVector |
Vector with the same length as xYData containing information about the id of each observation. |
groupVector |
Vector with the same length as xYData containing information about the group identity of each observation. |
clusterVector |
Vector with the same length as xYData containing information about the cluster identity of each observation. |
displayVector |
Optionally, if the dataset is very large (>100 000 observations) and hence the SNE calculation becomes impossible to perform for the full dataset, this vector can be included. It should contain the set of rows, from the data used for statistics, that have been used to generate the xYData. |
testSampleRows |
Optionally, if a train-test setup is wanted, the rows specified in this vector are used to divide the dataset into a training set, used to generate the analysis, and a test set, where the outcome is predicted based on the outcome of the training set. All rows that are not labeled as test rows are assumed to be train rows. |
paired |
Defaults to FALSE, i.e. no assumption of pairing is made and Wilcoxon rank sum-test is performed. If TRUE, the software will by default pair the first id in the first group with the first id in the second group and so forth, so make sure the order is correct! |
densContour |
Whether density contours should be created for the plot(s) or not. Defaults to TRUE. |
plotName |
The main name for the graph and the analysis. |
groupName1 |
The name for the first group |
groupName2 |
The name for the second group |
thresholdMisclassRate |
This threshold corresponds to the usefulness of the model in separating the groups: a misclassification rate of the default 0.05 means that 5 percent of the individuals are on the wrong side of the theoretical robust middle line between the groups along the sPLS-DA axis, defined as the midpoint between the 3rd quartile of the lower group and the 1st quartile of the higher group. |
title |
Whether a title should be displayed on the plotting field. As the plotting field is saved as a png, this title cannot be removed as an object afterwards, as it is saved as coloured pixels. To simplify usage for publication, the default is FALSE, as the files are still named, even though no title appears on the plot. |
plotDir |
The directory where the plots are saved, if different from the current directory. If specified and non-existent, the function creates it. If "." is specified, the plots will be saved in the current directory. |
bandColor |
The color of the contour bands. Defaults to black. |
dotSize |
Simply the size of the dots. The default makes the dots smaller the more observations are included. |
createOutput |
For testing purposes. Defaults to TRUE. If FALSE, no output is generated. |
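The robust middle line described for thresholdMisclassRate can be illustrated with a small base R sketch. The scores below are made-up numbers standing in for donor positions along the sPLS-DA axis, and the variable names are hypothetical. This follows the definition given in the argument description, not necessarily the exact internal code of dSplsda.

```r
# Hypothetical donor scores along the sPLS-DA axis (illustration only).
scoresLow <- c(-2.1, -1.8, -1.5, -1.2, -0.9, -0.6, 0.4)
scoresHigh <- c(-0.2, 0.7, 1.0, 1.3, 1.6, 1.9, 2.2)

# Robust middle line: midpoint between the 3rd quartile of the lower
# group and the 1st quartile of the higher group.
middleLine <- mean(c(quantile(scoresLow, 0.75), quantile(scoresHigh, 0.25)))

# Donors on the wrong side of the middle line.
nWrong <- sum(scoresLow > middleLine) + sum(scoresHigh < middleLine)

# Fraction of all donors that are misclassified.
misclassRate <- nWrong / (length(scoresLow) + length(scoresHigh))

# Compare with the default threshold of 0.05.
misclassRate <= 0.05
```

With these toy scores, one donor from each group lies on the wrong side of the line, so the rate is 2/14 and exceeds the default threshold.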
This function returns the full result of the sPLS-DA. It also returns an SNE-based plot showing which events belong to a cluster dominated by the first or the second group, as defined by the sparse partial least squares loadings of the clusters.
splsda, dColorPlot, dDensityPlot, dResidualPlot
# Load some data
data(testData)
## Not run:
# Load or create the dimensions that you want to plot the result over.
# uwot::umap is recommended due to speed, but tSNE or another method would
# work just as well.
data(testDataSNE)
# Run the clustering function. For more rapid example execution,
# a depeche clustering of the data is included
# testDataDepeche <- depeche(testData[,2:15])
data(testDataDepeche)
# Run the function. This time without pairing.
sPLSDAObject <- dSplsda(
xYData = testDataSNE$Y, idsVector = testData$ids,
groupVector = testData$label,
clusterVector = testDataDepeche$clusterVector
)
# Here, pairing is used. NB!! This artificial example is only present to
# show how to use the function. In reality, pairing should only be used in
# situations where true paired data is present! The only reason this works
# although this is non-paired data is that the number of donors is identical.
# As it is, the algorithm internally converts the idsVector so that the first
# individual in group1 is associated with the first individual in group2.
# This can lead to erratic problems, so make sure that either a valid id
# vector, with the same id occurring two times for each individual, is
# provided, or that the individuals occur in the exact same order in both
# groups.
sPLSDAObject <- dSplsda(
xYData = testDataSNE$Y, idsVector = testData$ids,
groupVector = testData$label, clusterVector =
testDataDepeche$clusterVector,
paired = TRUE, plotName = "sPLSDAPlot_paired",
groupName1 = "Stimulation 1",
groupName2 = "Stimulation 2"
)
# Here is an example of how the display vector can be used.
subsetVector <- sample(1:nrow(testData), size = 10000)
# Now, the SNE for this displayVector could be created
# testDataSubset <- testData[subsetVector, 2:15]
# testDataSNESubset <- Rtsne(testDataSubset, pca=FALSE)$Y
# But we will just subset the testDataSNE immediately
testDataSNESubset <- testDataSNE$Y[subsetVector, ]
# And now, this new SNE can be used for display, although all
# the data is used for the sPLS-DA calculations
sPLSDAObject <- dSplsda(
xYData = testDataSNESubset, idsVector = testData$ids,
groupVector = testData$label, clusterVector =
testDataDepeche$clusterVector,
displayVector = subsetVector
)
# Finally, an example of a train-test set situation, where a random half of the
# dataset is used for training and the second half is used for testing. It
# is naturally more biologically interesting to use two independent datasets
# for training and testing in the real world.
sPLSDAObject <- dSplsda(
xYData = testDataSNE$Y, idsVector = testData$ids,
groupVector = testData$label, clusterVector =
testDataDepeche$clusterVector, testSampleRows = subsetVector
)
## End(Not run)
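Since the paired mode depends on the ids lining up between the groups, it can be worth checking the id vector before calling dSplsda with paired = TRUE. The following base R sketch uses made-up toy vectors, and the check itself is a suggestion rather than part of the package. It verifies that every id has observations in both groups.

```r
# Toy vectors (hypothetical values): one entry per observation.
idsVector <- c("a", "a", "b", "b", "c", "c")
groupVector <- c("Stim1", "Stim2", "Stim1", "Stim2", "Stim1", "Stim2")

# Cross-tabulate ids against groups: one row per id, one column per group.
idGroupTable <- table(idsVector, groupVector)

# TRUE only if every id has at least one observation in each group.
allPaired <- all(idGroupTable > 0)

allPaired
```

If this check fails, the ids cannot be matched directly, and the order-based pairing described for the paired argument would apply instead.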