runCONCLUS | R Documentation |
This function is a wrapper to run the whole CONCLUS workflow. See details.
runCONCLUS(
    ## General parameters
    outputDirectory, experimentName, countMatrix, species, cores=2,
    clusteringMethod="ward.D2", exportAllResults=TRUE, orderClusters=FALSE,
    clusToAdd=NA, silentPlot=TRUE,
    ## Normalisation parameters
    sizes=c(20,40,60,80,100), rowMetaData=NULL, columnsMetaData = NULL,
    alreadyCellFiltered=FALSE, runQuickCluster=TRUE, info=TRUE,
    ## tSNE parameters
    randomSeed = 42, PCs=c(4, 6, 8, 10, 20, 40, 50), perplexities=c(30,40),
    writeOutputTSne = FALSE,
    ## Dbscan parameters
    epsilon=c(1.3, 1.4, 1.5), minPoints=c(3, 4), writeOutputDbScan=FALSE,
    ## Cell Similarity matrix parameters
    clusterNumber=10, deepSplit=4,
    ## Rank genes parameters
    columnRankGenes="clusters", writeOutputRankGenes=FALSE,
    ## Retrieving top markers parameters
    nTopMarkers=10, removeDuplicates = TRUE, writeTopMarkers=FALSE,
    ## Retrieving genes infos parameters
    groupBy="clusters", orderGenes="initial", getUniprot=TRUE,
    saveInfos=FALSE,
    ## plotCellSimilarity parameters
    colorPalette="default", statePalette="default", writeCSM=FALSE,
    widthCSM=7, heightCSM=6,
    ## plotClusteredTSNE parameters
    savePlotCTSNE=FALSE, widthPlotClustTSNE=6, heightPlotClustTSNE=5,
    tSNENb=NA,
    ## plotCellHeatmap parameters
    meanCentered=TRUE, orderGenesCH=FALSE, savePlotCH=FALSE, widthCH=10,
    heightCH=8.5, clusterCols=FALSE,
    ## plotClustersSimilarity parameters
    savePlotClustSM=FALSE, widthPlotClustSM=7, heightPlotClustSM=5.5)
outputDirectory |
Directory to which results should be written. This needs to be defined even if you choose to not output any results. |
experimentName |
String of the name of the experiment. |
countMatrix |
Matrix containing the raw counts. |
species |
Character string of the species of interest. Should be mouse or human. Other organisms can be added on demand. |
cores |
Maximum number of jobs that CONCLUS can run in parallel. This parameter is used by ?generateTSNECoordinates, ?runDBSCAN, ?clusterCellsInternal, and ?retrieveGenesInfo. Default=2. |
clusteringMethod |
Clustering method passed to the hclust() function. See ?hclust for a list of methods. This parameter is used by ?clusterCellsInternal, ?calculateClustersSimilarity, ?plotCellSimilarity, ?plotClusteredTSNE, ?plotCellHeatmap, and ?plotClustersSimilarity. Default = "ward.D2". |
exportAllResults |
If TRUE, save all results of CONCLUS. See ?exportResults for details. Default=TRUE. |
orderClusters |
If TRUE, clusters in the cell and cluster similarity matrices will be ordered by name. Default = FALSE. |
clusToAdd |
If not NA, defines the clustering to be used in theObject. This is particularly useful when one wants to compare the clustering performance of different tools. It should be a data frame having two columns 'clusters' and 'cells'. Default=NA. |
silentPlot |
Boolean indicating whether the figures should be kept off the R graphics device. Default=TRUE. |
sizes |
Vector of size factors passed to the scran::computeSumFactors() function used by ?normaliseCountMatrix. Default = c(20, 40, 60, 80, 100). |
rowMetaData |
Data frame containing gene information. Default is NULL. See ?normaliseCountMatrix. |
columnsMetaData |
Data frame containing cell information. Default is NULL. See ?normaliseCountMatrix. |
alreadyCellFiltered |
If TRUE, quality check and filtering will not be applied during the normalization of the count matrix. See ?normaliseCountMatrix. |
runQuickCluster |
If TRUE, the scran::quickCluster() function will be applied. It usually improves the normalization for medium-size count matrices. However, it is not recommended for datasets with fewer than 200 cells and may take too long for datasets with more than 10000 cells. Default=TRUE. See ?normaliseCountMatrix. |
info |
Logical. If TRUE, additional annotations like ensembl_gene_id, go_id, name_1006, chromosome_name and gene_biotype are added to the row data, for all the genes from the count matrix with ENSEMBL IDs or SYMBOL ID. Default: TRUE. |
randomSeed |
Seed used to generate the tSNE. Default = 42. See ?generateTSNECoordinates. |
PCs |
Vector of first principal components. For example, to take ranges 1:5 and 1:10 write c(5, 10). Default = c(4, 6, 8, 10, 20, 40, 50). See ?generateTSNECoordinates. |
perplexities |
A vector of perplexities (t-SNE parameter). See ?generateTSNECoordinates for details. Default = c(30, 40). |
writeOutputTSne |
If TRUE, write the tsne parameters to the output directory defined in theObject. Default = FALSE. Ignored if exportAllResults=TRUE. |
epsilon |
Reachability distance parameter of fpc::dbscan() function. See Ester et al. (1996) for more details. Default = c(1.3, 1.4, 1.5). |
minPoints |
Reachability minimum no. of points parameter of fpc::dbscan() function. See Ester et al. (1996) for more details. Default = c(3, 4). |
writeOutputDbScan |
If TRUE, write the results of the dbScan clustering to the output directory defined in theObject, in the sub-directory output_tables. Default = FALSE. Ignored if exportAllResults=TRUE. |
clusterNumber |
Exact number of clusters. If NULL, the number of clusters is determined automatically. See ?clusterCellsInternal. Default = 10. |
deepSplit |
Intuitive level of clustering depth. Options are 1, 2, 3, 4. See ?clusterCellsInternal. Default = 4. |
columnRankGenes |
Name of the column with a clustering result. See ?rankGenes. Default="clusters". |
writeOutputRankGenes |
If TRUE, output one list of marker genes per cluster in the output directory defined in theObject and in the sub-directory 'marker_genes'. Default=FALSE. Ignored if exportAllResults=TRUE. |
nTopMarkers |
Number of marker genes to retrieve per cluster. See ?retrieveTopClustersMarkers. Default=10. |
removeDuplicates |
If TRUE, duplicated markers are removed from the lists. See ?retrieveTopClustersMarkers. Default=TRUE. |
writeTopMarkers |
If TRUE, writes one list per cluster in the output folder defined in theObject, and in the sub-directory marker_genes/markers_lists. Default=FALSE. Ignored if exportAllResults=TRUE. |
groupBy |
A column in the input table used for grouping the genes in the output tables. This option is useful if a table contains genes from different clusters. See ?retrieveGenesInfo. Default = "clusters". |
orderGenes |
If "initial" then the order of genes will not be changed. The other option is "alphabetical". See ?retrieveGenesInfo. Default="initial". |
getUniprot |
Boolean, whether to get information from UniProt or not. See ?retrieveGenesInfo. Default = TRUE. |
saveInfos |
If TRUE, save the genes infos table in the directory defined in theObject (?getOutputDirectory) and in the sub-directory 'marker_genes/saveGenesInfo'. Default=FALSE. Ignored if exportAllResults=TRUE. |
colorPalette |
A vector of colors for clusters. This parameter is used by all plotting methods. Default = "default". See ?plotClustersSimilarity for details. |
statePalette |
A vector of colors for states or conditions. This parameter is used by all plotting functions except ?plotClusteredTSNE. See ?plotClustersSimilarity for details. |
writeCSM |
If TRUE, the cells similarity heatmap is saved in the directory defined in theObject (?getOutputDirectory) and in the sub-directory "pictures". Default=FALSE. Ignored if exportAllResults=TRUE. |
widthCSM |
Width of the plot in the pdf file. See ?pdf for more details. Default = 7. |
heightCSM |
Height of the plot in the pdf file. See ?pdf for more details. Default = 6. |
savePlotCTSNE |
If TRUE, the heatmap of the clustered tSNE is saved in the directory defined in theObject (?getOutputDirectory) and in the sub-directory "pictures/tSNE_pictures". Default=FALSE. Ignored if exportAllResults=TRUE. |
widthPlotClustTSNE |
Width of the clustered tSNE plot in the pdf file. See ?pdf for more details. Default = 6. |
heightPlotClustTSNE |
Height of the clustered tSNE plot in the pdf file. See ?pdf for more details. Default = 5. |
tSNENb |
Give the number of the tSNE to plot. If NA, all tSNE solutions are plotted (14 tSNE by default). Default=NA. |
meanCentered |
Boolean indicating if mean centering should be applied to the expression matrix. See ?plotCellHeatmap. Default = TRUE. |
orderGenesCH |
Boolean indicating whether the heatmap should be structured by gene. See ?plotCellHeatmap. Default=FALSE. |
savePlotCH |
If TRUE save the cell heatmap in pdf format. The heatmap is saved in the output directory defined in theObject (?getOutputDirectory) and in the sub-directory 'pictures'. Default=FALSE. Ignored if exportAllResults=TRUE. |
widthCH |
Width of the cell heatmap saved in ?pdf. Default = 10. |
heightCH |
Height of the cell heatmap saved in ?pdf. Default = 8.5. |
clusterCols |
If TRUE, the columns representing the clusters are also taken into account in the hierarchical clustering of the cell heatmap. Default=FALSE. |
savePlotClustSM |
If TRUE, save the cluster similarity heatmap in pdf format. The heatmap is saved in the output directory defined in theObject (?getOutputDirectory) and in the sub-directory 'pictures'. Default=FALSE. Ignored if exportAllResults=TRUE. |
widthPlotClustSM |
Width of the clusters similarity heatmap in the pdf file. See ?pdf for more details. Default = 7. |
heightPlotClustSM |
Height of the clusters similarity heatmap in the pdf file. See ?pdf for more details. Default = 5.5. |
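As a rough illustration of how the vector-valued arguments combine, the sketch below enumerates the default parameter grid in base R. It assumes that one t-SNE is computed per (PCs, perplexities) pair, which is consistent with the 14 solutions mentioned under tSNENb, and that DBSCAN is run once per (epsilon, minPoints) pair on each t-SNE; the second assumption is not stated on this page. The variable names tsneGrid, dbscanGrid, nTSNE and nDbscan are illustrative only.

```r
## Default values taken from the usage section
PCs <- c(4, 6, 8, 10, 20, 40, 50)
perplexities <- c(30, 40)
epsilon <- c(1.3, 1.4, 1.5)
minPoints <- c(3, 4)

## One t-SNE per (PCs, perplexities) pair
tsneGrid <- expand.grid(PCs=PCs, perplexities=perplexities)
nTSNE <- nrow(tsneGrid)

## Assumption: one DBSCAN run per (epsilon, minPoints) pair and per t-SNE
dbscanGrid <- expand.grid(epsilon=epsilon, minPoints=minPoints)
nDbscan <- nTSNE * nrow(dbscanGrid)

print(nTSNE)
print(nDbscan)
```

Adding values to any of these vectors multiplies the number of runs, so computing time grows quickly when several vectors are extended at once.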
CONCLUS is a tool for robust clustering and positive marker features selection of single-cell RNA-seq (sc-RNA-seq) datasets. Of note, CONCLUS does not cover the preprocessing steps of sequencing files obtained following next-generation sequencing.
CONCLUS is organized into the following steps:
1) Generation of multiple t-SNE plots with a range of parameters including
different selections of genes extracted from PCA.
2) Use the Density-based spatial clustering of applications with noise
(DBSCAN) algorithm for identification of clusters in each generated t-SNE plot.
3) All DBSCAN results are combined into a cell similarity matrix.
4) The cell similarity matrix is used to define "CONSENSUS" clusters
conserved across the previously defined clustering solutions.
5) Identify marker genes for each consensus cluster.
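The idea behind steps 3 and 4 can be sketched in base R. This is a toy illustration, not the CONCLUS implementation: it assumes that the similarity between two cells is the fraction of clustering solutions in which they share a cluster, and it calls hclust() and cutree() directly, whereas CONCLUS is driven by clusteringMethod, clusterNumber and deepSplit. The four clustering solutions and the cell names are made up.

```r
## Four hypothetical clustering solutions for six cells (one column each)
solutions <- cbind(
    s1=c(1, 1, 1, 2, 2, 2),
    s2=c(1, 1, 1, 2, 2, 2),
    s3=c(1, 1, 2, 2, 2, 2),
    s4=c(1, 1, 1, 2, 2, 3))
rownames(solutions) <- paste0("cell", 1:6)

## Similarity = fraction of solutions in which two cells share a cluster
nCells <- nrow(solutions)
similarity <- matrix(0, nCells, nCells,
    dimnames=list(rownames(solutions), rownames(solutions)))
for (s in seq_len(ncol(solutions))) {
    similarity <- similarity + outer(solutions[, s], solutions[, s], "==")
}
similarity <- similarity / ncol(solutions)

## Consensus clusters from hierarchical clustering of 1 - similarity
tree <- hclust(as.dist(1 - similarity), method="ward.D2")
consensus <- cutree(tree, k=2)
print(consensus)
```

In this toy data, cell3 and cell6 are each assigned differently in one solution, yet the consensus still groups cell1 to cell3 apart from cell4 to cell6, which is the kind of robustness the combination step is meant to provide.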
This wrapper function performs the following steps:
1) Building the single-cell RNA-Seq object. See ?scRNAseq-class.
2) Performing the normalization. See ?normaliseCountMatrix.
3) Calculating all tSNEs. See ?generateTSNECoordinates.
4) Clustering with DbScan. See ?runDBSCAN.
5) Computing the cells similarity matrix. See ?clusterCellsInternal.
6) Computing the clusters similarity matrix. If clusToAdd is not NA, add
the provided clustering. See ?calculateClustersSimilarity and
?addClustering.
7) Ranking genes. See ?rankGenes.
8) Getting marker genes. See ?retrieveTopClustersMarkers.
9) Getting genes info. See ?retrieveGenesInfo.
10) Plot the cell similarity matrix. See ?plotCellSimilarity.
11) Plot clustered tSNE. See ?plotClusteredTSNE.
12) Plot the cell heatmap. See ?plotCellHeatmap.
13) Plot the clusters similarity heatmap. See ?plotClustersSimilarity.
14) Exporting all results to outputDirectory if exportAllResults=TRUE.
See ?exportResults.
15) Return an object containing all the results provided by CONCLUS.
If exportAllResults=TRUE, the sub-folder 'pictures' of your outputDirectory contains all tSNE plots with dbscan coloration (sub-folder 'tSNE_pictures'), the cell similarity matrix ('Test_cells_correlation_X_clusters.pdf'), the cell heatmap ('Test_clustersX_meanCenteredTRUE_orderClustersFALSE_orderGenesFALSE markrsPerCluster.pdf'), and the cluster similarity matrix ('Test_clusters_similarity_10_clusters.pdf'). You will also find in the sub-folder 'Results':
+ '1_MatrixInfo': The normalized count matrix and its meta-data for both
rows and columns.
+ '2_TSNECoordinates': The tSNE coordinates for each parameter of principal
components (PCs) and perplexities.
+ '3_dbScan': The different clusters given by DBscan according to different
parameters. Each file gives a cluster number for each cell.
+ '4_CellSimilarityMatrix': The matrix underlying the cells similarity
heatmap.
+ '5_ClusterSimilarityMatrix': The matrix underlying the clusters similarity
heatmap.
+ '6_ConclusResult': A table containing the result of the consensus
clustering. This table contains two columns: clusters-cells.
+ '7_fullMarkers': Files containing markers for each cluster, defined by the
consensus clustering.
+ '8_TopMarkers': Files containing the top 10 markers for each cluster.
+ '9_genesInfos': Files containing gene information for the top markers
defined in the previous folder.
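Once results have been exported, a quick check of the output layout can be useful. The helper below is hypothetical (missingResultFolders is not part of CONCLUS) and assumes that the nine sub-folders listed above sit directly under file.path(outputDirectory, "Results"); adjust the path if your output is nested differently. The demonstration runs on a mock directory tree, not on real CONCLUS output.

```r
## Sub-folder names as listed above
expectedFolders <- c("1_MatrixInfo", "2_TSNECoordinates", "3_dbScan",
    "4_CellSimilarityMatrix", "5_ClusterSimilarityMatrix",
    "6_ConclusResult", "7_fullMarkers", "8_TopMarkers", "9_genesInfos")

## Hypothetical helper: report which expected sub-folders are missing
missingResultFolders <- function(outputDirectory) {
    resultsDir <- file.path(outputDirectory, "Results")
    present <- dir.exists(file.path(resultsDir, expectedFolders))
    expectedFolders[!present]
}

## Demonstration on a mock directory tree (no CONCLUS run involved):
## create the first eight folders only, so the ninth is reported as missing
mockDir <- file.path(tempdir(), "mockCONCLUS")
for (folder in expectedFolders[1:8]) {
    dir.create(file.path(mockDir, "Results", folder), recursive=TRUE)
}
print(missingResultFolders(mockDir))
unlink(mockDir, recursive=TRUE)
```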
A scRNAseq object containing the similarity matrices and the marker genes.
Nicolas Descostes
experimentName <- "Bergiers"
outputDirectory <- "YourOutputDirectory"
species <- "mouse"

## Load the count matrix
countmatrixPath <- system.file("extdata/countMatrix.tsv", package="conclus")
countMatrix <- loadDataOrMatrix(file=countmatrixPath, type="countMatrix",
    ignoreCellNumber=TRUE)

## Load the coldata
coldataPath <- system.file("extdata/colData.tsv", package="conclus")
columnsMetaData <- loadDataOrMatrix(file=coldataPath, type="coldata",
    columnID="cell_ID")

## Use runCONCLUS
## These parameters are tweaked to fit our example data and reduce
## computing time, please consider using the default parameters or
## adjusted to your dataset.
scr <- runCONCLUS(outputDirectory, experimentName, countMatrix, species,
    columnsMetaData=columnsMetaData, perplexities=c(2,3), tSNENb=1,
    PCs=c(4,5,6,7,8,9,10), epsilon=c(380, 390, 400), minPoints=c(2,3),
    clusterNumber=2)

## Remove the results
unlink(outputDirectory, recursive=TRUE)
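To compare the consensus clustering with labels from another tool, clusToAdd expects a data frame with the columns 'clusters' and 'cells'. The sketch below builds such a data frame in base R. The cell names and labels are made up, and in practice the 'cells' column presumably has to match the column names of countMatrix (an assumption, not stated on this page).

```r
## Hypothetical labels produced by another clustering tool
externalLabels <- c(cellA=1, cellB=1, cellC=2, cellD=2)

## Data frame with the two columns expected by clusToAdd
clusToAdd <- data.frame(clusters=unname(externalLabels),
    cells=names(externalLabels), stringsAsFactors=FALSE)

## Basic checks before passing it as runCONCLUS(..., clusToAdd=clusToAdd)
stopifnot(identical(colnames(clusToAdd), c("clusters", "cells")))
stopifnot(!anyDuplicated(clusToAdd$cells))
print(clusToAdd)
```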