doppelgangR: doppelgangR
In lwaldron/doppelgangR: Identify likely duplicate samples from genomic or meta-data

doppelgangR

R Documentation

doppelgangR

Description

Identify samples with suspiciously high correlations and phenotype similarities

Usage

doppelgangR(
  esets,
  separator = ":",
  corFinder.args = list(separator = separator, use.ComBat = TRUE, method = "pearson"),
  phenoFinder.args = list(separator = separator, vectorDistFun = vectorWeightedDist),
  outlierFinder.expr.args = list(bonf.prob = 0.5, transFun = atanh, tail = "upper"),
  outlierFinder.pheno.args = list(normal.upper.thresh = 0.99, bonf.prob = NULL,
    tail = "upper"),
  smokingGunFinder.args = list(transFun = I),
  impute.knn.args = list(k = 10, rowmax = 0.5, colmax = 0.8, maxp = 1500,
    rng.seed = 362436069),
  manual.smokingguns = NULL,
  automatic.smokingguns = FALSE,
  within.datasets.only = FALSE,
  intermediate.pruning = FALSE,
  cache.dir = "cache",
  BPPARAM = bpparam(),
  verbose = TRUE
)

Arguments

`esets`	a list of ExpressionSets, containing the numeric and phenotypic data to be analyzed.
`separator`	a delimiter to use between dataset names and sample names
`corFinder.args`	a list of arguments to be passed to the corFinder function.
`phenoFinder.args`	a list of arguments to be passed to the phenoFinder function. If NULL, samples with similar phenotypes will not be searched for.
`outlierFinder.expr.args`	a list of arguments to be passed to outlierFinder when called for expression data
`outlierFinder.pheno.args`	a list of arguments to be passed to outlierFinder when called for phenotype data
`smokingGunFinder.args`	a list of arguments to be passed to smokingGunFinder
`impute.knn.args`	a list of arguments to be passed to impute::impute.knn. Set to NULL to do no knn imputation.
`manual.smokingguns`	a character vector of phenoData columns that, if identical, will be considered evidence of duplication
`automatic.smokingguns`	If TRUE, automatically look for "smoking guns": phenotype variables that are unique to each patient in dataset 1, also unique to each patient in dataset 2, but contain exact matches between datasets 1 and 2.
`within.datasets.only`	If TRUE, only search within each dataset for doppelgangers.
`intermediate.pruning`	The default setting FALSE results in output with no missing values, but uses extra memory because all results from the expression, phenotype, and smoking gun doppelganger searches must be saved until the end. Setting this to TRUE saves memory for very large searches, but each distance metric will only be available for pairs that the metric itself identified as doppelgangers (for example, phenotype doppelgangers will have missing values for the expression and smoking gun similarity).
`cache.dir`	The name of a directory in which to cache or look up results to save re-calculating correlations. Set to NULL for no caching.
`BPPARAM`	Argument for BiocParallel::bplapply(). By default, uses all cores of a multi-core machine.
`verbose`	If TRUE, print progress information.
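
Several of the arguments above take either a list or NULL, and NULL switches the corresponding step off. As a minimal sketch, the settings can be collected in a named list and passed with do.call(). The name `custom_args` is hypothetical, and `esets2` is assumed to be created by example("phenoFinder"), as in the Examples section. The call itself only runs if doppelgangR is installed.

```r
## Hypothetical settings for a small, single-core run without caching.
custom_args <- list(
  corFinder.args   = list(use.ComBat = FALSE),  # turn off ComBat batch correction
  phenoFinder.args = NULL,                      # do not search for similar phenotypes
  cache.dir        = NULL,                      # do not cache correlations
  verbose          = FALSE                      # no progress information
)

if (requireNamespace("doppelgangR", quietly = TRUE)) {
  library(doppelgangR)
  example("phenoFinder")  # assumed to define `esets2`
  ## Serial back-end instead of the default bpparam()
  custom_args$BPPARAM <- BiocParallel::SerialParam()
  results <- do.call(doppelgangR, c(list(esets = esets2), custom_args))
  summary(results)
}
```

Building the list with list(name = NULL) keeps the NULL entries, so they reach doppelgangR() as explicit NULL arguments rather than falling back to the defaults.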

Value

Returns an object of S4-class "DoppelGang"

Author(s)

Levi Waldron, Markus Riester, Marcel Ramos

Examples


example("phenoFinder")

results2 <- doppelgangR(esets2, cache.dir = NULL)
results2
plot(results2)
summary(results2)

## Not run: 
library(Biobase)  ## for pData()
library(curatedOvarianData)
data(GSE32062.GPL6480_eset)
data(GSE32063_eset)
data(GSE12470_eset)
data(GSE17260_eset)

testesets <- list(JapaneseA = GSE32062.GPL6480_eset,
    JapaneseB = GSE32063_eset,
    Yoshihara2009 = GSE12470_eset,
    Yoshihara2010 = GSE17260_eset)

## standardize the sample ids to improve matching
## based on clinical annotation

testesets <- lapply(testesets, function(X) {
  X$alt_sample_name <-
    paste(X$sample_type, gsub("[^0-9]", "", X$alt_sample_name), sep = "_")
  pData(X) <-
    pData(X)[, !grepl("uncurated_author_metadata", colnames(pData(X)))]
  X[, 1:20]  ## keep the first 20 samples to speed up computations
})

(results1 <- doppelgangR(testesets, cache.dir = NULL))
plot(results1)
summary(results1)

## Set phenoFinder.args=NULL to ignore similar phenotypes, and
## turn off ComBat batch correction:

results2 <- doppelgangR(testesets,
    corFinder.args = list(use.ComBat = FALSE), phenoFinder.args = NULL,
    cache.dir = NULL)
summary(results2)

## End(Not run)

lwaldron/doppelgangR documentation built on Jan. 9, 2025, 1:15 a.m.
