Figure \@ref(fig:opalOmic) describes the different types of omic association analyses that can be performed using the DataSHIELD client functions implemented in the `r Githubpkg("isglobal-brge/dsOmicsClient")` package. Data (omic data and phenotypes/covariates) can be stored at different sites (http, ssh, AWS S3, local, ...) and are managed with Opal through the `r Githubpkg("obiba/resourcer")` package and its extensions implemented in `r Githubpkg("isglobal-brge/dsOmics")`.
knitr::include_graphics(tools::file_path_as_absolute("../fig/dsOmics_A.jpg"))
The `dsOmicsClient` package then allows two types of analysis: "virtually" pooled analysis and federated meta-analysis. Both are based on fitting a separate generalized linear model (GLM) for each feature when assessing the association between omic data and the phenotype/trait/condition of interest. Non-disclosive omic data analysis from a single study can also be performed.
The "virtually" pooled approach (Figure \@ref(fig:omicAnal1)) is recommended when the user wants to analyze omic data from different sources and obtain results as if the data were located on a single computer. Note that this can be very time consuming when analyzing multiple features, since it repeatedly calls a base DataSHIELD function (`ds.glm()`), and that it is not recommended when the data are not properly harmonized (e.g., gene expression normalized using different methods, GWAS data from different platforms, ...). In addition, this approach cannot be used when it is necessary to remove unwanted variability (for transcriptomic and epigenomic analyses) or to control for population stratification (for GWAS), because methods to compute surrogate variables (to remove unwanted variability) or principal components (to address population stratification) in a non-disclosive way still need to be developed.
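To make the cost of the pooled approach concrete, the sketch below fits one GLM per feature in a loop. It runs locally on simulated data and uses base R `glm()` as a stand-in for `ds.glm()`; the simulated data, the names `pheno`, `expr` and `fit_feature`, and the choice of treating the feature as the outcome are assumptions made for illustration only. In a real DataSHIELD session each model fit would involve an iterative exchange with every server, which is why the approach becomes slow when many features are analyzed.

```r
set.seed(1)

n_samples  <- 200
n_features <- 50

# Simulated phenotype (binary condition) and one covariate.
pheno <- data.frame(
  condition = rbinom(n_samples, 1, 0.5),
  age       = rnorm(n_samples, 50, 10)
)

# Simulated omic matrix: samples in rows, features in columns.
expr <- matrix(rnorm(n_samples * n_features), nrow = n_samples,
               dimnames = list(NULL, paste0("feature", seq_len(n_features))))

# Fit one GLM for a single feature and return the row of the
# coefficient table that corresponds to the condition of interest.
# glm() is a local stand-in for ds.glm() (assumption for illustration).
fit_feature <- function(feature) {
  dat <- cbind(pheno, x = expr[, feature])
  fit <- glm(x ~ condition + age, family = gaussian(), data = dat)
  coef(summary(fit))["condition", ]
}

# One model fit per feature: the cost grows linearly with the number of features.
results <- t(sapply(colnames(expr), fit_feature))

# Adjust the p-values for multiple testing (Benjamini-Hochberg).
p_adj <- p.adjust(results[, "Pr(>|t|)"], method = "BH")

head(cbind(results, p.adj = p_adj)[order(p_adj), ])
```

With tens of thousands of features (e.g., CpGs or SNPs), the loop above translates into the same number of federated model fits.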
The federated meta-analysis approach (Figure \@ref(fig:omicAnal2)) overcomes the limitations of the pooled analysis. The computational burden is addressed by using fast, scalable methods to analyze data at whole-genome level on each server. The transcriptomic and epigenomic data analyses use the widely used `r Biocpkg("limma")` package, which relies on the `ExpressionSet` or `RangedSummarizedExperiment` Bioconductor infrastructures to handle omic and phenotypic data (e.g., covariates). The genomic data are analyzed using `r Biocpkg("GWASTools")` and `r Biocpkg("GENESIS")`, which are designed to perform quality control (QC) and GWAS using the GDS infrastructure.
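The two-step logic of a federated meta-analysis can be sketched as follows: each server fits its own model and returns only summary statistics, and the client combines them. The text above does not state which combination method is used, so the inverse-variance weighted fixed-effect estimate below is an assumption, as are the simulated studies and the helper names `simulate_study` and `site_summary`. Base R `lm()` stands in for the per-server analysis (done with limma, GWASTools or GENESIS in practice).

```r
set.seed(2)

# Simulate one study in which the condition shifts the feature by `beta`.
simulate_study <- function(n, beta = 0.5) {
  condition <- rbinom(n, 1, 0.5)
  data.frame(condition = condition,
             feature   = beta * condition + rnorm(n))
}

studies <- list(study1 = simulate_study(150),
                study2 = simulate_study(300),
                study3 = simulate_study(100))

# Step 1 (would run on each server): fit the model locally and
# return only the effect estimate and its standard error.
site_summary <- function(dat) {
  fit <- lm(feature ~ condition, data = dat)
  coef(summary(fit))["condition", c("Estimate", "Std. Error")]
}
per_site <- t(sapply(studies, site_summary))

# Step 2 (would run on the client): inverse-variance weighted pooling.
w         <- 1 / per_site[, "Std. Error"]^2
beta_meta <- sum(w * per_site[, "Estimate"]) / sum(w)
se_meta   <- sqrt(1 / sum(w))
p_meta    <- 2 * pnorm(-abs(beta_meta / se_meta))

per_site
round(c(estimate = beta_meta, se = se_meta, p.value = p_meta), 4)
```

Because only one estimate and one standard error per feature leave each server, the per-server analysis can use any fast whole-genome method without further client-server iterations.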
Next, we describe how both approaches are implemented:
The "virtually" pooled analysis relies on the `ds.glm()` function, a DataSHIELD function that uses an approach that is mathematically equivalent to placing all individual-level data from all sources in one central warehouse and analysing those data using the conventional `glm()` function in R. The user can select one (or multiple) features (e.g., genes, transcripts, CpGs, SNPs, ...).
knitr::include_graphics(tools::file_path_as_absolute("../fig/dsOmics_B.jpg"))
knitr::include_graphics(tools::file_path_as_absolute("../fig/dsOmics_C.jpg"))
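The claim that a federated fit can be mathematically equivalent to a `glm()` on physically pooled data can be checked on a toy example. In the sketch below each simulated site returns only a score vector and an information matrix for the current coefficients, and the client sums them to take a Newton-Raphson step. This illustrates the general idea under stated assumptions (logistic regression, two simulated sites, hypothetical helpers `make_site` and `site_step`); it is not the DataSHIELD implementation of `ds.glm()`.

```r
set.seed(3)

# Two simulated sites with a binary outcome and one feature.
make_site <- function(n) {
  x <- rnorm(n)
  y <- rbinom(n, 1, plogis(-0.3 + 0.8 * x))
  data.frame(y = y, x = x)
}
sites <- list(siteA = make_site(250), siteB = make_site(400))

# Server side: given the current coefficients, return only the score
# vector and the information matrix (aggregate summaries, no raw data).
site_step <- function(dat, beta) {
  X  <- cbind(1, dat$x)
  mu <- plogis(drop(X %*% beta))
  list(score = crossprod(X, dat$y - mu),
       info  = crossprod(X, X * (mu * (1 - mu))))
}

# Client side: sum the summaries across sites and update the coefficients.
beta <- c(0, 0)
for (iter in 1:25) {
  parts <- lapply(sites, site_step, beta = beta)
  score <- Reduce(`+`, lapply(parts, `[[`, "score"))
  info  <- Reduce(`+`, lapply(parts, `[[`, "info"))
  step  <- drop(solve(info, score))
  beta  <- beta + step
  if (max(abs(step)) < 1e-10) break
}

# Reference: conventional glm() on the physically pooled data.
pooled <- glm(y ~ x, family = binomial(), data = do.call(rbind, sites))

# Compare the federated and the pooled coefficients.
all.equal(unname(beta), unname(coef(pooled)), tolerance = 1e-5)
```

The sums of per-site scores and information matrices equal the quantities computed on the stacked data, so both routes follow the same iterations and reach the same maximum likelihood estimate.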