The DOAGDC package was developed to download, organize and analyze
Genomic Data Commons (GDC) data. It uses several well-known R packages,
as DESeq2, edgeR, mclust, and WGCNA. This is a free open-source software
that implements academic research by the authors and co-workers. If you
use it, please support the project by citing the appropriate journal
article listed running the following command citation("DOAGDC")
.
# Checking for missing dependencies manually
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c("annotate", "AnnotationDbi", "BiocParallel", "clusterProfiler",
"DESeq2", "DO.db", "DOSE", "EBSeq", "edgeR", "gage",
"GO.db", "goseq", "IHW", "MineICA", "org.Hs.eg.db", "pathview",
"reactome.db", "ReactomePA", "SummarizedExperiment"))
# Get DOAGDC from GitHub:
install.packages("devtools") # if not already done
devtools::install_github("Facottons/DOAGDC")
There are some possible paths in DOAGDC analysis workflow:
The start point for all possible analyses is the download_gdc()
function. As the name suggests, this function automatically download all
selected data, and store it inside the appropriate folders. After this
step, the download data can be concatenated into a single table using
the concatenate_files()
function. Then, choose which functions should
be used in the next steps:
groups_identification_coxHR()
,
groups_identification_mclust()
, or clinical_terms()
).dea_DESeq2()
,
dea_EBSeq()
, and/or dea_edgeR()
).draw_heatmap()
, or PCA_Analysis()
).GOnto()
, GSEA()
,
DO_React_enrich()
, or KEEG_enrich()
).This is a entire example that shows how to run a dea analysis, with groups generated by ‘Caspase-8-M-E’ expression:
library(DOAGDC)
# set the work dir path
DIR <- "/any/full/path/name"
# set the data type to use in groups identification step.
Data <- "protein"
# set the desired data base to be used ("gdc" or "legacy")
# for using GDC Legacy Archive
DB <- "legacy"
# set the tumor type
# for using 'Uterine Carcinosarcoma'
Tumor <- "UCS"
## step 1: Download
download_gdc(dataType = Data, tumor = Tumor, dataBase = DB,
workDir = DIR)
## step 2: concatenate into a single table
# first normalized data to groups identification
concatenate_files(dataType = Data,
Name = "Caspase-8-M-E",
normalization = TRUE,
tumorData = TRUE,
dataBase = DB,
workDir = DIR,
tumor = Tumor)
It was created an object called UCS_LEGACY_protein_tumor_data
, its
name is based on your initial setups (i.e. tumor _
data-base _ data-type _tumor_data). This object
prevents data of being messed up, keeping tumors, data bases, and data
types in the correct environment. All downstream objects are going to be
store inside this environment. From now on, you must use the name of
this recent created object as value to the env
function arguments.
# now, uses not normalized data, as required by the dea packages
# remember to use the object mentionate later
concatenate_files(dataType = "gene",
normalization = FALSE,
tumorData = TRUE,
dataBase = "legacy",
workDir = DIR,
tumor = Tumor,
env = UCS_LEGACY_protein_tumor_data)
## step 3: groups identification
# forcing three groups here ("G1", "G2", "G3")
groups_identification_mclust(dataType = Data,
Name = "Caspase-8-M-E",
tumor = Tumor,
group.number = 3,
dataBase = DB,
workDir = DIR,
env = UCS_LEGACY_protein_tumor_data)
## step 4: dea
# using edgeR 'exactTest' method
dea_edgeR(dataType = Data,
Name = "Caspase-8-M-E",
Method = "exactTest",
workDir = DIR,
env = UCS_LEGACY_protein_tumor_data)
## step 5: visualization
# PCA of 'G2_over_G1'
PCA_Analysis(Tool = "edgeR",
dataType = Data,
Name = "Caspase-8-M-E",
pairName = "G2_over_G1",
env = UCS_LEGACY_protein_tumor_data)
# Heat Map of 'G2_over_G1'
draw_heatmap(Tool = "edgeR",
dataType = Data,
Name = "Caspase-8-M-E",
Method = "euclidean",
pairName = "G2_over_G1",
env = UCS_LEGACY_protein_tumor_data)
## step 6: Pathway enrichment analysis
# using only the 'Upregulated' genes
# powered by enrichGO (image below) and goseq
GOnto(condition = "Upregulated",
Tool = "edgeR",
env = UCS_LEGACY_protein_tumor_data)
}
If you would like more information, please check the DOAGDC vignette or the DOAGDC Manual to learn even more.
There are two main places to get help with DOAGDC:
The GitHub itself.
The Stack Overflow: It is a great source of programming questions and their answers. It is also an amazing place to get help with a helpful community. Before asking any questions, please read the how to ask guide.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.