Managing data from large-scale projects such as The Cancer Genome Atlas (TCGA)[@ref1] for further analysis is an important and time-consuming step in research projects. Several efforts, such as the Firehose project, make pre-processed TCGA data publicly available via web services and data portals, but researchers still have to manage, download, and prepare the data for subsequent steps. We developed an open-source and extensible R-based data client for Firehose Level 3 and Level 4 data and demonstrated its use with sample case studies. RTCGAToolbox can improve data management for researchers who are interested in TCGA data. In addition, it can be integrated with other analysis pipelines for further data analysis.
RTCGAToolbox is open source and licensed under the GNU General Public License Version 2.0. All documentation and source code for RTCGAToolbox are freely available. Please cite the paper at [@ref3].
Currently, the following functions are provided to access and process datasets.
To install RTCGAToolbox, you can use Bioconductor. Source code is also available on GitHub. First-time users can use the following code snippet to install the package:
if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("RTCGAToolbox")
Before getting data from the Firehose pipelines, users have to check the valid dataset aliases, stddata run dates, and analysis run dates. To provide this information, RTCGAToolbox comes with three control functions. Users can list datasets with the "getFirehoseDatasets" function. In addition, users have to provide a stddata run date and/or an analysis run date to the client function. Valid dates are accessible via the "getFirehoseRunningDates" and "getFirehoseAnalyzeDates" functions. The code chunk below shows how to list datasets and dates.
library(RTCGAToolbox)

# Valid aliases
getFirehoseDatasets()

# Valid stddata run dates (returns the 3 most recent dates)
getFirehoseRunningDates(last = 3)

# Valid analysis run dates (returns the 3 most recent dates)
getFirehoseAnalyzeDates(last = 3)
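Since the client function needs a run date that actually exists, it can be convenient to check a requested date against the listing before downloading. The following is a minimal base R sketch under stated assumptions: `choose_run_date` is a hypothetical helper that is not part of RTCGAToolbox, and the example vector only stands in for the output of getFirehoseRunningDates(), which is assumed here to be a character vector of "YYYYMMDD" strings.

```r
# Hypothetical helper (not part of RTCGAToolbox): pick a run date from a
# character vector of "YYYYMMDD" strings, as assumed to be returned by
# getFirehoseRunningDates().
choose_run_date <- function(valid_dates, requested = NULL) {
  stopifnot(is.character(valid_dates), length(valid_dates) > 0)
  if (is.null(requested)) {
    # "YYYYMMDD" strings sort chronologically, so max() gives the latest run
    return(max(valid_dates))
  }
  if (!requested %in% valid_dates) {
    stop("Run date '", requested, "' is not among the valid dates: ",
         paste(valid_dates, collapse = ", "))
  }
  requested
}

# Illustrative values only; in practice use getFirehoseRunningDates()
example_dates <- c("20160128", "20151101", "20150821")
latest_date <- choose_run_date(example_dates)
checked_date <- choose_run_date(example_dates, "20151101")
```

With no requested date the helper falls back to the latest entry; an unknown date stops with an error that lists the valid choices.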
Once the dataset and dates are determined, users can call the data client function ("getFirehoseData") to access the data. The current version can download multiple data types, with the exception of ISOFORM and exon-level data because of their large size. The code chunk below downloads the READ dataset with clinical and mutation data.
# READ mutation data and clinical data
readData <- getFirehoseData(dataset="READ", runDate="20160128",
    forceDownload=TRUE, clinical=TRUE, Mutation=TRUE)
Printing the object shows which datasets are in the FirehoseData object:
readData
Users have to set several parameters to get the data they need. The "getFirehoseData" options are explained below. Valid values for the dataset option are listed by getFirehoseDatasets(), as explained above, and valid run dates are listed by getFirehoseRunningDates().
The following logical keys are provided for the different data types. By default, the client downloads only clinical data.
Users can also set the following parameters to control client behavior.
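One way to keep the dataset, run date, and logical data type keys together is to assemble them into a named list before calling the client. The following is a minimal base R sketch, not the package's own mechanism: `build_firehose_args` is a hypothetical helper, and the only key names used are `clinical` and `Mutation`, taken from the READ example above.

```r
# Hypothetical helper (not part of RTCGAToolbox): build a named argument
# list in which every requested data type becomes a TRUE flag.
build_firehose_args <- function(dataset, runDate, types = "clinical") {
  stopifnot(is.character(dataset), length(dataset) == 1,
            is.character(runDate), length(runDate) == 1,
            is.character(types))
  flags <- setNames(as.list(rep(TRUE, length(types))), types)
  c(list(dataset = dataset, runDate = runDate), flags)
}

fh_args <- build_firehose_args("READ", "20160128",
                               types = c("clinical", "Mutation"))
str(fh_args)

# With RTCGAToolbox installed and a network connection, the list could be
# passed on with: do.call(getFirehoseData, fh_args)
```

Keys that are left out of the list keep whatever defaults the client function defines.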
We provide an abbreviated dataset from the 'ACC' (adrenocortical carcinoma) cohort that contains only the top 6 rows of each dataset along with the full clinical dataset. It can be loaded with:
data(accmini)
accmini
The accmini data is a FirehoseData object that stores RNAseq, copy number, mutation, and clinical data from the Adrenocortical Carcinoma (ACC) study.
The biocExtract function allows the user to take any downloaded dataset and convert it into a standard Bioconductor object. The result is a SummarizedExperiment, RangedSummarizedExperiment, or RaggedExperiment, depending on the features of the data. The user must provide the desired data type as input to the function along with the actual FirehoseData object. This allows for easy adaptability to other software in the Bioconductor ecosystem.
biocExtract(accmini, "RNASeq2Gene")
biocExtract(accmini, "CNASNP")
You can obtain the downloaded data in tabular or list format from the FirehoseData object by using the 'getData()' function.
head(getData(accmini, "clinical"))
getData(accmini, "RNASeq2GeneNorm")
getData(accmini, "GISTIC", "AllByGene")
sessionInfo()