From https://support.bioconductor.org/p/9138939/.
library(GenomicDataCommons,quietly = TRUE)
I made a small change to the filtering expression approach based on
changes to lazy evaluation best practices. There is now no need to
include the ~ in the filter expression. So:
q = files() |>
    GenomicDataCommons::filter(
        cases.project.project_id == 'TCGA-COAD' &
        data_type == 'Aligned Reads' &
        experimental_strategy == 'RNA-Seq' &
        data_format == 'BAM')
And get a count of the results:
count(q)
And the manifest.
manifest(q)
Your question about race and ethnicity is a good one.
all_fields = available_fields(files())
And we can grep for race or ethnic to get potential matching fields
to look at.
grep('race|ethnic',all_fields,value=TRUE)
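As a minimal offline sketch of what that grep call does, the example below runs the same pattern against a short, hypothetical vector of field names. The vector example_fields is an assumption for illustration only; the real list comes from available_fields(files()) and is much longer.

```r
# Hypothetical field names standing in for the output of
# available_fields(files()); the real vector is much longer.
example_fields <- c(
  "cases.demographic.race",
  "cases.demographic.ethnicity",
  "cases.demographic.gender",
  "data_format"
)

# 'race|ethnic' is a regular expression matching "race" OR "ethnic".
# value = TRUE returns the matching strings instead of their positions.
matches <- grep("race|ethnic", example_fields, value = TRUE)
print(matches)
```

Because the pattern is a substring match, "ethnic" also picks up "ethnicity", which is why one short pattern is enough to surface both demographic fields.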
Now, we can check available values for each field to determine how to complete our filter expressions.
available_values('files',"cases.demographic.ethnicity")
available_values('files',"cases.demographic.race")
We can complete our filter expression now to limit to white race only.
q_white_only = q |>
    GenomicDataCommons::filter(cases.demographic.race=='white')
count(q_white_only)
manifest(q_white_only)
I would like to get the number of cases added (created; any logical datetime would suffice here) to the TCGA project by experiment type. I attempted to get this data via the GenomicDataCommons package, but I believe it is giving me the number of files for a given experiment type rather than the number of cases. How can I get the number of cases for which there is RNA-Seq data?
library(tibble)
library(dplyr)
library(GenomicDataCommons)
cases() |>
    GenomicDataCommons::filter(
        ~ project.program.name=='TCGA' &
          files.experimental_strategy=='RNA-Seq'
    ) |>
    facet(c("files.created_datetime")) |>
    aggregations() |>
    unname() |>
    unlist(recursive = FALSE) |>
    as_tibble() |>
    dplyr::arrange(dplyr::desc(key))