NCI Genomic Data Commons Access

How can I generate a manifest file filtered by race and ethnicity?

From https://support.bioconductor.org/p/9138939/.

library(GenomicDataCommons,quietly = TRUE)

I made a small change to how filter expressions are written, following changes in lazy evaluation best practices: there is no longer any need to include the `~` in the filter expression. So:

q = files() |>
  GenomicDataCommons::filter(
    cases.project.project_id == 'TCGA-COAD' &
      data_type == 'Aligned Reads' &
      experimental_strategy == 'RNA-Seq' &
      data_format == 'BAM')

And get a count of the results:

count(q)

And the manifest:

manifest(q)

Your question about race and ethnicity is a good one. First, we list all the fields available for the files endpoint.

all_fields = available_fields(files())

And we can grep for race or ethnic to get potential matching fields to look at.

grep('race|ethnic',all_fields,value=TRUE)
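
To make the pattern match concrete, here is a minimal offline sketch of the same `grep()` call. The vector `example_fields` is an illustrative stand-in for `all_fields` (the real vector comes from the GDC API and is much longer); it contains only field names already used in this answer.

```r
# Illustrative stand-in for all_fields; the real vector is returned by
# available_fields(files()) and is much longer.
example_fields <- c(
  "data_type",
  "data_format",
  "experimental_strategy",
  "cases.project.project_id",
  "cases.demographic.ethnicity",
  "cases.demographic.race"
)

# value = TRUE returns the matching strings instead of their positions.
# The pattern 'race|ethnic' matches any name containing either substring.
matching_fields <- grep("race|ethnic", example_fields, value = TRUE)
print(matching_fields)
```

Because this is a plain substring match, it can also pick up unrelated field names that happen to contain "race", so the hits are worth reading through before using them in a filter.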

Now, we can check available values for each field to determine how to complete our filter expressions.

available_values('files',"cases.demographic.ethnicity")
available_values('files',"cases.demographic.race")

We can now extend our filter expression to keep only files from cases whose race is recorded as 'white'.

q_white_only = q |>
  GenomicDataCommons::filter(cases.demographic.race=='white')
count(q_white_only)
manifest(q_white_only)
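
Since the question asks for a manifest file, the table returned by `manifest()` still needs to be written to disk. The following is a minimal base R sketch of that step, not the package's own method. The data frame `example_manifest` is made-up placeholder data standing in for `manifest(q_white_only)`, its column names are an assumption, and the tab-separated layout with a header row is assumed to be what your download tool expects.

```r
# Placeholder rows standing in for manifest(q_white_only); the ids, file
# names, and sizes are made up for illustration only.
example_manifest <- data.frame(
  id = c("00000000-0000-0000-0000-000000000001",
         "00000000-0000-0000-0000-000000000002"),
  filename = c("sample_1.bam", "sample_2.bam"),
  size = c(1024, 2048),
  stringsAsFactors = FALSE
)

# Write to a temporary location; replace with a real path as needed.
manifest_path <- file.path(tempdir(), "gdc_manifest.tsv")

# Tab-separated, unquoted, with a header row and no row names.
write.table(example_manifest, file = manifest_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = TRUE)

# Read the file back to confirm that it round-trips.
round_trip <- read.delim(manifest_path, stringsAsFactors = FALSE)
print(round_trip)
```

With a real query, `example_manifest` would be replaced by the result of `manifest(q_white_only)`; check its actual column names before handing the file to another tool.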

How can I get the number of cases with RNA-Seq data added by date to the TCGA project with `GenomicDataCommons`?

From https://support.bioconductor.org/p/9135791/

I would like to get the number of cases added to the TCGA project by experiment type (by creation date, though any logical datetime would suffice). I tried to get this with the GenomicDataCommons package, but I believe it is giving me the number of files for a given experiment type rather than the number of cases. How can I get the number of cases for which there is RNA-Seq data?

library(tibble)
library(dplyr)
library(GenomicDataCommons)

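# Query the cases endpoint so that the counts refer to cases, not files.
cases() |> 
  GenomicDataCommons::filter(
    ~ project.program.name=='TCGA' & files.experimental_strategy=='RNA-Seq'
  ) |> 
  # Count matching cases for each distinct file creation timestamp.
  facet(c("files.created_datetime")) |> 
  aggregations() |> 
  # Unwrap the result for the single facet into a tibble.
  unname() |>
  unlist(recursive = FALSE) |> 
  as_tibble() |>
  # Most recent timestamps first.
  dplyr::arrange(dplyr::desc(key))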
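
Because `files.created_datetime` is a full timestamp, the facet above gives one row per distinct timestamp rather than one per calendar day. The sketch below shows one way to roll such a table up by date in base R. The data frame `example_counts` is made-up placeholder data; it assumes the real result has a `key` column of ISO 8601 timestamps (date in the first ten characters) and a `doc_count` column of counts.

```r
# Placeholder data shaped like the facet result; the values are made up.
example_counts <- data.frame(
  key = c("2020-01-02T10:15:00", "2020-01-02T18:30:00", "2020-03-05T09:00:00"),
  doc_count = c(3L, 2L, 4L),
  stringsAsFactors = FALSE
)

# Keep only the calendar date (first ten characters of the timestamp).
example_counts$created_date <- as.Date(substr(example_counts$key, 1, 10))

# Sum the counts within each date.
per_date <- aggregate(doc_count ~ created_date, data = example_counts, FUN = sum)

# Most recent dates first, matching the ordering used above.
per_date <- per_date[order(per_date$created_date, decreasing = TRUE), ]
print(per_date)
```

Summing across timestamps may count a case more than once if it has files created at different times on the same day, so treat the per-date totals as an upper bound unless you have checked this.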