download.SRA.metadata | R Documentation |
Given an experiment identifier, query information from different locations in SRA to get a complete metadata table for the experiment. It first finds the run info for each library, then the sample info; if no PubMed ID is found, it searches for one, and it searches for the authors through PubMed.
download.SRA.metadata(
  SRP,
  outdir = tempdir(),
  remove.invalid = TRUE,
  auto.detect = FALSE,
  abstract = "printsave",
  force = FALSE,
  rich.format = FALSE,
  fetch_GSE = FALSE
)
SRP |
character string, a study ID given as the PRJ, SRP, ERP, DRP or GSE accession of the study, for example "SRP226389" or "ERP116106". If a GSE is given, it will try to convert it to the SRP to find the files. The call works as long as the runs are registered on the efetch server, i.e. as long as there is an SRP linked from the BioProject or GSE. An example that fails is "PRJNA449388", which has no such link. |
outdir |
character string, directory to save the file in, default: tempdir(). The file will be called "SraRunInfo_SRP.csv", where SRP is the SRP argument. We advise using BioProject IDs ("PRJNA..."). The directory will be created if it does not exist. |
remove.invalid |
logical, default TRUE. If TRUE, remove runs with 0 reads (spots). |
auto.detect |
logical, default FALSE. If TRUE, ORFik will guess additional library information and add it as extra columns. |
abstract |
character, default "printsave". If an abstract for the project exists,
print it and save it to the same directory as the run info file.
Alternative: "print", which prints the abstract only the first time the metadata is downloaded;
since it is not saved, it cannot be printed later. |
force |
logical, default FALSE. If TRUE, all needed files are downloaded again even if they already exist. Useful if you want auto-detection but already downloaded without it. |
rich.format |
logical, default FALSE. If TRUE, all Experiment and Sample attributes are fetched. This means that different studies can have different sets of columns. |
fetch_GSE |
logical, default FALSE. If TRUE, search for a GSE accession and, if one exists, append it in a column called GEO. The column is included even if the study is not from GEO; in that case all its values are NA. |
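The file-handling options above (outdir, force, remove.invalid, fetch_GSE) can be illustrated with a minimal base-R sketch. This is not ORFik's implementation: the helper names runinfo_path(), load_runinfo() and postprocess_runinfo(), the download_fun argument, and the assumption that read counts sit in a column named spots (as in SRA RunInfo CSV files) are all hypothetical, chosen only to show the described behaviour offline.

```r
# Hypothetical helpers sketching how outdir, force, remove.invalid and
# fetch_GSE could behave; not ORFik's actual code.

# Build the cache path "SraRunInfo_<SRP>.csv" inside outdir,
# creating the directory if it does not exist.
runinfo_path <- function(SRP, outdir = tempdir()) {
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  file.path(outdir, paste0("SraRunInfo_", SRP, ".csv"))
}

# Reuse the cached table unless force = TRUE. download_fun is a
# hypothetical stand-in for the real network lookups and must
# return a data.frame.
load_runinfo <- function(SRP, outdir = tempdir(), download_fun, force = FALSE) {
  path <- runinfo_path(SRP, outdir)
  if (file.exists(path) && !force) {
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  runinfo <- download_fun(SRP)
  write.csv(runinfo, path, row.names = FALSE)
  runinfo
}

# Drop runs with 0 (or missing) spots when remove.invalid = TRUE, and
# add a GEO column that is NA when no GSE is known when fetch_GSE = TRUE.
postprocess_runinfo <- function(runinfo, remove.invalid = TRUE,
                                fetch_GSE = FALSE, gse = NA_character_) {
  if (remove.invalid) {
    keep <- !is.na(runinfo$spots) & runinfo$spots > 0
    runinfo <- runinfo[keep, , drop = FALSE]
  }
  if (fetch_GSE) {
    runinfo$GEO <- rep(gse, nrow(runinfo))
  }
  runinfo
}
```

With a toy download function returning one valid and one empty run, the first call writes the CSV, a second call reads it back without downloading, and post-processing keeps only the run with reads.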
A common problem is that the project is not linked to an article; in that case you will not get a PubMed ID.
The algorithm works like this:
1. If the identifier is a GEO accession (GSE), find the SRP.
2. Search Entrez for the project and get the sample identifiers.
3. From these, extract the run information and collect it into a final table.
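The three steps above can be sketched as a small base-R function. This is an illustration under stated assumptions, not ORFik's code: accession_type() and fetch_study_metadata() are hypothetical names, and the lookup functions gse_to_srp, project_to_samples and samples_to_runs are placeholders for the real Entrez/SRA queries. Passing them in as arguments lets the control flow run offline with toy lookups; the PubMed/author search from the description is omitted.

```r
# Hypothetical sketch of the lookup order; the three lookup functions
# are placeholders for the real Entrez/SRA queries.

# Classify a study accession by its prefix.
accession_type <- function(id) {
  if (grepl("^GSE[0-9]+$", id)) return("GEO")
  if (grepl("^PRJ[A-Z]+[0-9]+$", id)) return("BioProject")
  if (grepl("^[SED]RP[0-9]+$", id)) return("Study")
  stop("Unrecognised accession: ", id)
}

fetch_study_metadata <- function(id, gse_to_srp, project_to_samples,
                                 samples_to_runs) {
  # Step 1: a GEO series is first translated to its SRP.
  if (accession_type(id) == "GEO") id <- gse_to_srp(id)
  # Step 2: search the project and get its sample identifiers.
  samples <- project_to_samples(id)
  # Step 3: fetch run information per sample and stack into one table.
  runs <- lapply(samples, samples_to_runs)
  do.call(rbind, runs)
}
```

Injecting the lookups is only a testing convenience in this sketch; the documented function takes the accession and the options listed under the arguments.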
A data.table of the metadata, with 1 row per sample and the SRR run number in the 'Run' column.
doi: 10.1093/nar/gkq1019
Other sra: browseSRA(), download.SRA(), download.ebi(), get_bioproject_candidates(), install.sratoolkit(), rename.SRA.files()
## Originally on SRA
download.SRA.metadata("SRP226389")
## Now try with auto detection (guessing additional library info)
## Need to specify output dir as tempfile() to re-download
# download.SRA.metadata("SRP226389", tempfile(), auto.detect = TRUE)
## Originally on ENA (RCP-seq data)
# download.SRA.metadata("ERP116106")
## Originally on GEO (GSE) (save to directory to keep info with fastq files)
# download.SRA.metadata("GSE61011")
## Bioproject ID
# download.SRA.metadata("PRJNA231536")