View source: R/experiment_IO.R
create.experiment | R Documentation |
experiment
Create a single R object that stores and controls all results relevant to
a specific next-generation sequencing (NGS) experiment.
Click the experiment link above in the title if you are not sure what an
ORFik experiment is.
The experiment is built from the files in one or more folders. The function
makes an experiment table with information per sample; this object lets you
use the extensive API in ORFik that works on experiments.
Information auto-detection:
There are several columns you can fill in when creating the object.
If the files have logical names, like (RNA-seq_WT_rep1.bam), it will try to auto-detect
the most likely values for the columns: whether it is RNA-seq or Ribo-seq,
wild type or mutant, replicate 1 or 2, etc.
You will have to fill in the details that were not auto-detected.
The easiest way to fill in the blanks is in a csv editor like LibreOffice
or Excel. You can also remake the experiment and specify the
specific column manually.
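
To make the idea of filename-based detection concrete, here is a minimal base-R sketch. It is an illustration only, not ORFik's internal implementation: the helper `guess_columns()`, its regular expressions, and the file names are all assumptions made up for this example.

```r
# Illustrative only: a simplified guess at filename-based detection.
# guess_columns() and its patterns are hypothetical, not part of ORFik.
guess_columns <- function(file_names) {
  # Return the name of the first pattern that matches each file, or "" if none
  first_match <- function(patterns, x) {
    vapply(x, function(f) {
      hit <- names(patterns)[vapply(patterns, grepl, logical(1), x = f,
                                    ignore.case = TRUE)]
      if (length(hit) > 0) hit[1] else ""
    }, character(1), USE.NAMES = FALSE)
  }
  libtype_patterns   <- c(RFP = "ribo-?seq|rfp", RNA = "rna-?seq|rna")
  condition_patterns <- c(WT = "(^|[_.-])(wt|wild.?type)([_.-]|$)",
                          mutant = "mut")
  # Replicate number: digits directly after "rep", "rep_" or "rep-"
  has_rep <- grepl("rep[_-]?[0-9]+", file_names, ignore.case = TRUE)
  rep_num <- ifelse(has_rep,
                    sub(".*rep[_-]?([0-9]+).*", "\\1", file_names,
                        ignore.case = TRUE),
                    "")
  data.frame(file      = file_names,
             libtype   = first_match(libtype_patterns, file_names),
             condition = first_match(condition_patterns, file_names),
             rep       = rep_num,
             stringsAsFactors = FALSE)
}

# Made-up file names: two with logical names, one without
guessed <- guess_columns(c("RNA-seq_WT_rep1.bam",
                           "Ribo-seq_mutant_rep2.bam",
                           "sample3.bam"))
print(guessed)
```

The third file has no recognizable tokens, so its columns stay empty; those are the blanks you would fill in by hand.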
Remember that each row (sample) must have a unique combination
of values.
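
As a rough sketch of what "unique combination" means, the base-R snippet below flags rows that share all column values. The table `samples` is a hypothetical stand-in whose column names mirror the arguments of create.experiment(); it is not the actual template layout.

```r
# Hypothetical sample table, for illustration only
samples <- data.frame(
  libtype   = c("RNA", "RNA", "RFP", "RFP"),
  stage     = c("", "", "", ""),
  rep       = c("1", "1", "1", "2"),
  condition = c("WT", "WT", "WT", "WT"),
  fraction  = c("", "", "", ""),
  stringsAsFactors = FALSE
)

# TRUE for every row that is identical to some other row
is_dup <- duplicated(samples) | duplicated(samples, fromLast = TRUE)
dup_rows <- which(is_dup)  # rows 1 and 2 collide here

# Fix the collision by giving the second RNA sample its own replicate number
samples$rep[2] <- "2"
all_unique <- anyDuplicated(samples) == 0
```

If replicate numbers alone cannot separate the rows, another column such as fraction can carry the distinguishing value.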
An extra column called "reverse" is made if there are paired data,
like +/- strand wig files.
create.experiment(
  dir,
  exper,
  saveDir = ORFik::config()["exp"],
  txdb = "",
  fa = "",
  organism = "",
  assembly = "",
  pairedEndBam = FALSE,
  viewTemplate = FALSE,
  types = c("bam", "bed", "wig", "bigWig", "ofst"),
  libtype = "auto",
  stage = "auto",
  rep = "auto",
  condition = "auto",
  fraction = "auto",
  author = "",
  files = findLibrariesInFolder(dir, types, pairedEndBam),
  result_folder = NULL,
  runIDs = extract_run_id(files)
)
dir |
Directory or directories to create the experiment from; each must contain NGS data from your experiment. All files of the types given by the "types" argument are included, so do not mix in files from other experiments in the same folder! |
exper |
Short name of experiment. Will be name used to load
experiment, and name shown when running |
saveDir |
Directory to save experiment csv file, default: ORFik::config()["exp"] |
txdb |
A path to a TxDb (preferred) or gff/gtf (not advised, slower) file with transcriptome annotation for the organism. |
fa |
A path to fasta genome/sequences used for libraries, remember the file must have a fasta index too. |
organism |
character, default: "" (no organism set), scientific name of organism: Homo sapiens, Danio rerio, Rattus norvegicus, etc. If you have an SRA metadata csv file, you can set this argument to study$ScientificName[1], where study is the SRA metadata for all files that were aligned. |
assembly |
character, default: "" (no assembly set). The genome assembly name, like GRCh38 etc. Useful to add if you want detailed metadata of experiment analysis. |
pairedEndBam |
logical, default FALSE. Either a single TRUE/FALSE, or a logical vector with one TRUE/FALSE per library that will be included (run once without it first and check what order the files come in). For example, one paired-end file followed by two single-end files is c(TRUE, FALSE, FALSE). If you have an SRA metadata csv file, you can set this argument to study$LibraryLayout == "PAIRED", where study is the SRA metadata for all files that were aligned. |
viewTemplate |
run View() on template when finished, default (FALSE). Usually gives you a better view of result than using print(). |
types |
Default c("bam", "bed", "wig", "bigWig", "ofst") |
libtype |
character, default "auto". Library types, must be length 1 or equal length of number of libraries. "auto" means ORFik will try to guess from file names. Example: RFP (Ribo-seq), RNA (RNA-seq), CAGE, SSU (TCP-seq 40S), LSU (TCP-seq 80S). |
stage |
character, default "auto". Developmental stage, tissue or cell line, must be length 1 or equal length of number of libraries. "auto" means ORFik will try to guess from file names. Example: HEK293 (Cell line), Sphere (zebrafish stage), ovary (Tissue). |
rep |
character, default "auto". Replicate numbering, must be length 1 or equal length of number of libraries. "auto" means ORFik will try to guess from file names. Example: 1 (rep 1), 2 (rep 2). Insert only numbers here! |
condition |
character, default "auto". Library conditions, must be length 1 or equal length of number of libraries. "auto" means ORFik will try to guess from file names. Example: WT (wild type), mutant, etc. |
fraction |
character, default "auto". Fractionation of library, must be length 1 or equal length of number of libraries. "auto" means ORFik will try to guess from file names. This column is used to make each row of the experiment unique if the other columns are not sufficient. Example: cyto (cytosolic fraction), dmso (dmso treated fraction), etc. |
author |
character, default "". Main author of experiment, usually last name is enough. When printing will state "author et al" in info. |
files |
character vector or data.table of library paths in dir.
Default: findLibrariesInFolder(dir, types, pairedEndBam) |
result_folder |
character, default NULL. The folder to output analysis results like QC, count tables etc. By default the libFolder(df) folder is used, the folder of first library in experiment. If you are making a new experiment which is a collection of other experiments, set this to a new folder, to not contaminate your other experiment directories. |
runIDs |
character ids, usually SRR, ERR, or DRR identifiers, default is to search for any of these 3 in the filename by: extract_run_id(files)
|
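
The argument descriptions above mention deriving organism and pairedEndBam from SRA metadata, and finding run IDs in file names. The base-R sketch below shows one way this could look. The `study` table and file names are made up, the `Run` column is an assumption, and the regular expression is a simplified stand-in; extract_run_id() itself may behave differently.

```r
# Hypothetical SRA metadata table. ScientificName and LibraryLayout are the
# column names referenced in the argument descriptions; Run is assumed here.
study <- data.frame(
  Run            = c("SRR1000001", "SRR1000002", "SRR1000003"),
  ScientificName = rep("Homo sapiens", 3),
  LibraryLayout  = c("PAIRED", "SINGLE", "SINGLE"),
  stringsAsFactors = FALSE
)

organism     <- study$ScientificName[1]           # one name for the experiment
pairedEndBam <- study$LibraryLayout == "PAIRED"   # one logical per library

# Made-up file names; the last one carries no run identifier
files <- c("SRR1000001_Aligned.bam", "RNA_ERR2000002.bam", "WT_rep1.bam")

# Simplified stand-in for run ID detection: first SRR/ERR/DRR token per file
m <- regexpr("(SRR|ERR|DRR)[0-9]+", files)
found <- as.vector(m) > 0
run_ids <- rep("", length(files))
run_ids[found] <- regmatches(files, m)
```

Values built this way could then be passed as organism, pairedEndBam and runIDs, provided their order matches the order of the libraries.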
a data.frame. NOTE: this is not an ORFik experiment, only a template for it!
Other ORFik_experiment: ORFik.template.experiment(), ORFik.template.experiment.zf(), bamVarName(), experiment-class, filepath(), libraryTypes(), organism,experiment-method, outputLibs(), read.experiment(), save.experiment(), validateExperiments()
# 1. Pick directory
dir <- system.file("extdata/Homo_sapiens_sample", "", package = "ORFik")
# 2. Pick an experiment name
exper <- "ORFik"
# 3. Pick .gff/.gtf location
txdb <- system.file("extdata/references/homo_sapiens",
                    "Homo_sapiens_dummy.gtf.db", package = "ORFik")
# 4. Pick fasta genome of organism
fa <- system.file("extdata/references/homo_sapiens",
                  "Homo_sapiens_dummy.fasta", package = "ORFik")
# 5. Set organism (optional)
org <- "Homo sapiens"
# Create template, not saved on disc yet:
template <- create.experiment(dir = dir, exper, txdb = txdb,
                              saveDir = NULL,
                              fa = fa, organism = org,
                              viewTemplate = FALSE)
## Now fix non-unique rows: either in LibreOffice, Microsoft Excel, or in R
template$X5[6] <- "heart" # here a dummy example, even though data is correct
# read experiment (if you set correctly)
df <- read.experiment(template)
## Default location of experiments is ORFik::config()["exp"]
# default_experiments_path <- ORFik::config()["exp"]
# exp_path <- file.path(default_experiments_path, paste0(exper, ".csv"))
# Save with: save.experiment(df, file = exp_path)
# Then you can simply load with read.experiment(exper),
# since you saved in the default directory
## Custom location (If you work in a team, use a shared folder)
# Remember to update ORFik::config() to ripple the effect through whole
# of ORFik if you want to use this as default
new_dir <- tempdir() # Here we just use tempdir
create.experiment(dir = dir, exper, txdb = txdb,
                  saveDir = new_dir, fa = fa, organism = org)
template_loaded <- read.experiment(exper, new_dir)
# The csv template paths (from index 5) are equal to file paths of loaded exp
identical(template$X6[-seq(4)], filepath(template_loaded, "default"))