experiment-class | R Documentation |
An experiment is a single R object that stores and controls all results
relevant to a specific experiment. It simplifies your NGS workflow and
guards it against errors.
It contains the following important parts:
filepaths: Information for each library in the experiment (for multiple file formats: bam, bed, wig, ofst, etc.)
genome: Annotation files for the experiment (fasta genome, index, gtf, txdb)
organism: Name (for automatic GO, sequence analysis, etc.)
description: Author information and experiment details (use 'list.experiments()' to show all experiments made with ORFik; this makes it easy to find and load them later)
API: ORFik supports a rich API for using the experiment, e.g., 'outputLibs(experiment, type = "wig")' to load all libraries in the wig format into R, 'loadTxdb(experiment)' to load the txdb (gtf) of the experiment, 'transcriptWindow()' to plot metacoverage for all libraries, and 'countTable(experiment)' to load count tables, etc.
Safety: Verifies that experiments contain no duplicate, empty, or non-accessible files.
It also acts as an extension of SummarizedExperiment: beyond counts, it
makes information about the libraries and the annotation easy to find, so
that more tasks become possible, such as computing coverage per position in
a transcript.
## Constructor:
The simplest way to make one is to call:
create.experiment(dir)
on a folder with NGS libraries (usually bam files) and see what you get.
Some of the fields might need to be filled in manually. Each resulting row
must be unique (not counting the filepath, which is always unique), so
replicates must be stated explicitly. All filepaths must be unique and
point to files with size > 0.
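The uniqueness and file-size rules can be checked by hand before building an experiment. The following is a minimal base R sketch, not ORFik's own validation code: `check_experiment_table` is a hypothetical helper, and the column names in the toy table are assumptions chosen for illustration.

```r
# Hypothetical helper (not part of ORFik): report violations of the
# constructor rules on a plain data.frame with one row per library.
check_experiment_table <- function(tab) {
  # Columns that describe the sample, i.e. everything except file paths
  design_cols <- setdiff(colnames(tab), c("filepath", "reverse"))
  problems <- character(0)
  if (anyDuplicated(tab[, design_cols, drop = FALSE]) > 0)
    problems <- c(problems, "duplicated sample rows (state replicates explicitly)")
  if (anyDuplicated(tab$filepath) > 0)
    problems <- c(problems, "duplicated file paths")
  sizes <- file.size(tab$filepath)  # NA when a file does not exist
  if (any(is.na(sizes) | sizes == 0))
    problems <- c(problems, "missing or empty files")
  problems
}

# Toy data: one non-empty file, one empty file, and no replicate numbers.
ok_file <- tempfile(fileext = ".bam")
writeLines("placeholder", ok_file)
empty_file <- tempfile(fileext = ".bam")
invisible(file.create(empty_file))

tab <- data.frame(libtype = c("RNA", "RNA"), stage = c("2h", "2h"),
                  rep = c("", ""), filepath = c(ok_file, empty_file),
                  stringsAsFactors = FALSE)
issues <- check_experiment_table(tab)

# Fix both problems: number the replicates and use a non-empty second file.
writeLines("placeholder", empty_file)
tab$rep <- c("1", "2")
issues_fixed <- check_experiment_table(tab)
```

In this sketch `issues` lists the duplicated row and the empty file, and `issues_fixed` is empty once the replicates are numbered and the file has content.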
The experiment table has one row per library and the following columns:
library type: e.g. RNA-seq, Ribo-seq
stage: e.g. a timepoint or tissue
rep: replicate number
condition
fraction
filepath: full path to the library file
reverse: optional second entry for the library, see Special rules
Special rules:
Supported:
Single/paired end bam, bed, wig, ofst + compressions of these
The reverse column of the experiment says "paired-end" if the library is a
paired-end bam file.
For a pair of wig files (forward and reverse strand), reverse is the
filepath to the '-' strand wig file.
Paired forward / reverse wig files must have the same name except for
_forward / _reverse.
For paired-end bam files, set pairedEndBam when creating the experiment,
e.g. pairedEndBam = c(TRUE, TRUE, TRUE, FALSE) for three paired-end
libraries followed by one single-end library.
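The two file rules above can be illustrated in base R. This is a sketch under assumptions: the file names are made up, and the pairing logic only mirrors the stated naming rule, it is not how ORFik pairs files internally.

```r
# Made-up wig file names that follow the _forward / _reverse convention.
wigs <- c("RFP_2h_forward.wig", "RFP_2h_reverse.wig",
          "RNA_2h_forward.wig", "RNA_2h_reverse.wig")

fwd <- wigs[grepl("_forward", wigs, fixed = TRUE)]
rev_files <- wigs[grepl("_reverse", wigs, fixed = TRUE)]

# Strip the strand tag; both files of a pair must collapse to the same key.
strand_free <- function(x) sub("_(forward|reverse)", "", x)

pairs <- data.frame(
  filepath = fwd,
  reverse = rev_files[match(strand_free(fwd), strand_free(rev_files))],
  stringsAsFactors = FALSE
)

# Every forward file needs a partner, otherwise the naming rule is broken.
stopifnot(!anyNA(pairs$reverse))

# Flags for three paired-end bam files followed by one single-end file.
# Spelling out TRUE / FALSE avoids relying on T / F, which can be reassigned.
paired_end_bam <- c(rep(TRUE, 3), FALSE)
```

A vector like `paired_end_bam` is what the pairedEndBam argument described above expects, with one value per bam file.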
Naming:
The constructor will try to guess names for tissues / stages, replicates etc.
If it finds more than one hit for one file, it will not guess.
Always check that it guessed correctly.
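To make the "more than one hit means no guess" rule concrete, here is a minimal base R sketch. `guess_label` and the pattern set are hypothetical and only illustrate the rule as stated; ORFik's actual guessing logic may differ.

```r
# Hypothetical helper (not ORFik's implementation): pick one label per file
# from a named set of patterns, and refuse to guess when it is ambiguous.
guess_label <- function(files, patterns) {
  vapply(files, function(f) {
    hit <- vapply(patterns, grepl, logical(1), x = f, ignore.case = TRUE)
    hits <- names(patterns)[hit]
    if (length(hits) == 1) hits else ""  # zero or several hits: leave empty
  }, character(1), USE.NAMES = FALSE)
}

# Assumed stage patterns and made-up file names.
stage_patterns <- c("2h" = "2h", "4h" = "4h", "shield" = "shield")
files <- c("RNA_2h_rep1.bam", "RNA_4h_rep1.bam", "RNA_2h_vs_4h_merged.bam")

stages <- guess_label(files, stage_patterns)
```

The third file matches two stage patterns, so its stage is left empty and has to be filled in manually, which is why the guessed table should always be checked.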
an ORFik experiment
Other ORFik_experiment: ORFik.template.experiment(), ORFik.template.experiment.zf(), bamVarName(), create.experiment(), filepath(), libraryTypes(), organism,experiment-method, outputLibs(), read.experiment(), save.experiment(), validateExperiments()
## To see an internal ORFik example
df <- ORFik.template.experiment()
## See libraries in experiment
df
## See organism of experiment
organism(df)
## See file paths in experiment
filepath(df, "default")
## Output NGS libraries in R, to .GlobalEnv
#outputLibs(df)
## Output cds of experiment annotation
#loadRegion(df, "cds")
## This is how to make it:
## Not run:
library(ORFik)
# 1. Update path to experiment data directory (bam, bed, wig files etc)
exp_dir <- "/data/processed_data/RNA-seq/Lee_zebrafish_2013/aligned/"
# 2. Set a short character name for experiment, (Lee et al 2013 -> Lee13, etc)
exper_name <- "Lee13"
# 3. Create a template experiment (gtf and fasta genome)
temp <- create.experiment(exp_dir, exper_name, saveDir = NULL,
  txdb = "/data/references/Zv9_zebrafish/Danio_rerio.Zv9.79.gtf",
  fa = "/data/references/Zv9_zebrafish/Danio_rerio.Zv9.fa",
  organism = "Danio rerio") # zebrafish, matching the Zv9 annotation
# 4. Make sure each row(sample) is unique and correct
# A view of the data.frame will open now, check that it is correct:
# library type (RNA-seq, Ribo-seq), stage, rep, condition, fraction.
# Let's say it did not figure out that rows 5 and 6 are RNA-seq, then we do:
temp[5:6, 1] <- "RNA" # [row 5 and 6, col 1] are library types
# You can also do this in your spreadsheet program (Excel, LibreOffice)
# Now save the new version, if you did not use a spreadsheet.
saveName <- paste0("/data/processed_data/experiment_tables_for_R/",
                   exper_name, ".csv")
save.experiment(temp, saveName)
# 5. Load experiment, this will validate that you made it correctly
df <- read.experiment(saveName)
# Set experiment name not to be assigned in R variable names
df@expInVarName <- FALSE
df
## End(Not run)