codon_usage | R Documentation |
Analyse coverage per amino acid (AA) / codon and return a multitude of features, for both A-sites and P-sites (input reads must be P-sites for now). This function takes inspiration from the codonDT paper: among other things it returns the negative binomial estimates, along with many additional features.
codon_usage(
reads,
cds,
mrna,
faFile,
filter_table,
filter_cds_mod3 = TRUE,
min_counts_cds_filter = max(min(quantile(filter_table, 0.5), 1000), 1000),
with_A_sites = TRUE,
aligned_position = "center",
code = GENETIC_CODE
)
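As printed in the usage above, the default for min_counts_cds_filter takes the median of filter_table and passes it through max(min(., 1000), 1000). The base R sketch below evaluates that expression on made-up count vectors (toy values, not real data; the helper name default_min_counts is hypothetical). It suggests that the expression, as written here, yields 1000 whatever the median is, so for shallow libraries it may be worth setting the threshold explicitly.

```r
# Evaluate the default expression from the usage block for a given filter_table.
# The helper name and the count vectors are illustrative assumptions.
default_min_counts <- function(filter_table) {
  max(min(quantile(filter_table, 0.5), 1000), 1000)
}

low_depth  <- c(0, 3, 12, 40, 75)        # median far below 1000
high_depth <- c(500, 2000, 8000, 15000)  # median far above 1000

default_min_counts(low_depth)
default_min_counts(high_depth)
```

Both calls return the same threshold, which is consistent with the example at the end of this page passing min_counts_cds_filter explicitly.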
reads |
either a single library (GRanges, GAlignments, GAlignmentPairs),
or a list of libraries returned from |
cds |
a GRangesList |
mrna |
a GRangesList |
faFile |
a FaFile from genome |
filter_table |
a matrix / vector of length equal to cds |
filter_cds_mod3 |
logical, default TRUE. Remove all ORFs that are not mod3, this speeds up the computation a lot, and usually removes malformed ORFs you would not want anyway. |
min_counts_cds_filter |
numeric, default: max(min(quantile(filter_table, 0.5), 1000), 1000), as given in the usage above. |
with_A_sites |
logical, default TRUE. Not used yet, will also return A site scores. |
aligned_position |
what positions should be taken to calculate per-codon coverage. By default: "center", meaning that positions -1,0,1 will be taken. Alternative: "left", then positions 0,1,2 are taken. |
code |
a named character vector of size 64. Default: GENETIC_CODE. Change if organism does not use the standard code. |
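To illustrate the two aligned_position settings described above, here is a minimal base R sketch. The helper codon_offsets is hypothetical and not part of ORFik; it only encodes the offsets stated in the argument description, and the coordinate 100 is a toy value.

```r
# Hypothetical helper (not an ORFik function): offsets, relative to a read's
# aligned position, that are taken for per-codon coverage.
codon_offsets <- function(aligned_position = c("center", "left")) {
  aligned_position <- match.arg(aligned_position)
  switch(aligned_position,
         center = -1:1,  # positions -1, 0, 1
         left   = 0:2)   # positions 0, 1, 2
}

pos <- 100L  # toy transcript coordinate of an aligned position
pos + codon_offsets("center")  # 99 100 101
pos + codon_offsets("left")    # 100 101 102
```

With "center" the aligned position is treated as the middle nucleotide of the codon; with "left" it is treated as the first.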
The primary column to use is "mean_txNorm"; this is the fair normalized score.
A data.table with rows per codon / AA. All values are given per library and per site (A or P), sorted by the mean_txNorm_percentage column of the first library in the set. The columns are:
variable (character) : Library name
seq (character) : Amino acid:codon
sum (integer) : total counts per seq
sum_txNorm (integer) : total counts per seq normalized per tx
var (numeric) : variance of total counts per seq
N (integer) : total number of codons of that type
mean_txNorm (numeric) : The default output to use: the fair codon usage, normalized at both gene and genome level for codon and read counts
...
alpha (numeric) : Dirichlet alpha, method-of-moments (MOM) estimator (think of it as the mean and variance of a probability in one value: the lower the value, the higher the variance; the mean is decided by the relative value between samples)
relative_to_max_score (integer) : Percentage use of codon
type (factor(character)) : "P" or "A"
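For intuition about the alpha column, the sketch below shows one textbook method-of-moments estimate of Dirichlet parameters in base R. It is an assumption for illustration only: it is not taken from the ORFik source, the function name dirichlet_mom is hypothetical, and the proportions are toy values. The precision (sum of the alphas) is derived from the mean and variance of the first category, and each alpha is the category mean scaled by that precision, so lower alphas correspond to higher variance.

```r
# Generic method-of-moments sketch for Dirichlet parameters; NOT the ORFik
# implementation. P: matrix of proportions, one row per sample, one column
# per category, each row summing to 1.
dirichlet_mom <- function(P) {
  m  <- colMeans(P)                 # mean proportion per category
  v1 <- var(P[, 1])                 # sample variance of the first category
  a0 <- m[1] * (1 - m[1]) / v1 - 1  # precision: sum of all alphas
  unname(m * a0)                    # alpha_k = mean_k * precision
}

# Toy proportions for three categories across three samples
P <- rbind(c(0.2, 0.3, 0.5),
           c(0.4, 0.3, 0.3),
           c(0.3, 0.3, 0.4))
alpha <- dirichlet_mom(P)
alpha
```

Dividing alpha by its sum recovers the mean proportions, which is why the relative values carry the mean while the overall magnitude carries the variance.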
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7196831/
Other codon: codon_usage_exp(), codon_usage_plot()
df <- ORFik.template.experiment()[9:10,] # Subset to 2 Ribo-seq libs
## For single library
reads <- fimport(filepath(df[1,], "pshifted"))
cds <- loadRegion(df, "cds", filterTranscripts(df))
mrna <- loadRegion(df, "mrna", names(cds))
filter_table <- assay(countTable(df, type = "summarized")[names(cds)])
faFile <- findFa(df)
res <- codon_usage(reads, cds, mrna, faFile = faFile,
                   filter_table = filter_table, min_counts_cds_filter = 10)