RiboseQC_analysis: Perform a Ribo-seQC analysis
In ohlerlab/RiboseQC: RiboseQC, a Comprehensive Ribo-Seq Analysis Tool

RiboseQC_analysis

R Documentation

Description

This function loads the annotation created by the prepare_annotation_files function and analyzes one or more BAM files.

Usage

RiboseQC_analysis(annotation_file, bam_files, read_subset = T,
  readlength_choice_method = "max_coverage", chunk_size = 5000000L,
  write_tmp_files = T, dest_names = NA, rescue_all_rls = FALSE,
  fast_mode = T, create_report = T, sample_names = NA,
  report_file = NA, extended_report = F, pdf_plots = T)

Arguments

`annotation_file`	Full path to the annotation file (*Rannot), or a character vector with the path to one annotation file per BAM file.
`bam_files`	Character vector containing the full paths to the BAM files.
`read_subset`	Restrict the analysis to the read lengths that account for up to 99 percent of the reads? Defaults to `TRUE`. Must be of length 1 or the same length as `bam_files`.
`readlength_choice_method`	Method used to subset relevant read lengths (see the `choose_readlengths` function). Defaults to "max_coverage". Must be of length 1 or the same length as `bam_files`.
`chunk_size`	Number of alignments to read at each iteration. Defaults to 5000000; increase when more RAM is available. Must be between 10000 and 100000000.
`write_tmp_files`	Should all the results be written to file (in *results_RiboseQC_all)? Defaults to `TRUE`. Must be of length 1 or the same length as `bam_files`.
`dest_names`	Character vector containing the prefixes to use for the result output files. Defaults to the same values as `bam_files`.
`rescue_all_rls`	Set a cutoff of 12 for read lengths that are ignored because of insufficient coverage? Defaults to `FALSE`. Must be of length 1 or the same length as `bam_files`.
`fast_mode`	Use only the top 500 genes to build profiles? Defaults to `TRUE`. Must be of length 1 or the same length as `bam_files`.
`create_report`	Create an HTML report showing the RiboseQC analysis results? Defaults to `TRUE`.
`sample_names`	Character vector containing the names of each sample analyzed (for the HTML report). Defaults to "sample1", "sample2", ...
`report_file`	Desired filename for the HTML report file. Defaults to the first entry of `bam_files` followed by ".html".
`extended_report`	Create a large HTML report that includes codon occupancy for each read length? Defaults to `FALSE`.
`pdf_plots`	Create a PDF file for each produced plot? Defaults to `TRUE`.
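
As an illustration of how the arguments above fit together, here is a minimal sketch of a call on two BAM files. All paths and sample names are hypothetical placeholders, and the call is guarded so that it only runs when the package and the input files are actually present.

```r
# Hypothetical input paths; replace with your own files.
annotation_file <- "/path/to/annotation_Rannot"
bam_files <- c("/path/to/sample_A.bam", "/path/to/sample_B.bam")

# Only run when the package and the inputs are actually available.
inputs_ok <- all(file.exists(c(annotation_file, bam_files)))
if (requireNamespace("RiboseQC", quietly = TRUE) && inputs_ok) {
  library(RiboseQC)
  RiboseQC_analysis(
    annotation_file = annotation_file,       # one annotation for all BAM files
    bam_files = bam_files,
    fast_mode = TRUE,                        # top 500 genes only
    create_report = TRUE,
    sample_names = c("sample_A", "sample_B"),
    report_file = "/path/to/riboseqc_report.html",
    extended_report = FALSE,
    pdf_plots = TRUE
  )
}
```

Arguments that are left out keep the defaults listed in the Usage section.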

Details

This function loads the different genomic regions created in the prepare_annotation_files step, separating features by recognized organelle. Each BAM file is then analyzed in chunks to minimize RAM usage.
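
The idea behind chunked processing can be sketched in base R. This is not the package's own implementation: the vector of read lengths is a toy stand-in for the alignments in a BAM file, and the running summary is just a read length tabulation. It only shows that accumulating per-chunk summaries gives the same result as a single pass while holding one chunk at a time.

```r
# Toy stand-in for the alignments in a BAM file: one read length per alignment.
set.seed(1)
read_lengths <- sample(25:35, size = 100000, replace = TRUE)

chunk_size <- 10000L               # smallest value the chunk_size argument accepts
rld <- integer(max(read_lengths))  # running read length distribution

# Visit the alignments one chunk at a time and update the running counts,
# so only one chunk is summarized at once.
starts <- seq(1L, length(read_lengths), by = chunk_size)
for (s in starts) {
  e <- min(s + chunk_size - 1L, length(read_lengths))
  rld <- rld + tabulate(read_lengths[s:e], nbins = length(rld))
}

# The accumulated counts match a single pass over all alignments.
stopifnot(identical(rld, tabulate(read_lengths, nbins = length(rld))))
```

A larger chunk_size means fewer iterations at the cost of more memory per iteration, which is why the argument description suggests increasing it when more RAM is available.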
The complete list of analyses and outputs is as follows:

read_stats contains:
rld: read length distribution per organelle.
positions: mapping statistics on different genomic regions.
reads_pos1: 5' end mapping positions for each read, separated by read length.
counts_cds_genes: read mapping statistics on CDS regions of protein-coding genes, including gene symbols, counts, RPKM and TPM values.
counts_all_genes: a similar object, but with statistics on all annotated genes.
reads_summary: mapping statistics on different genomic regions, divided by read length and organelle.
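
For readers unfamiliar with the RPKM and TPM values mentioned above, the following sketch computes both from raw counts using the standard textbook definitions. The gene names, counts, and CDS lengths are made up, and normalizing by the total of these counts is an assumption; the package may use a different library size.

```r
# Made-up counts and CDS lengths (in nt) for three hypothetical genes.
counts     <- c(geneA = 500,  geneB = 1500, geneC = 3000)
cds_length <- c(geneA = 1000, geneB = 3000, geneC = 1500)

# Reads per kilobase of CDS.
rpk <- counts / (cds_length / 1e3)

# RPKM: reads per kilobase, per million counted reads.
rpkm <- rpk / (sum(counts) / 1e6)

# TPM: reads per kilobase, rescaled so that all genes sum to one million.
tpm <- rpk / sum(rpk) * 1e6

round(cbind(counts, cds_length, rpkm, tpm))
```

geneA and geneB have the same read density per nucleotide, so they receive identical RPKM and TPM values even though their raw counts differ threefold.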

profiles_fivepr contains:
five_prime_bins: a DataFrame object (one for each read length and compartment) with signal values over 50 5'UTR bins, 100 CDS bins and 50 3'UTR bins; one representative transcript (reprentative_mostcommon) is selected for each gene.
five_prime_subcodon: a similar structure, but covering 25 nt downstream of the Transcription Start Site (TSS), 25 nt upstream of the start codon, 33 nt downstream of the start codon, 33 nt in the middle of the ORF, 33 nt upstream of the stop codon, 25 nt downstream of the stop codon, and 25 nt upstream of the Transcription End Site (TES).

selection_cutoffs contains:
results_choice: the calculated cutoffs and selected read lengths, together with data about the different methods.
results_cutoffs: statistics about the calculated cutoffs.
analysis_frame_cutoff: extensive statistics concerning cutoff calculations and read length selection; see calc_cutoffs_from_profiles for more details.

P_sites_stats: contains the list of calculated P_sites from all reads (P_sites_all), uniquely mapping reads (P_sites_all_uniq), or uniquely mapping reads with mismatches (P_sites_uniq_mm). junctions contains statistics on reads mapping to annotated splice junctions. Coverage of entire reads (not reduced to 5' ends or P_sites positions) is also calculated on each strand, for all reads and for uniquely mapping reads.

profiles_P_sites contains:
P_sites_bins: profiles for each organelle and read length around binned transcript locations.
P_sites_subcodon: profiles for each organelle and read length around transcript start/ends and ORF start/ends.
Codon_counts: codon occurrences in the first 11 codons, middle 11 codons, and last 11 codons for each ORF.
P_sites_percodon: P_sites counts on each codon, separated by ORF positions as described above. Values are separated by organelle and read length.
P_sites_percodon_ratio: ratio of P_sites_percodon/Codon_counts, as a measure of P_site occupancy on each codon, divided again by organelle and read length, for different ORF positions.
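
The relationship between the last three objects can be illustrated with toy matrices. The codons, column labels, and numbers below are invented, and the real objects are further split by organelle and read length; the sketch only shows the element-wise ratio described above, with an explicit guard for codons that never occur (an added precaution, not documented package behavior).

```r
codons    <- c("AAA", "GAA", "CTG")
positions <- c("first_11", "middle_11", "last_11")  # hypothetical labels

# Codon occurrences per ORF position (matrices are filled column by column).
Codon_counts <- matrix(c(120, 80, 200,  300, 250, 400,  90, 60, 150),
                       nrow = 3, dimnames = list(codons, positions))

# P_sites counts falling on each codon, per ORF position.
P_sites_percodon <- matrix(c(240, 80, 100,  600, 500, 200,  45, 60, 300),
                           nrow = 3, dimnames = list(codons, positions))

# Occupancy: P_sites per codon occurrence.
P_sites_percodon_ratio <- P_sites_percodon / Codon_counts

# Mark codons with zero occurrences as NA instead of NaN or Inf.
P_sites_percodon_ratio[Codon_counts == 0] <- NA

P_sites_percodon_ratio
```

In this toy data, AAA is occupied twice as densely as its frequency would suggest near the ORF start, while CTG is more heavily occupied near the ORF end.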

sequence_analysis: contains a DataFrame object with the top 50 mapping locations in the genome, with the corresponding DNA sequence, the number of reads mapping there (also as a percentage of the total number of reads), and the genomic feature annotation.

summary_P_sites: contains a DataFrame object summarizing the P_sites calculation and read length selection, including statistics on percentage of total reads used.

Value

The function saves a "results_RiboseQC_all" R file, appended to the bam_files path, containing the complete list of outputs described here. In addition, the following are written with names appended to the bam_files path: bigwig files for coverage values and P_sites positions, a summary of P_sites selection statistics, a smaller "results_RiboseQC" R file used for creating a dynamic HTML report, and a "for_SaTAnn" R object that can be used in the SaTAnn pipeline.
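
One way to inspect a saved results file without cluttering the workspace is to load it into a fresh environment. The sketch below assumes the file was written with save(), which this page does not state explicitly. The helper load_results is hypothetical, and it is demonstrated on a temporary stand-in file rather than on real RiboseQC output.

```r
# Hypothetical helper: load an R data file and return its objects as a list.
load_results <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # names of the objects that were stored
  mget(loaded, envir = env)
}

# Stand-in file with a toy object, mimicking a saved results list.
tmp <- tempfile(fileext = "_results_RiboseQC_all")
toy_results <- list(read_stats = list(), summary_P_sites = data.frame())
save(toy_results, file = tmp)

res <- load_results(tmp)
names(res)              # objects stored in the file
names(res$toy_results)  # top-level elements of the stored list
```

With a real output file, the same two names() calls show which object the file contains and which of the elements listed under Details are present.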

Author(s)

Lorenzo Calviello, calviello.l.bio@gmail.com
