epigraHMMDataSetFromBam | R Documentation |
This function creates a RangedSummarizedExperiment object from a set of BAM files.
It is used to store the input data, the model offsets, and the results from the peak calling algorithms.
epigraHMMDataSetFromBam(
  bamFiles,
  colData,
  genome,
  windowSize,
  gapTrack = TRUE,
  blackList = TRUE
)
bamFiles |
a string vector (or a list of string vectors) with the paths to the BAM files. If bamFiles is a list of string vectors, the vectors must be named and have the same length, and at least one vector named 'counts' must exist (see details). |
colData |
a |
genome |
either a single string with the name of the reference genome (e.g. 'hg19') or a GRanges object with ranges to be tiled into a set of non-overlapping windows. |
windowSize |
an integer specifying the size of genomic windows where read counts will be computed. |
gapTrack |
either a logical ( |
blackList |
either a logical ( |
The index ".bai" files must be stored in the same directory as their respective BAM files. Each index file must be named after its BAM file with the additional ".bai" suffix.
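The naming convention above can be checked before calling the function. The following is a minimal base R sketch; `missingBamIndex` is a hypothetical helper that is not part of epigraHMM, and the demonstration uses empty placeholder files rather than real BAM data.

```r
# Hypothetical helper (not part of epigraHMM): return the BAM files
# that have no '<file>.bai' index alongside them in the same directory.
missingBamIndex <- function(bamFiles) {
    # Accept either a character vector or a list of character vectors
    bamFiles <- unlist(bamFiles, use.names = FALSE)
    indexFiles <- paste0(bamFiles, ".bai")
    bamFiles[!file.exists(indexFiles)]
}

# Self-contained demonstration with empty placeholder files
tmpDir <- tempfile("bam")
dir.create(tmpDir)
bamA <- file.path(tmpDir, "sampleA.bam")
bamB <- file.path(tmpDir, "sampleB.bam")
invisible(file.create(bamA, paste0(bamA, ".bai"), bamB))

# sampleA.bam has an index, sampleB.bam does not
missingBamIndex(c(bamA, bamB))
```

An empty result would suggest that every BAM file has an index in the expected location.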
'epigraHMMDataSetFromBam' will store experimental data (e.g. ChIP-seq counts) from bamFiles (or bamFiles[['counts']], if a list is provided). Additional data (e.g. input control counts) will be stored similarly with their respective list names.
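As an illustration of the list form of bamFiles, the sketch below builds a named list and checks the stated requirements (named vectors, equal lengths, a 'counts' element). The file paths are hypothetical placeholders, and the element name 'controls' is only an illustrative choice for the additional data.

```r
# Hypothetical file paths; replace them with real BAM files.
bamFiles <- list(
    counts   = c("chip_rep1.bam", "chip_rep2.bam"),    # experimental data
    controls = c("input_rep1.bam", "input_rep2.bam")   # additional data (illustrative name)
)

# Requirements stated in the argument description
stopifnot(
    !is.null(names(bamFiles)),
    all(names(bamFiles) != ""),
    "counts" %in% names(bamFiles),
    length(unique(lengths(bamFiles))) == 1
)

# One row of colData per sample, matching the length of each vector
colData <- data.frame(condition = c("A", "A"), replicate = c(1, 2))
```

Under this layout, the 'counts' vector would populate the 'counts' assay, and the remaining vector would be stored under its own list name.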
By default, the function computes read counts using the fragment length estimated by csaw via cross-correlation analysis. For experimental counts (e.g. ChIP-seq), sequencing reads are shifted downstream by half of the estimated fragment length. For additional counts (e.g. input control), sequencing reads are not shifted prior to counting.
Additional columns included in the colData input will be passed to the resulting epigraHMMDataSet assay and can be accessed via the colData() function.
The genome argument will call GenomeInfoDb::Seqinfo() to fetch the chromosome lengths of the specified genome.
See ?GenomeInfoDb::Seqinfo for the list of UCSC genomes that are currently supported.
If gapTrack = TRUE and the name of a reference genome is passed as input through genome (e.g. 'hg19'), the function will discard any genomic coordinate overlapping regions specified by the respective UCSC gap table.
If gapTrack is a GRanges object, the function will discard any genomic coordinate overlapping regions from gapTrack.
If blackList = TRUE and the name of a reference genome is passed as input through genome (e.g. 'hg19'), the function will fetch the manually curated blacklist tracks (Version 2) from https://github.com/Boyle-Lab/Blacklist/tree/master/lists.
Currently available genomes are ce10, dm3, hg19, hg38, and mm10.
If blackList is a GRanges object, the function will discard any genomic coordinate overlapping regions from blackList.
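When regions are supplied as a GRanges object rather than TRUE, the object could be built as in the sketch below. The coordinates and the name `customBlackList` are purely illustrative, and the example assumes the GenomicRanges package is installed.

```r
library(GenomicRanges)

# Hypothetical regions to exclude; coordinates are illustrative only.
customBlackList <- GRanges(
    seqnames = c("chr1", "chr2"),
    ranges = IRanges(start = c(1, 50001), end = c(10000, 60000))
)

customBlackList
```

An object of this kind could then be passed as blackList = customBlackList (or, analogously, through gapTrack), so that windows overlapping these ranges are discarded.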
An epigraHMMDataSet object with colData sorted by condition and replicate. Experimental counts will be stored in the 'counts' assay of the resulting epigraHMMDataSet object. Additional experimental data will be stored under their respective names from the list bamFiles.
Pedro L. Baldoni, pedrobaldoni@gmail.com
https://github.com/plbaldoni/epigraHMM DOI: 10.1093/nar/gkv1191 DOI: 10.1038/s41598-019-45839-z DOI: 10.1038/nature11247
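# Path to an example BAM file shipped with the chromstaRData package
bamFiles <- system.file("extdata", "euratrans",
                        "lv-H3K27me3-SHR-male-bio2-tech1.bam",
                        package = "chromstaRData")

# One row per BAM file: experimental condition and replicate number
colData <- data.frame(condition = 'SHR', replicate = 1)

# Count reads in 25 kb windows tiled across the rn4 genome
object <- epigraHMMDataSetFromBam(bamFiles = bamFiles,
                                  colData = colData,
                                  genome = 'rn4',
                                  windowSize = 25000,
                                  gapTrack = TRUE,
                                  blackList = TRUE)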