Processes DNase-seq and/or histone ChIP-seq data and constructs the DNase-seq and histone ChIP-seq files required to generate priors for allocating multi-reads in ChIP-seq data.
priorProcess(dnaseFile = NULL, histoneFile = NULL, dnaseName = "dnase",
  histoneName = NULL, fragL = 200, AllocThres = 900, chrList = NULL,
  capping = 0, outfileLoc = "./", outfile = "dnase_histone", bowtieDir,
  bowtieIndex, vBowtie = 2, mBowtie = 99, pBowtie = 8, bwaDir, bwaIndex,
  nBWA = 2, oBWA = 1, tBWA = 8, mBWA = 99, csemDir, picardDir, chrom.ref,
  saveFiles = TRUE)
dnaseFile |
DNase-seq file in fastq format. For faster results, a SAM file produced by alignment that includes multi-mapping reads, or a BAM or BED file already produced by CSEM with allocated reads, can be supplied instead if available. Otherwise, it is better to start from the fastq file. Default is NULL. |
histoneFile |
Histone ChIP-seq file in fastq format. For faster results, a SAM file produced by alignment that includes multi-mapping reads, or a BAM or BED file already produced by CSEM with allocated reads, can be supplied instead if available. Otherwise, it is better to start from the fastq file. Default is NULL. |
dnaseName |
Name of the DNase-seq data, or of the dataset used as DNase-seq data in the model. Default is 'dnase'. |
histoneName |
Name of the histone ChIP-seq dataset(s). If no values are given, histoneName is set to a vector of index numbers (1:length(histoneFile)). |
fragL |
Average fragment length with default value 200. |
AllocThres |
Allocation threshold. Reads with scores higher than AllocThres are selected. Default is 900. |
chrList |
A vector of chromosomes to include in the analysis. Default is NULL, in which case priorProcess generates the list from the processed files. |
capping |
Maximum number of reads allowed at each nucleotide position. To avoid potential PCR amplification artifacts, the maximum number of reads that can start at a nucleotide position is capped. Default is 0 (no capping, i.e. no maximum restriction). |
outfileLoc |
Directory to store processed files. |
outfile |
Infix of the output file name. Default is "dnase_histone", indicating that the prior is constructed from DNase-seq and histone data. |
bowtieDir |
Directory where Bowtie was installed, default set as NULL. |
bowtieIndex |
Bowtie index used for Bowtie alignment. Default is NULL. Users select the aligner, Bowtie or BWA, by providing the corresponding index. |
vBowtie |
Bowtie parameter. In -v mode, alignments may have no more than vBowtie mismatches, where vBowtie is a number from 0 through 3. Default is 2. |
mBowtie |
Bowtie parameter. The -m option instructs Bowtie to refrain from reporting any alignments for reads having more than mBowtie reportable alignments. Default is 99. |
pBowtie |
Bowtie parameter. The -p option causes Bowtie to launch the specified number of parallel search threads. Each thread runs on a different processor/core, and all threads find alignments in parallel. Default is 8. |
bwaDir |
Directory where BWA was installed. Default set as NULL. |
bwaIndex |
BWA index used in BWA alignment. Users can specify the aligner, Bowtie or BWA, by specifying the index that will be used. Default set as NULL. |
nBWA |
BWA parameter. In "bwa aln -n" mode, an integer value gives the maximum edit distance, including mismatches and gap opens. Otherwise, the value is the fraction of missing alignments given a 2% uniform base error rate. Default is 2. |
oBWA |
BWA parameter. In "bwa aln -o" mode, it specifies the maximum number of gap opens. Default is 1. |
tBWA |
BWA parameter. In "bwa aln -t" mode, it is the number of threads in multi-threading mode. Default set as 8. |
mBWA |
BWA parameter. In "bwa samse -n", it restricts the maximum number of alignments to output for each read. If a read has more hits, the XA tag will not be written. Default set as 99. |
csemDir |
Directory where CSEM was installed. |
picardDir |
Directory where PICARD jar file is saved. For incorporating multi-mapping reads, we do not recommend using picard to remove duplicates. You can leave this option empty so that samtools will be adopted to remove PCR duplicates. |
chrom.ref |
Reference genome index summary information. The first line is the number of chromosomes in the index (either bwaIndex or bowtieIndex), including chrM. The second line is the size of each chromosome. The third line is the name of each chromosome. |
saveFiles |
Option to save intermediate files created. Default set as TRUE. |
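To make the effect of capping concrete, the sketch below caps the number of reads starting at each position in base R. The helper cap_read_starts and the start positions are hypothetical and only mirror the description above (0 means no cap); this is not the package's internal implementation.

```r
# Hypothetical helper, not part of the package: cap the number of reads
# that start at each nucleotide position. capping = 0 means no cap.
cap_read_starts <- function(start_positions, capping = 0) {
  counts <- table(start_positions)      # reads starting at each position
  if (capping > 0) {
    counts[counts > capping] <- capping # truncate over-represented positions
  }
  counts
}

# Made-up read start positions (illustrative only).
starts <- c(100, 100, 100, 100, 250, 250, 900)

uncapped <- cap_read_starts(starts)              # default: no restriction
capped   <- cap_read_starts(starts, capping = 2) # at most 2 reads per start
```

With capping = 2, the position carrying four reads is reduced to two, while positions already at or below the cap are unchanged.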
Processes DNase-seq and/or histone ChIP-seq files and generates a module for further analysis in priorGenerate. If no DNase-seq data are available and it is unclear which histone dataset could stand in for the DNase-seq data in the model, start from the priorHistone_init and priorHistone_multi functions.
If neither DNase-seq nor histone ChIP-seq data are available, run readAllocate directly; multi-reads will then be allocated without using prior information.
If no chrList is provided, priorProcess generates the list from the processed files (the .sam file if the DNase-seq input file is in fastq format, or the .bed file if the DNase-seq input file is in .bam or .bed format). Supplying chrList speeds up the procedure, but the chrList must be consistent with the chromosome name(s) in the corresponding .fa or .fasta file(s). For example, each entry should be the name that follows ">" on the first line of the .fa file.
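One way to check a chrList against the reference is to read the names straight from the FASTA header lines. The sketch below is an illustration in base R; fasta_names is a hypothetical helper, the toy FASTA file is made up, and keeping only the text up to the first whitespace is an assumption about how names are matched.

```r
# Write a tiny, made-up FASTA file so the sketch is self-contained.
fa_path <- file.path(tempdir(), "toy.fa")
writeLines(c(">chr1", "ACGTACGT", ">chr2", "GGCCTTAA"), fa_path)

# Hypothetical helper: collect the sequence names that follow ">".
fasta_names <- function(path) {
  headers <- grep("^>", readLines(path), value = TRUE)
  # Assumption: keep only the text up to the first whitespace character.
  sub("^>([^[:space:]]+).*$", "\\1", headers)
}

chrList <- fasta_names(fa_path)

# Every chromosome intended for the analysis should appear among these names.
wanted <- c("chr1", "chr2")
all_present <- all(wanted %in% chrList)
```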
Users can select from Bowtie and BWA to do the alignment by providing the corresponding index and leaving the other as default value NULL. If both indices are provided, the package will automatically use Bowtie to do the multi-mapping reads alignment.
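The selection rule can be summarised in a few lines of R. The helper choose_aligner below is hypothetical and only mirrors the documented behaviour; the error raised when neither index is given is an assumption, not something the documentation states.

```r
# Hypothetical helper mirroring the documented rule; not part of the package.
choose_aligner <- function(bowtieIndex = NULL, bwaIndex = NULL) {
  if (!is.null(bowtieIndex)) {
    "bowtie"  # Bowtie is used whenever its index is supplied
  } else if (!is.null(bwaIndex)) {
    "bwa"     # BWA is used only when the Bowtie index is left as NULL
  } else {
    # Assumption: treat the absence of both indices as an error.
    stop("Provide either bowtieIndex or bwaIndex.")
  }
}

# Placeholder index paths, for illustration only.
both     <- choose_aligner(bowtieIndex = "idx/bowtie", bwaIndex = "idx/bwa")
bwa_only <- choose_aligner(bwaIndex = "idx/bwa")
```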
The aligned DNase-seq and/or histone SAM files are filtered to remove duplicates. By default, 'samtools rmdup -s' is used; PICARD is used instead if the path to the PICARD jar is provided.
The plot(), summary(), names() and print() methods can be used to inspect the information contained in a "Prior" object. To obtain the ChIP-seq alignment information from Bowtie, use summary().
A new "Prior" object is created containing the following information:
dnaseName |
Name of the dataset used as DNase-seq. In the "Only histone" situation, dnaseName is the selected histone ChIP-seq dataset. |
dnaseAlign |
DNase-seq alignment summary information from bowtie. |
dnaseKnots |
A vector of knot points for the B-spline functions. They are the 90th, 99th and 99.9th percentiles of read counts. |
dnaseThres |
A vector of DNase-seq groups created to generate aggregated ChIP data. After alignment, positions that have the same DNase-seq read count are clustered into one group. |
posLoc_bychr |
Location of the files containing the group index of each segment of the genome. |
histoneName |
Name of histone ChIP-seq dataset(s). If no values are given, histoneName is set to a vector of index numbers (1:length(histoneFile)). |
histoneNum |
Number of histone ChIP-seq dataset(s). |
histoneAlign |
Histone ChIP-seq alignment summary information from bowtie. |
dataNum |
Number of dataset(s) that are used. |
chrList |
Chromosome list. |
fragL |
Fragment length. |
bowtieInfo |
List of Bowtie related information: bowtieDir, bowtieIndex, vBowtie, mBowtie and pBowtie. |
bwaInfo |
List of BWA related information: bwaDir, bwaIndex, nBWA, oBWA, tBWA, mBWA. |
csemDir |
Directory where CSEM was installed. |
picardDir |
Directory where PICARD jar file is saved. |
outfileLoc |
Directory to store output files. |
chrom.ref |
Name of the file for chromosome info. |
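The dnaseKnots and dnaseThres entries can be illustrated with a small base R sketch. The read counts are made up, and the use of quantile() with its default interpolation is an assumption; the package may compute the percentiles and groups differently.

```r
# Made-up per-position DNase-seq read counts (illustrative only).
read_counts <- c(0, 0, 1, 1, 1, 2, 3, 3, 5, 8)

# Knot points at the 90th, 99th and 99.9th percentiles of the counts.
# Assumption: quantile()'s default interpolation (type 7) is used here.
knots <- quantile(read_counts, probs = c(0.90, 0.99, 0.999), names = FALSE)

# Positions that share the same read count fall into the same group.
groups <- split(seq_along(read_counts), read_counts)

# The distinct read counts that define the groups.
group_counts <- as.numeric(names(groups))
```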
Xin Zeng, M. Constanza Rojo-Alfaro, Ye Zheng
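## Not run:
## The variables passed below (chrList, outfile, bowtieIndex, csemDir,
## picardDir, chrom.ref) must be defined beforehand.
object = priorProcess(dnaseFile = NULL, histoneFile = NULL, dnaseName = 'dnase',
  histoneName = NULL, fragL = 200, chrList = chrList, capping = 0,
  outfileLoc = "./", outfile = outfile, bowtieIndex = bowtieIndex,
  csemDir = csemDir, picardDir = picardDir, chrom.ref = chrom.ref,
  saveFiles = FALSE)
## End(Not run)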