atacPipe2: Pipeline for single replicate case-control paired-end...

View source: R/Methods.R

atacPipe2    R Documentation

Pipeline for single replicate case-control paired-end sequencing data

Description

A preset pipeline for processing case-control sequencing data. It generates an HTML report file, result files (e.g. BED and BAM files), and a conclusion list. See Details for usage.

Usage

atacPipe2(
  genome,
  case = list(fastqInput1 = "paths/To/fastq1", fastqInput2 = "paths/To/fastq2",
    adapter1 = NULL, adapter2 = NULL),
  control = list(fastqInput1 = "paths/To/fastq1", fastqInput2 = "paths/To/fastq2",
    adapter1 = NULL, adapter2 = NULL),
  tmpdir = file.path(getwd(), "esATAC-pipeline"),
  refdir = file.path(tmpdir, "refdir"),
  threads = 2,
  interleave = FALSE,
  createReport = TRUE,
  motifs = NULL,
  chr = c(1:22, "X", "Y"),
  p.cutoff = 1e-06,
  ...
)

Arguments

genome

Character scalar. The genome (e.g. hg19, mm10) whose reference data in "refdir" is used by the pipeline.

case

List scalar. Input for the case sample. fastqInput1, the path(s) of the mate 1 fastq file(s), is required. fastqInput2, the path(s) of the mate 2 fastq file(s), is required when interleave=FALSE. adapter1 and adapter2 are optional.

control

List scalar. Input for the control sample. fastqInput1, the path(s) of the mate 1 fastq file(s), is required. fastqInput2, the path(s) of the mate 2 fastq file(s), is required when interleave=FALSE. adapter1 and adapter2 are optional.

tmpdir

Character scalar. The temporary file storage path.

refdir

Character scalar. The path where reference data is installed and stored.

threads

Integer scalar. The maximum number of threads that may be created.

interleave

Logical scalar. Set to TRUE when the input files contain interleaved paired-end sequencing data.

createReport

Logical scalar. Whether the HTML report file will be created.

motifs

Either PFMatrix, PFMatrixList, PWMatrix, or PWMatrixList. Default: vertebrate motifs from JASPAR.

chr

Which chromosomes the program will process. It must be identical to the filenames of the cut site information files or a subset of them. Default: c(1:22, "X", "Y").

p.cutoff

p-value cutoff for returning motifs, default: 1e-6.

...

Additional arguments, currently unused.

Details

NOTE: Building the bowtie2 index in this function may take some time. If you already have bowtie2 index files, or you want to download them (ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes) instead of building them, you can let esATAC skip this step by renaming the files to the format genome+suffix and putting them in the reference installation path (refdir). Example: hg19 bowtie2 index files

  • hg19.1.bt2

  • hg19.2.bt2

  • hg19.3.bt2

  • hg19.4.bt2

  • hg19.rev.1.bt2

  • hg19.rev.2.bt2
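The renaming step described above can be sketched in base R. This is only an illustration: the helper stage_bowtie2_index(), the source prefix "GRCh37", and the temporary directories are hypothetical names that are not part of esATAC, and the demonstration uses empty placeholder files rather than a real index.

```r
# Suffixes of the six bowtie2 index files listed above
bt2_suffixes <- c(".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2",
                  ".rev.1.bt2", ".rev.2.bt2")

# Hypothetical helper (not an esATAC function): copy an existing bowtie2
# index into refdir, renaming each file to <genome><suffix>.
stage_bowtie2_index <- function(src_dir, src_prefix, genome, refdir) {
  src <- file.path(src_dir, paste0(src_prefix, bt2_suffixes))
  missing <- src[!file.exists(src)]
  if (length(missing) > 0) {
    stop("Missing index files: ", paste(basename(missing), collapse = ", "))
  }
  dir.create(refdir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(refdir, paste0(genome, bt2_suffixes))
  copied <- file.copy(src, dest, overwrite = TRUE)
  stopifnot(all(copied))
  invisible(dest)
}

# Demonstration with empty placeholder files in a temporary directory.
# "GRCh37" stands in for whatever prefix a downloaded index happens to use.
src_dir <- file.path(tempdir(), "downloaded_index")
dir.create(src_dir, showWarnings = FALSE)
file.create(file.path(src_dir, paste0("GRCh37", bt2_suffixes)))

refdir <- file.path(tempdir(), "refdir")
staged <- stage_bowtie2_index(src_dir, "GRCh37", "hg19", refdir)
print(basename(staged))
```

Copying rather than moving leaves the original index untouched, which may be preferable when the same index is shared with other tools.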

For single-end FASTQ files, the required parameters are fastqInput1 and adapter1. For paired-end, non-interleaved FASTQ files (interleave=FALSE, default), the required parameters are fastqInput1 and fastqInput2. Otherwise (interleave=TRUE), fastqInput2 is not required.

The paths of sequencing data replicates can be given as a character vector. For example:

fastqInput1=c("file_1.rep1.fastq","file_1.rep2.fastq")

fastqInput2=c("file_2.rep1.fastq","file_2.rep2.fastq")
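As a rough illustration of the input rules above, the following sketch builds a case list with two replicates and checks it before it would be passed to atacPipe2. The helper check_sample_input() is hypothetical and not part of esATAC, and the requirement that both mates list the same number of files is an assumption made for this sketch, not something the documentation states.

```r
# Hypothetical helper (not an esATAC function): check one sample's input
# list against the rules described above.
check_sample_input <- function(sample, interleave = FALSE) {
  if (is.null(sample$fastqInput1)) {
    stop("fastqInput1 is required")
  }
  if (!interleave) {
    if (is.null(sample$fastqInput2)) {
      stop("fastqInput2 is required when interleave = FALSE")
    }
    # Assumption: each mate 1 file is paired with exactly one mate 2 file
    if (length(sample$fastqInput1) != length(sample$fastqInput2)) {
      stop("fastqInput1 and fastqInput2 must list the same number of files")
    }
  }
  invisible(TRUE)
}

# Two replicates, non-interleaved paired-end input
case <- list(
  fastqInput1 = c("file_1.rep1.fastq", "file_1.rep2.fastq"),
  fastqInput2 = c("file_2.rep1.fastq", "file_2.rep2.fastq")
)
check_sample_input(case)

# Interleaved input: only fastqInput1 is needed
# ("interleaved.fastq" is a placeholder file name)
check_sample_input(list(fastqInput1 = "interleaved.fastq"), interleave = TRUE)
```

The same check can be applied to the control list before calling the pipeline.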

The result is returned by the function. An HTML report file is created for paired-end reads. Intermediate files are saved under the tmpdir path (default: file.path(getwd(), "esATAC-pipeline")).

Value

List scalar. A list that stores the results of the pipeline.

  • Slot "wholesummary": a data frame with the quality control summary of the case and control data.

  • Slot "caselist" and "ctrlist": each is a list that stores the results for the case or control data.

Slots of "caselist" and "ctrllist":

  • Slot "filelist": the input file paths.

  • Slot "wholesummary": a data frame with the quality control summary of the case or control data.

  • Slot "atacProcs": ATACProc-class objects generated by each process in the pipeline.

  • Slot "filtstat": a data frame summarizing the reads filtered in each process.
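One way to explore the returned list is to print its slot names and classes rather than hard-coding them, since the control slot is spelled both "ctrlist" and "ctrllist" in the description above. The sketch below is an assumption-laden illustration: describe_slots() is a hypothetical helper, and the conclusion object is an empty stand-in that mimics only the documented case branch, not real pipeline output.

```r
# Hypothetical helper (not an esATAC function): print the name and class of
# each slot, descending one level into nested plain lists.
describe_slots <- function(x, depth = 0, max_depth = 1) {
  for (nm in names(x)) {
    cat(strrep("  ", depth), nm, ": ", class(x[[nm]])[1], "\n", sep = "")
    if (is.list(x[[nm]]) && !is.data.frame(x[[nm]]) && depth < max_depth) {
      describe_slots(x[[nm]], depth + 1, max_depth)
    }
  }
  invisible(names(x))
}

# Empty stand-in for a real result; a real object would come from atacPipe2().
# Slot names follow the Value description; all contents are placeholders.
conclusion <- list(
  wholesummary = data.frame(),
  caselist = list(
    filelist = character(0),
    wholesummary = data.frame(),
    atacProcs = list(),
    filtstat = data.frame()
  )
)

top_level <- describe_slots(conclusion)
```

On a real result, base R's str(conclusion, max.level = 2) gives a similar overview.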

Author(s)

Zheng Wei and Wei Zhang

See Also

atacPipe

Examples

## Not run: 
## This code is time-consuming, so it is not run or
## checked by the Bioconductor checker.

library(esATAC)

# call pipeline
# quick example (only the CTCF and BATF3 motifs are processed)
conclusion <-
   atacPipe2(
       # MODIFY: Change these paths to your own case files!
       # e.g. fastqInput1 = "your/own/data/path.fastq"
       case=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"),
                fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")),
       # MODIFY: Change these paths to your own control files!
       # e.g. fastqInput1 = "your/own/data/path.fastq"
       control=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"),
                    fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")),
       # MODIFY: Set the genome for your data
       genome = "hg19",
       motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC")))

# call pipeline
# full example (all vertebrate motifs in JASPAR are processed)
conclusion <-
   atacPipe2(
       # MODIFY: Change these paths to your own case files!
       # e.g. fastqInput1 = "your/own/data/path.fastq"
       case=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"),
                fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")),
       # MODIFY: Change these paths to your own control files!
       # e.g. fastqInput1 = "your/own/data/path.fastq"
       control=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"),
                    fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")),
       # MODIFY: Set the genome for your data
       genome = "hg19")

## End(Not run)

wzthu/ATACFlow documentation built on Aug. 9, 2022, 2:24 a.m.