atacRepsPipe2: Pipeline for multi-replicates case-control paired-end...

View source: R/Methods.R

atacRepsPipe2R Documentation

Pipeline for multi-replicates case-control paired-end sequencing data

Description

The preset pipeline to process multi-replicates case control study sequencing data. HTML report files, result files(e.g. BED, BAM files) and conclusion list will generated. See detail for usage.

Usage

atacRepsPipe2(
  genome,
  caseFastqInput1,
  caseFastqInput2,
  ctrlFastqInput1,
  ctrlFastqInput2,
  caseAdapter1 = NULL,
  caseAdapter2 = NULL,
  ctrlAdapter1 = NULL,
  ctrlAdapter2 = NULL,
  refdir = NULL,
  tmpdir = NULL,
  threads = 2,
  interleave = FALSE,
  createReport = TRUE,
  motifs = NULL,
  chr = c(1:22, "X", "Y"),
  p.cutoff = 1e-06,
  ...
)

Arguments

genome

Character scalar. The genome(like hg19, mm10, etc.) reference data in "refdir" to be used in the pipeline.

caseFastqInput1

List scalar. Input for case samples. For single-end sequencing, it contains sequence file paths. For paired-end sequencing, it can be file paths with #1 mates paired with file paths in fastqInput2 And it can also be interleaved file paths when argument interleaved=TRUE. Each element in the caseFastqInput1 List is for a replicate It can be a Character vector of FASTQ files paths to be merged.

caseFastqInput2

List scalar. Input for case samples. It contains file paths with #2 mates paired with file paths in caseFastqInput1 For single-end sequencing files and interleaved paired-end sequencing files(argument interleaved=TRUE), it must be NULL. Each element in the caseFastqInput2 List is for a replicate

ctrlFastqInput1

List scalar. Input for control samples. For single-end sequencing, it contains sequence file paths. For paired-end sequencing, it can be file paths with #1 mates paired with file paths in ctrlFastqInput2 And it can also be interleaved file paths when argument interleaved=TRUE. Each element in the ctrlFastqInput1 List is for a replicate It can be a Character vector of FASTQ files paths to be merged.

ctrlFastqInput2

List scalar. Input for control samples. It contains file paths with #2 mates paired with file paths in fastqInput1. For single-end sequencing files and interleaved paired-end sequencing files(argument interleaved=TRUE), it must be NULL. Each element in the ctrlFastqInput1 List is for a replicate

caseAdapter1

Character scalar. Adapter for caseFastqInput1.

caseAdapter2

Character scalar. Adapter for caseFastqInput2.

ctrlAdapter1

Character scalar. Adapter for ctrlFastqInput1.

ctrlAdapter2

Character scalar. Adapter for ctrlFastqInput2.

refdir

Character scalar. The path for reference data being installed to and storage.

tmpdir

Character scalar. The temporary file storage path.

threads

Integer scalar. The max threads allowed to be created.

interleave

Logical scalar. Set TRUE when files are interleaved paired-end sequencing data.

createReport

Logical scalar. If the HTML report file will be created.

motifs

eitherPFMatrix, PFMatrixList, PWMatrix, PWMatrixList, default: vertebrates motif from JASPAR.

chr

Which chromatin the program will processing. It must be identical with the filename of cut site information files or subset of . Default:c(1:22, "X", "Y").

p.cutoff

p-value cutoff for returning motifs, default: 1e-6.

...

Additional arguments, currently unused.

Details

NOTE: Build bowtie index in this function may take some time. If you already have bowtie2 index files or you want to download(ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes) instead of building, you can let esATAC skip the steps by renaming them following the format (genome+suffix) and put them in reference installation path (refdir). Example: hg19 bowtie2 index files

  • hg19.1.bt2

  • hg19.2.bt2

  • hg19.3.bt2

  • hg19.4.bt2

  • hg19.rev.1.bt2

  • hg19.rev.2.bt2

For single end reads FASTQ files, The required parameters are fastqInput1 and adapter1. For paired end reads non-interleaved FASTQ files (interleave=FALSE,defualt), The required parameters are fastqInput1 and fastqInput2. Otherwise, parameter fastqInput2 is not required (interleave=TRUE)

The paths of sequencing data replicates can be a Character vector. For example:

fastqInput1=c("file_1.rep1.fastq","file_1.rep2.fastq")

fastqInput2=c("file_2.rep1.fastq","file_2.rep2.fastq")

The result will be return by the function. An HTML report file will be created for paired end reads. Intermediate files will be save at tmpdir path (default is ./)

Value

List scalar. It is a list that save the result of the pipeline. Slot "caselist" and "ctrlist": Each of them is a list that save the result for case or control data. Slot "comp_result": compare analysis result for case and control data

Author(s)

Zheng Wei and Wei Zhang

See Also

atacPipe

Examples

## Not run: 
## These codes are time consuming so they will not be run and
## checked by bioconductor checker.


# call pipeline
# for a quick example(only CTCF will be processed)
conclusion <-
    atacRepsPipe2(
        # MODIFY: Change these paths to your own case files!
        # e.g. fastqInput1 = "your/own/data/path.fastq"
     caseFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"),
                          system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz")),
     # MODIFY: Change these paths to your own case files!
     # e.g. fastqInput1 = "your/own/data/path.fastq"
     caseFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"),
                          system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")),
     # MODIFY: Change these paths to your own control files!
     # e.g. fastqInput1 = "your/own/data/path.fastq"
     ctrlFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"),
                          system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2")),
     # MODIFY: Change these paths to your own control files!
     # e.g. fastqInput1 = "your/own/data/path.fastq"
     ctrlFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2"),
                          system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")),
     # MODIFY: Set the genome for your data
     genome = "hg19",
     motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC")))


# call pipeline
# for overall example(all human motif in JASPAR will be processed)
conclusion <-
    atacRepsPipe2(
        # MODIFY: Change these paths to your own case files!
        # e.g. fastqInput1 = "your/own/data/path.fastq"
     caseFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"),
                          system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz")),
     # MODIFY: Change these paths to your own case files!
     # e.g. fastqInput1 = "your/own/data/path.fastq"
     caseFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"),
                          system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")),
     # MODIFY: Change these paths to your own control files!
     # e.g. fastqInput1 = "your/own/data/path.fastq"
     ctrlFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"),
                          system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2")),
     # MODIFY: Change these paths to your own control files!
     # e.g. fastqInput1 = "your/own/data/path.fastq"
     ctrlFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2"),
                          system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")),
     # MODIFY: Set the genome for your data
     genome = "hg19"
     )

## End(Not run)

wzthu/ATACFlow documentation built on Aug. 9, 2022, 2:24 a.m.