setup_CPsSearch: prepare data for predicting cleavage and polyadenylation (CP)...
In haibol2016/InPAS: Identify Novel Alternative PolyAdenylation Sites (PAS) from RNA-seq data

setup_CPsSearch

R Documentation

Prepare data for predicting cleavage and polyadenylation (CP) sites

Description

Prepare, for a single chromosome or scaffold, the data needed to predict cleavage and polyadenylation (CP) sites.

Usage

setup_CPsSearch(
  sqlite_db,
  genome = getInPASGenome(),
  chr.utr3,
  seqname,
  background = c("same_as_long_coverage_threshold", "1K", "5K", "10K", "50K"),
  TxDb = getInPASTxDb(),
  hugeData = TRUE,
  outdir = getInPASOutputDirectory(),
  silence = FALSE,
  minZ = 2,
  cutStart = 10,
  MINSIZE = 10,
  coverage_threshold = 5
)

Arguments

`sqlite_db`	A path to the SQLite database for InPAS, i.e. the output of `setup_sqlitedb()`.
`genome`	An object of class BSgenome::BSgenome
`chr.utr3`	An object of GenomicRanges::GRanges, an element of the output of `extract_UTR3Anno()`
`seqname`	A character(1), the name of a chromosome/scaffold
`background`	A character(1) vector, the range for calculating the cutoff threshold of local background. It can be "same_as_long_coverage_threshold", "1K", "5K", "10K", or "50K".
`TxDb`	An object of class GenomicFeatures::TxDb
`hugeData`	A logical(1) vector, indicating whether it is huge data
`outdir`	A character(1) vector, a path with write permission for storing InPAS analysis results. If it doesn't exist, it will be created.
`silence`	A logical(1) vector, indicating whether to suppress progress messages. Default, FALSE.
`minZ`	A numeric(1) vector, a Z-score cutoff value. Default, 2.
`cutStart`	An integer(1) or numeric(1) vector, specifying what percentage or how many nucleotides should be removed from the 5' extremities before searching for CP sites. It can be a decimal between 0 and 1, or an integer greater than 1: 0.1 means 10 percent, and 25 means cut the first 25 bases. Default, 10.
`MINSIZE`	An integer(1) vector, specifying the minimal length in bp of a short/proximal 3' UTR. Default, 10.
`coverage_threshold`	An integer(1) vector, specifying the cutoff threshold of coverage for the first 100 nucleotides. If the coverage of the first 100 nucleotides is lower than coverage_threshold, that transcript will not be considered for further analysis. Default, 5.
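
The argument semantics above can be sketched in base R. The helpers below (`pick_background`, `bases_to_trim`, `passes_coverage`) are hypothetical and not part of InPAS. They assume that `background` is resolved with the usual `match.arg()` idiom, that a fractional `cutStart` is rounded down, and that "coverage of the first 100 nucleotides" means the mean coverage. None of these details is stated on this page.

```r
# Hypothetical helpers, NOT part of InPAS: they only illustrate one way to
# read the documented argument semantics.

# With match.arg(), the first listed value acts as the default (assumed idiom).
pick_background <- function(background = c(
                              "same_as_long_coverage_threshold",
                              "1K", "5K", "10K", "50K"
                            )) {
  match.arg(background)
}

# Number of 5' bases to trim from a 3' UTR that is `utr_length` bases long.
# Values below 1 are read as a fraction of the length, values of 1 or more
# as a base count. Rounding down is an assumption of this sketch.
bases_to_trim <- function(cutStart, utr_length) {
  stopifnot(length(cutStart) == 1, cutStart >= 0, utr_length > 0)
  n <- if (cutStart < 1) floor(cutStart * utr_length) else floor(cutStart)
  min(n, utr_length)
}

# TRUE when the first 100 positions are covered well enough to keep the
# transcript. Summarising by the mean is an assumption of this sketch.
passes_coverage <- function(cov, coverage_threshold = 5) {
  mean(head(cov, 100)) >= coverage_threshold
}

print(pick_background())            # default choice
print(pick_background("10K"))       # explicit choice
print(bases_to_trim(0.1, 500))      # fraction of the UTR length
print(bases_to_trim(25, 500))       # absolute number of bases
print(passes_coverage(rep(3, 300))) # low coverage, would be skipped
```

Passing a value outside the listed choices to `pick_background()` raises an error, which is the main reason the `match.arg()` idiom is common for arguments such as `background`.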

Value

A file storing a list as described below:

background: The type of method used for background coverage calculation
z2s: Z-score cutoff thresholds for each 3' UTR
depth.weight: A named vector containing depth weight
chr.cov.merge: A matrix storing condition/sample-specific coverage for 3' UTR and next.exon.gap (if it exists)
conn_next_utr3: A logical vector, indicating whether a 3' UTR has a convergent 3' UTR of its downstream transcript
chr.utr3: A GRangesList, storing extracted 3' UTR annotation of transcript on a given chr
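
A quick way to check such a file against the component list above is sketched here. It assumes the file is an RDS file readable with `readRDS()`, which this page does not state. The helper `missing_components` is hypothetical, and the mock list holds placeholder values only.

```r
# Hypothetical helper, NOT part of InPAS. Assumes the output file is an RDS
# file holding the documented list; the file format is an assumption.
expected_names <- c(
  "background", "z2s", "depth.weight",
  "chr.cov.merge", "conn_next_utr3", "chr.utr3"
)

# Return the documented component names that are absent from the stored list.
missing_components <- function(path) {
  x <- readRDS(path)
  stopifnot(is.list(x))
  setdiff(expected_names, names(x))
}

# Mock list with empty placeholder values, for demonstration only.
mock <- setNames(vector("list", length(expected_names)), expected_names)
mock_file <- tempfile(fileext = ".RDS")
saveRDS(mock, mock_file)

print(missing_components(mock_file))
```

With a real run, `mock_file` would be replaced by the path of the file produced by `setup_CPsSearch()`.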

Author(s)

Jianhong Ou, Haibo Liu

Examples

if (interactive()) {
  library(InPAS)
  library(BSgenome.Mmusculus.UCSC.mm10)
  library("TxDb.Mmusculus.UCSC.mm10.knownGene")
  genome <- BSgenome.Mmusculus.UCSC.mm10
  TxDb <- TxDb.Mmusculus.UCSC.mm10.knownGene

  ## load UTR3 annotation and convert it into a GRangesList
  data(utr3.mm10)
  utr3 <- split(utr3.mm10, seqnames(utr3.mm10), drop = TRUE)

  bedgraphs <- system.file("extdata", c(
    "Baf3.extract.bedgraph",
    "UM15.extract.bedgraph"
  ),
  package = "InPAS"
  )
  tags <- c("Baf3", "UM15")
  metadata <- data.frame(
    tag = tags,
    condition = c("Baf3", "UM15"),
    bedgraph_file = bedgraphs
  )
  outdir <- tempdir()
  write.table(metadata,
    file = file.path(outdir, "metadata.txt"),
    sep = "\t", quote = FALSE, row.names = FALSE
  )

  sqlite_db <- setup_sqlitedb(
    metadata = file.path(
      outdir,
      "metadata.txt"
    ),
    outdir
  )
  addLockName(filename = tempfile())
  coverage <- list()
  for (i in seq_along(bedgraphs)) {
    coverage[[tags[i]]] <- get_ssRleCov(
      bedgraph = bedgraphs[i],
      tag = tags[i],
      genome = genome,
      sqlite_db = sqlite_db,
      outdir = outdir,
      chr2exclude = "chrM"
    )
  }
  data4CPsitesSearch <- setup_CPsSearch(sqlite_db,
    genome,
    chr.utr3 = utr3[["chr6"]],
    seqname = "chr6",
    background = "10K",
    TxDb = TxDb,
    hugeData = TRUE,
    outdir = outdir
  )
}

haibol2016/InPAS documentation built on Feb. 21, 2025, 6:26 p.m.
