View source: R/11.setup_CPsSearch.R
setup_CPsSearch | R Documentation |
prepare data for predicting cleavage and polyadenylation (CP) sites
setup_CPsSearch( sqlite_db, genome = getInPASGenome(), chr.utr3, seqname, background = c("same_as_long_coverage_threshold", "1K", "5K", "10K", "50K"), TxDb = getInPASTxDb(), hugeData = TRUE, outdir = getInPASOutputDirectory(), silence = FALSE, minZ = 2, cutStart = 10, MINSIZE = 10, coverage_threshold = 5 )
sqlite_db |
A path to the SQLite database for InPAS, i.e. the output of
|
genome |
An object of BSgenome::BSgenome |
chr.utr3 |
An object of GenomicRanges::GRanges, an element of the
output of |
seqname |
A character(1), the name of a chromosome/scaffold |
background |
A character(1) vector, the range for calculating cutoff threshold of local background. It can be "same_as_long_coverage_threshold", "1K", "5K","10K", or "50K". |
TxDb |
an object of GenomicFeatures::TxDb |
hugeData |
A logical(1) vector, indicating whether it is huge data |
outdir |
A character(1) vector, a path with write permission for storing InPAS analysis results. If it doesn't exist, it will be created. |
silence |
report progress or not. By default it doesn't report progress. |
minZ |
A numeric(1), a Z score cutoff value |
cutStart |
An integer(1) vector a numeric(1) vector. What percentage or how many nucleotides should be removed from 5' extremities before searching for CP sites? It can be a decimal between 0, and 1, or an integer greater than 1. 0.1 means 10 percent, 25 means cut first 25 bases |
MINSIZE |
A integer(1) vector, specifying the minimal length in bp of a short/proximal 3' UTR. Default, 10 |
coverage_threshold |
An integer(1) vector, specifying the cutoff threshold of coverage for first 100 nucleotides. If the coverage of first 100 nucleotides is lower than coverage_threshold, that transcript will be not considered for further analysis. Default, 5. |
A file storing a list as described below:
The type of methods for background coverage calculation
Z-score cutoff thresholds for each 3' UTRs
A named vector containing depth weight
A matrix storing condition/sample-specific coverage for 3' UTR and next.exon.gap (if exist)
A logical vector, indicating whether a 3'UTR has a convergent 3' UTR of its downstream transcript
A GRangesList, storing extracted 3' UTR annotation of transcript on a given chr
Jianhong Ou, Haibo Liu
if (interactive()) { library(BSgenome.Mmusculus.UCSC.mm10) library("TxDb.Mmusculus.UCSC.mm10.knownGene") genome <- BSgenome.Mmusculus.UCSC.mm10 TxDb <- TxDb.Mmusculus.UCSC.mm10.knownGene ## load UTR3 annotation and convert it into a GRangesList data(utr3.mm10) utr3 <- split(utr3.mm10, seqnames(utr3.mm10), drop = TRUE) bedgraphs <- system.file("extdata", c( "Baf3.extract.bedgraph", "UM15.extract.bedgraph" ), package = "InPAS" ) tags <- c("Baf3", "UM15") metadata <- data.frame( tag = tags, condition = c("Baf3", "UM15"), bedgraph_file = bedgraphs ) outdir <- tempdir() write.table(metadata, file = file.path(outdir, "metadata.txt"), sep = "\t", quote = FALSE, row.names = FALSE ) sqlite_db <- setup_sqlitedb( metadata = file.path( outdir, "metadata.txt" ), outdir ) addLockName(filename = tempfile()) coverage <- list() for (i in seq_along(bedgraphs)) { coverage[[tags[i]]] <- get_ssRleCov( bedgraph = bedgraphs[i], tag = tags[i], genome = genome, sqlite_db = sqlite_db, outdir = outdir, chr2exclude = "chrM" ) } data4CPsitesSearch <- setup_CPsSearch(sqlite_db, genome, chr.utr3 = utr3[["chr6"]], seqname = "chr6", background = "10K", TxDb = TxDb, hugeData = TRUE, outdir = outdir ) }
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.