BSFind | R Documentation |
This is the main function that performs the binding site definition analysis through the following steps:
Filter PureCLIP sites by their score distribution: pureClipGlobalFilter
Estimate the appropriate binding site width together with the optimal gene-wise filter level: estimateBsWidth
Filter PureCLIP sites by their score distribution per gene: pureClipGeneWiseFilter
Define equally sized binding sites: makeBindingSites
Perform replicate reproducibility filter: reproducibilityFilter
Assign binding sites to their hosting genes: assignToGenes
Assign binding sites to their hosting transcript regions: assignToTranscriptRegions
Re-assign PureCLIP scores to binding sites: annotateWithScore
Calculation of signal-to-flank ratio: calculateSignalToFlankScore
BSFind(
object,
bsSize = NULL,
cutoff.geneWiseFilter = NULL,
cutoff.globalFilter = 0.01,
est.bsResolution = "medium",
est.geneResolution = "medium",
est.maxBsWidth = 13,
est.minimumStepGain = 0.02,
est.maxSites = Inf,
est.subsetChromosome = "chr1",
est.minWidth = 2,
est.offset = 1,
est.sensitive = FALSE,
est.sensitive.size = 5,
est.sensitive.minWidth = 2,
merge.minWidth = 2,
merge.minCrosslinks = 2,
merge.minClSites = 1,
merge.CenterIsClSite = TRUE,
merge.CenterIsSummit = TRUE,
repro.cutoff = NULL,
repro.nReps = NULL,
repro.minCrosslinks = 1,
overlaps.geneWiseFilter = "keepSingle",
overlaps.geneAssignment = "frequency",
overlaps.rule.geneAssignment = NULL,
overlaps.TranscriptRegions = "frequency",
overlaps.rule.TranscriptRegions = NULL,
stf.flank = "bs",
stf.flank.size = NULL,
match.score = "score",
match.geneID = "gene_id",
match.geneName = "gene_name",
match.geneType = "gene_type",
match.ranges.score = NULL,
match.option.score = "max",
anno.annoDB = NULL,
anno.genes = NULL,
anno.transcriptRegionList = NULL,
quiet = FALSE,
veryQuiet = FALSE,
...
)
object |
a |
bsSize |
an odd integer value specifying the size of the output binding sites |
cutoff.geneWiseFilter |
numeric; defines the cutoff for which sites to
remove in in function |
cutoff.globalFilter |
numeric; defines the cutoff for which sites to
keep, the smallest step is 1% (0.01) in function
|
est.bsResolution |
character; level of resolution of the binding site
width in function |
est.geneResolution |
character; level of resolution of the gene-wise
filtering in function |
est.maxBsWidth |
numeric; the largest binding site width which should considered in the testing |
est.minimumStepGain |
numeric; the minimum additional gain in the score in percent the next binding site width has to have, to be selected as best option |
est.maxSites |
numeric; maximum number of PureCLIP sites that are used |
est.subsetChromosome |
character; define on which chromosome the
estimation should be done in function |
est.minWidth |
the minimum size of regions that are subjected to the iterative merging routine, after the initial region concatenation. |
est.offset |
constant added to the flanking count in the signal-to-flank ratio calculation to avoid division by Zero |
est.sensitive |
logical; whether to enable sensitive pre-filtering before binding site merging or not |
est.sensitive.size |
numeric; the size (in nucleotides) of the merged sensitive region |
est.sensitive.minWidth |
numeric; the minimum size (in nucleoties) of the merged sensitive region |
merge.minWidth |
the minimum size of regions that are subjected to the iterative merging routine, after the initial region concatenation. |
merge.minCrosslinks |
the minimal number of positions to overlap with at least one crosslink event in the final binding sites |
merge.minClSites |
the minimal number of crosslink sites that have to overlap a final binding site |
merge.CenterIsClSite |
logical, whether the center of a final binding site must be covered by an initial crosslink site |
merge.CenterIsSummit |
logical, whether the center of a final binding site must exhibit the highest number of crosslink events |
repro.cutoff |
numeric; percentage cutoff to be used for the reproducibility quantile filtering |
repro.nReps |
numeric; number of replicates that must meet the cutoff
defined in |
repro.minCrosslinks |
numeric; minimal number of crosslinks a binding
site needs to have to be called reproducible. Acts as a lower boundary for
|
overlaps.geneWiseFilter |
character; how overlaps should be handled in
|
overlaps.geneAssignment |
character; how overlaps should be handled in
|
overlaps.rule.geneAssignment |
character vector; a vector of gene types
that should be used to handle overlaps if option 'hierarchy' is selected
for |
overlaps.TranscriptRegions |
character; how overlaps should be handled in
|
overlaps.rule.TranscriptRegions |
character vector; a vector of gene types
that should be used to handle overlaps if option 'hierarchy' is selected
for |
stf.flank |
character; how the flanking region shoule be set. Options are 'bs', 'manual' |
stf.flank.size |
numeric; if flank='manual' provide the desired flanking size |
match.score |
character; meta column name of the crosslink site |
match.geneID |
character; meta column name of the genes |
match.geneName |
character; meta column name of the gene name |
match.geneType |
character; meta column name of the gene type |
match.ranges.score |
a GRanges object, with numeric column for the score
to match in function |
match.option.score |
character; meta column name of the crosslink site
in function |
anno.annoDB |
an object of class |
anno.genes |
an object of class |
anno.transcriptRegionList |
an object of class |
quiet |
logical; whether to print messages |
veryQuiet |
logical; whether to suppress all messages |
... |
additional arguments passed to |
If only the annotation is provided (anno.genes
and
anno.transcriptRegionList
), then binding sites size (bsSize
)
and gene-wise cutoff (cutoff.geneWiseFilter
) are estimated using
estimateBsWidth
. To avoid this behavior one has to provide
input values for the arguments bsSize
and cutoff.geneWiseFilter
.
If no binding site size is provided through bsSize
, then
estimateBsWidth
is called to estimate the optimal size for the
given data-set. The result of this estimation can be looked at with
estimateBsWidthPlot
and arguments can be adjusted if needed.
Use the processingStepsFlowChart
function to get an overview
of all steps carried out by the function.
For complete details on each step, see the manual pages of the respective
functions. The BSFind
function returns a BSFDataSet
with ranges merged into binding sites. A full flowchart for the entire process
can be visualized with processingStepsFlowChart
. For each of the
individual steps dedicated diagnostic plots exists. Further information can be
found in our Bioconductor vignette:
https://www.bioconductor.org/packages/release/bioc/html/BindingSiteFinder.html
an object of class BSFDataSet
with ranges merged into
binding sites given the inputs.
BSFDataSet
, estimateBsWidth
,
pureClipGlobalFilter
, pureClipGeneWiseFilter
,
assignToGenes
, assignToTranscriptRegions
,
annotateWithScore
, reproducibilityFilter
,
calculateSignalToFlankScore
,
processingStepsFlowChart
# load clip data
files <- system.file("extdata", package="BindingSiteFinder")
load(list.files(files, pattern = ".rda$", full.names = TRUE))
# Load genes
load(list.files(files, pattern = ".rds$", full.names = TRUE)[1])
# load transcript regions
load(list.files(files, pattern = ".rds$", full.names = TRUE)[2])
BSFind(object = bds, bsSize = 9, anno.genes = gns,
anno.transcriptRegionList = regions, est.subsetChromosome = "chr22")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.