GenometriCorrelation: GenometriCorrelation
In favorov/GenometriCorr: Genometric Correlation package

Description Usage Arguments Value Author(s) References See Also Examples

The GenometriCorrelation function compares two interval annotations on a chromosome, set of chromosomes or on an entire genome, and performs various statistical tests to determine whether the two interval sets are independent or are positioned nonrandomly with respect to each other. For a complete description of the tests, refer to the the package vignette.

GenometriCorrelation(
  query,
  reference,
  chromosomes.to.proceed = c(),
  chromosomes.to.include.in.awhole = c(),
  chromosomes.to.exclude.from.awhole = c(),
  add.chr.as.prefix = FALSE,
  awhole.only = FALSE,
  map.to.half = TRUE,
  showProgressBar = TRUE,
  showTkProgressBar = FALSE,
  chromosomes.length = c(),
  suppress.evaluated.length.warning = FALSE,
  permut.number = 100,
  ecdf.area.permut.number = permut.number,
  mean.distance.permut.number = permut.number,
  jaccard.measure.permut.number = permut.number,
  jaccard.permut.is.rearrangement = FALSE,
  alternative = "two.sided",
  awhole.space.name = "awhole",
  keep.distributions = FALSE,
  representing.point.function = mitl,
  query.representing.point.function = representing.point.function,
  reference.representing.point.function = representing.point.function,
  supress.evaluated.length.warning
)

`query`	GRanges object that contains the query interval set coordinates and spaces (generally, chromosomes). The `GenometriCorrelation` function tests whether it is positioned independently relative to the `reference` interval set.
`reference`	The `GRanges` object that contains the reference interval set coordinates and spaces. The `GenometriCorrelation` function tests whether `query` is positioned independently relative to `reference`.
`chromosomes.to.proceed`	This vector of strings contains the names of spaces (chromosomes) to analyze. If both `query` and `reference` are `GRanges` if the parameter is not given (its default is c()), the initial list of spaces to proceed is the intersection of the space lists of query and reference names, and a warning is rasied if the lists are different. if `chromosomes.to.proceed` is given, it restricts the list so that only those names from the intersection that are in `chromosomes.to.proceed` are analyzed. If it is a subset of the `query` and `reference` chromosone lists intersection, the warning that the lists are different is not raised.
`chromosomes.to.include.in.awhole`	ia a vector of strings contains the names of spaces to be included in the overall (awhole) statistics. Its default is c(), meaning that all the analysed genes are included.
`chromosomes.to.exclude.from.awhole`	is a list of chromosomes (spaces) to be excluded from the overall statistics.
`add.chr.as.prefix`	deals with the chr chromosome name prefix. The correlation is only performed on chromosomes that have exactly the same name, so by default, a chromosome named chr1 will not be considered the same chromosome as one simply labeled 1. This argument is provided so that if the chromosome names in the `query`, `reference` and the `chromosomes*` parameters, or the names of the chromosomes in `chromosome.length`) have no prefix `'chr'` (lower-, upper-, or mixed case), this prefix will be added, and all strings matching `'chr'` in uppercase or partly uppercase are changed to lowercase. This way, comparisons can be made even if the chromosome names differ by any variation of the word `'chr'`. The default is FALSE.
`awhole.only`	If `FALSE`, all the considered chromosomes statistics and the summary (awhole) information are returned (default). If `TRUE`, only the summary (awhole) information is returned. Useful for pipelines. The `TRUE` value is default that is posted by `run.config` if the data is processed through mapping.
`map.to.half`	Some of the tests we use are besed on distances between a query point and the closest reference point. If map.to.half is TRUE (default) we look for the closest reference point upstream or downstream, if it is `FALSE`, we look only downstream (left). Useful if you are mapping to intervals and want to preserve directionality.
`showProgressBar`	Toggle the text progress bar. Default is `TRUE`.
`showTkProgressBar`	Toggle the Tk progress bar. If it is `TRUE` but the Tcl/Tk is not loadable (e.g. `require('tcltk')` returns `FALSE`), the parameter is turned to `FALSE`. Default is `FALSE`.
`chromosomes.length`	A vector of lengths of the chromosomes to be tested; each chromosome is given a name and a numerical length. The order of the chromosomes does not matter.
`suppress.evaluated.length.warning`	If there is a chromosome that is included in the evaluation but its length is not given in `chromosome.length`, the length of the chromosome is calculated as the maximal position mentioned in the data. If this parameter is `FALSE`, this will generate a warning, which is suppressed when the parameter is `TRUE`. The default is `FALSE`.
`permut.number`	is the common default for `ecdf.area.permut.number`, `mean.distance.permut.number`, and `jaccard.measure.permut.number`. `permut.number=0` defaults all the permutations to off.
`ecdf.area.permut.number`	The number of permutations performed to get the p-value for the area between the ecdf for relative distance distribution and the straight line representing the uniform relative area for the random case.
`mean.distance.permut.number`	The number of permutations to ascribe p-value to minimal query-reference distance averaged over all query points.
`jaccard.measure.permut.number`	The number of permutations for Jaccard measure p-value estimation.
`jaccard.permut.is.rearrangement`	If `TRUE`, the permutations of the reference for the Jaccard test retain the lengths of all intervals and gaps in the query. All the permuted queries will mirror the original, so the p-value is overestimated. If `FALSE` (the default), the permutation is a random resampling of starts of the query intervals.
`alternative`	a character string specifying the alternative hypothesis, must be one of `"two.sided"` (default), `"attraction"` or `"repulsion"`. You can specify just the initial letter.
`awhole.space.name`	The name of the pseudo-space that describes the overall genome statistics. Default is 'awhole'.
`keep.distributions`	It this is true, the procedure returns all points in th distributions calculated for comparison. This is useful for making figures. Default is `FALSE`.
`representing.point.function`	By default, the midpoint of each interval is used as the surrogate for the position of the interval. To force the program to use something other than the midpoint, define the function to use to return comparison points. The function must take the same parameters as the default `mitl` that returns the middle points. The function is to be passed as the `representing.point.function` parameter. The default for the parameter is: `mitl<-function(start,end,chromosome.length,space){return ((as.integer(start)+as.integer(end))/2)}`
`query.representing.point.function`	The same thing as the `representing.point.function`, but the representation point calculation is overloaded only for query intervals.
`reference.representing.point.function`	The same thing as the `representing.point.function`, but the representation point calculation is overloaded only for query intervals.
`supress.evaluated.length.warning`	It was a typo for `suppress.evaluated.length.warning`. Now obsoleted, use `suppress.evaluated.length.warning`

The result is an instance of the GenometriCorrResult-class that describes the run parameters in its GenometriCorrResult-class @config slot and that extends a namedList list (created originally with a list() call with the results of the run. Each element of the list is also a list that describes results for a space (chromosome); one of them is 'awhole' (or other awhole.space.name if given) that describes the genome awhole, all others are named the same as the chromosomes and describe the chromosomewise statistics. The elements of the 'awhole' and chromosomewise lists are statistical measures and some datasets. The statistical measures are described in the GenometriCorr package help. For further explanation, see the the package vignette.

Below is the description of the values of the list returned for each chromosome.

`query.population`	Query points used in the comparisons.
`reference.population`	Reference points used in the comparisons.
`alternative`	Shows the value of the `alternative` parameter passed.
`relative.distances.ks.p.value`	p-value for local independence obtained by the Kolmogorov-Smirnov test for relative distances.
`relative.distances.ecdf.deviation.area.p.value`	p-value for local independence obtained by the permutation test for relative distances.
`relative.distances.ecdf.deviation.area.test.direction`	"attraction" or "repulsion". If the `alternative` parameter is "two.sided" (default), if shows the direction of the effect; if the parameter is "attraction" or "repulsion" it is not shown
`relative.distances.ecdf.area.correlation`	Has the same sign with the relative distance-based local correlation.
`projection.test.p.value`	p-value for chromosome-scale independence obtained by the projection test.
`projection.test.direction`	"attraction" or "repulsion". If the `alternative` parameter is "two.sided" (default), if shows the direction of the effect; if the parameter is "attraction" or "repulsion" it is not shown
`projection.test.obs.to.exp`	To measure the effect size, the observed to expected ratio for the projection test statistics that is the number of query characteristic points (by default, midpoints) that fell into a reference features.
`scaled.absolute.min.distance.sum.p.value`	p-value for chromosome-scale null hypothesis as obtained by the permutations of the query points and the mean of the distances to the two closest reference points.
`scaled.absolute.min.distance.sum.test.direction`	"attraction" or "repulsion". If the `alternative` parameter is "two.sided" (default), if shows the direction of the effect; if the parameter is "attraction" or "repulsion" it is not shown
`query.reference.intersection`	Intersection of reference and query, in bases.
`query.reference.union`	Union of reference and query, in bases.
`jaccard.measure`	Jaccard measure of query and reference overlap.
`jaccard.measure.p.value`	The permutation-based evaluation of the p-value for the obtained Jaccard measure, given the null hypothesis of independence.
`jaccard.measure.test.direction`	"attraction" or "repulsion". If the `alternative` parameter is "two.sided" (default), if shows the direction of the effect; if the parameter is "attraction" or "repulsion" it is not shown

The additional values that are returned if keep.distributions=TRUE

`relative.distances.data`	The original relative distances
`relative.distances.ecdf.deviation.area`	The real value of the ECDF deviation area to be compared with the permutation to obtain the p-value
`relative.distances.ecdf.deviation.area.null.list`	The null distribution
`projection.test`	List of three values: `space.length` is length of a chromosome; `reference.coverage` is length of that chromosome covered by reference intervale, and `query.hits` is the number of query points that fall into the reference intervals.
`absolute.min.distance.data`	The distribution of query-reference distances
`absolute.inter.reference.distance.data`	The distribution of reference-reference distances
`scaled.absolute.min.distance.sum`	The value of the sum (i.e. mean) of scaled absolute distances
`scaled.absolute.min.distance.sum.null.list`	The null distribution for the scaled absolute distances
`jaccard.measure.null.list`	The null distribution of Jaccard measures in permutations

Alexander Favorov favorov@sensi.org, Loris Mularoni, Yulia Medvedeva, Harris A. Jaffee, Ekaterina V. Zhuravleva, Veronica Busa, Leslie M. Cope, Andrey A. Mironov, Vsevolod J. Makeev, Sarah J. Wheelan.

GenometriCorr home

The GenometriCorr documentation and vignette.

library('rtracklayer')
library('GenometriCorr')
library('TxDb.Hsapiens.UCSC.hg19.knownGene')

refseq<-transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
cpgis<-import(system.file("extdata", "UCSCcpgis_hg19.bed", package = "GenometriCorr"))
seqinfo(cpgis)<-seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene)[seqnames(seqinfo(cpgis))]

pn.area<-10
pn.dist<-10
pn.jacc<-10

cpgi_to_genes<-GenometriCorrelation(cpgis,refseq,chromosomes.to.proceed=c('chr1','chr2','chr3'),ecdf.area.permut.number=pn.area,mean.distance.permut.number=pn.dist,jaccard.measure.permut.number=pn.jacc,keep.distributions=FALSE,showProgressBar=FALSE)

print(cpgi_to_genes)