Description Details Author(s) References See Also Examples
GenometriCorr
evaluates the spatial correlation between two types of features (experimental or annotated) in genomic coordinates. Several different approaches are implemented, each intended to evaluate a different biologically relevant spatial relationship. The genomic features are split into intervals and correlations are evaluated as described in the package vignette.
Most standard genomic feature annotation file formats are accepted as input. Each feature type is given in a separate file, one used as a query and the other used as a reference. The statistical comparisons that are made are not symmetric; that is, they are sensitive to which file is the query and which is the reference, so we recommend running the comparisons twice, switching the status of the input files for the second run. This is intentional, as biological relationships can be asymmetric. Once the files are read, the annotations are stored as GRanges
objects from GenomicRanges
package).
There are two classes that are provided by the package. One of them, GenometriCorrConfig-class
provide serialization into configuration file that describe a full run of the package that is intended to explore spatial correlations between two markups. The class also provide run.config method that open the input data files, read them, make all the calculations by applying the central GenometriCorrelation
function to mapped or unmapped data (see below). The GenometriCorrelation
function returns an instance of GenometriCorrResult-class
that is based on a list with the results of the run and also contains the GenometriCorrConfig-class
instance that describes the run as a slot. The GenometriCorrResult-class
provides show
method and two graphical representations: graphical.report
and visualize
.
Below we describe the statistical comparisons implemented in this package. One set of features is termed the “query” and the other, the “reference”, throughout.
scaled.absolute.min.distance.sum.p.value
Query and reference intervals that are often separated by the same distance will have significant p-values; the magnitude of the p-value depends on the number of permutations performed. To determine whether a significant result indicates a small or large distance, we provide the scaled.absolute.min.distance.sum.lower.tail
, a boolean that is TRUE
when the actual absolute distances are smaller than expected and FALSE
if they are higher.
relative.distances.ks.p.value This p-value will be low when the query and reference intervals have similar spatial distributions. This test is hypersensitive and is most useful when the p-value is high.
relative.distances.ecdf.deviation.area.p.value If the query and reference reference intervals are closer or farther apart than expected, this p-value will be lo w.
relative.distances.ecdf.area.correlation This value will be negative when the query and reference intervals are anticorrelated and positive when they are positively correlated. This value has no relation to the p-value of correlation.
projection.test.p.value
If overlaps between query characteristic points (by default, midpoints) and the reference features occur less often than expected, the projection.test.lower.tail
is TRUE
; if they are more common than expected, it is FALSE
. To measure the effect size, the observed to expected ratio is calculated as projection.test.obs.to.exp
jaccard.measure The Jaccard test compares the length of the union of all query and reference features with the length of the intersection of the query and reference features.
jaccard.measure.p.value
This is the permutation-based evaluation of the p-value for the Jaccard measure; the jaccard.measure.lower.tail
is TRUE
if there are fewer overlaps than expected (less overlap) ond FALSE
otherwise.
When both query and reference features are restricted to genomic subsets, or when we want to compare their relationship only within smaller genomic intervals (e.g. genes), we can remap the intervals of interest onto pseudochromosomes and use the MapRangesToGenomicIntervals
provided by the package. Each mapping result is a GRanges
object. The pseudochromosomes can now be treated as usual by GenometriCorrelation
to test the correlations of interest.
The package also provides the VisualiseTwoIRanges
function that creates a very high-level graphic overview of a pair of annotations on a chromosome (space).
Package: | GenometriCorr |
Type: | Package |
Title: | Genometric Correlation package |
Version: | 1.1.23 |
Date: | 2020-02-20 |
License: | Artistic-2.0 |
LazyLoad: | yes |
biocViews: | Annotation, Genetics, Infrastructure, DataRepresentation, Bioinformatics, StatisticalMethod |
URL: | http://genometricorr.sourceforge.net/ |
Alexander Favorov favorov@sensi.org, Loris Mularoni, Yulia Medvedeva, Harris A. Jaffee, Ekaterina V. Zhuravleva, Veronica Busa, Leslie M. Cope, Andrey A. Mironov, Vsevolod J. Makeev, Sarah J. Wheelan
http://genometricorr.sourceforge.net/
See the GenometriCorr package vignette.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | library('rtracklayer')
library('GenometriCorr')
library('rtracklayer')
library('TxDb.Hsapiens.UCSC.hg19.knownGene')
refseq<-transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
cpgis<-import(system.file("extdata", "UCSCcpgis_hg19.bed", package = "GenometriCorr"))
seqinfo(cpgis)<-seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene)[seqnames(seqinfo(cpgis))]
permut.number<-0
#permut.number=0 means all the permutations are off
#permut.number is the common default for ecdf.area.permut.number, mean.distance.permut.number, and jaccard.measure.permut.number
#these three can be set separetely; explicit set overrloads the default
cpgi_to_genes<-GenometriCorrelation(cpgis,refseq,chromosomes.to.proceed=c('chr1','chr2','chr3'),permut.number=permut.number,keep.distributions=FALSE,showProgressBar=FALSE)
print(cpgi_to_genes)
VisualiseTwoIRanges(
ranges(cpgis[seqnames(cpgis)=='chr1']),
ranges(refseq[seqnames(refseq)=='chr1']),
nameA='CpG Islands',nameB='RefSeq Genes',
chrom_length=seqlengths(TxDb.Hsapiens.UCSC.hg19.knownGene)['chr1'],
title="CpGIslands and RefGenes on chr1 of Hg19 animal")
#mapping example, same as in the vignette
population<-1000
chromo.length<-c(3000000)
names(chromo.length)<-c('the_chromosome')
rquery<-GRanges(ranges=IRanges(start=runif(population,1000001,2000000-9),width=c(10)),seqnames='the_chromosome')
rref<-GRanges(ranges=IRanges(start=runif(population,1000001,2000000-9),width=c(10)),seqnames='the_chromosome')
#create two features, they are randomly scattered in 1 000 000...2 000 000
unmapped_result<-GenometriCorrelation(rquery,rref,chromosomes.length=chromo.length,permut.number=permut.number,keep.distributions=FALSE,showProgressBar=FALSE)
#correlate them on the whole chromosome: 1...3 000 000
cat('Unmapped result:\n')
print(unmapped_result)
map_space<-GRanges(ranges=IRanges(start=c(1000001),end=c(2000000)),seqname='the_chromosome')
mapped_rquery<-MapRangesToGenomicIntervals(what.to.map=rquery,where.to.map=map_space)
mapped_rref<-MapRangesToGenomicIntervals(what.to.map=rref,where.to.map=map_space)
#map them into 1 000 001...2 000 000
mapped_result<-GenometriCorrelation(mapped_rquery,mapped_rref,permut.number=permut.number,keep.distributions=FALSE,showProgressBar=FALSE)
#then, correlate again
cat('Mapped result:\n')
print(mapped_result)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.