View source: R/makeGRangesFromDataFrame.R
makeGRangesFromDataFrame takes a data-frame-like object as input and tries to automatically find the columns that describe genomic ranges. It returns them as a GRanges object.
makeGRangesFromDataFrame is also the workhorse behind the coercion method from data.frame (or DataFrame) to GRanges.
df |
A data.frame or DataFrame object. If not, then
the function first tries to turn |
keep.extra.columns |
|
ignore.strand |
|
seqinfo |
Either |
seqnames.field |
A character vector of recognized names for the column in |
start.field |
A character vector of recognized names for the column in |
end.field |
A character vector of recognized names for the column in |
strand.field |
A character vector of recognized names for the column in |
starts.in.df.are.0based |
|
A GRanges object with one element per row in the input.
If the seqinfo argument was supplied, the returned object will have exactly the seqlevels specified in seqinfo, in the same order. Otherwise, the seqlevels are ordered according to the output of the rankSeqlevels function. The exception is when df contains the seqnames in the form of a factor-Rle, in which case the levels of the factor-Rle become the seqlevels of the returned object, with no re-ordering.
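The two ordering rules can be compared side by side. The following is a minimal sketch, assuming the GenomicRanges package is installed; the data frame and the variable names (df, wanted, gr_default, gr_fixed) are made up for illustration.

```r
library(GenomicRanges)

## Hypothetical input: seqnames stored as plain character strings,
## deliberately listed out of their natural order.
df <- data.frame(chr=c("chrX", "chr10", "chr2", "chr1"),
                 start=c(5L, 15L, 25L, 35L),
                 end=c(9L, 19L, 29L, 39L),
                 stringsAsFactors=FALSE)

## No 'seqinfo' supplied: the seqlevels are ordered by rankSeqlevels().
gr_default <- makeGRangesFromDataFrame(df)
seqlevels(gr_default)

## 'seqinfo' supplied as a character vector: the seqlevels are taken
## from it, in the order given.
wanted <- c("chrX", "chr10", "chr2", "chr1")
gr_fixed <- makeGRangesFromDataFrame(df, seqinfo=wanted)
seqlevels(gr_fixed)
```

Supplying seqinfo is the more predictable choice when several objects must share the same seqlevels order.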
If df has non-automatic row names (i.e. rownames(df) is not NULL and is not seq_len(nrow(df))), then they will be used to set names on the returned GRanges object.
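The row-name rule is easy to check directly. This is a small sketch, assuming the GenomicRanges package is installed; the feature IDs and variable names are hypothetical.

```r
library(GenomicRanges)

## Hypothetical input with explicit (non-automatic) row names.
df_named <- data.frame(chr="chr1",
                       start=c(11L, 21L, 31L),
                       end=c(15L, 25L, 35L),
                       row.names=c("featA", "featB", "featC"))
gr_named <- makeGRangesFromDataFrame(df_named)
names(gr_named)    # the row names are carried over as names

## Same data with automatic row names: no names are set.
df_plain <- data.frame(chr="chr1",
                       start=c(11L, 21L, 31L),
                       end=c(15L, 25L, 35L))
gr_plain <- makeGRangesFromDataFrame(df_plain)
names(gr_plain)    # NULL
```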
Coercing data.frame or DataFrame df into a GRanges object (with as(df, "GRanges")), or calling GRanges(df), are both equivalent to calling makeGRangesFromDataFrame(df, keep.extra.columns=TRUE).
H. Pagès, based on a proposal by Kasper Daniel Hansen
GRanges objects.
Seqinfo objects and the rankSeqlevels function in the GenomeInfoDb package.
The makeGRangesListFromFeatureFragments function for making a GRangesList object from a list of fragmented features.
The getTable function in the rtracklayer package for an R interface to the UCSC Table Browser.
DataFrame objects in the S4Vectors package.
## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------
df <- data.frame(chr="chr1", start=11:15, end=12:16,
                 strand=c("+","-","+","*","."), score=1:5)
df
makeGRangesFromDataFrame(df) # strand value "." is replaced with "*"
## The strand column is optional:
df <- data.frame(chr="chr1", start=11:15, end=12:16, score=1:5)
makeGRangesFromDataFrame(df)
gr <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE)
gr2 <- as(df, "GRanges") # equivalent to the above
stopifnot(identical(gr, gr2))
gr2 <- GRanges(df) # equivalent to the above
stopifnot(identical(gr, gr2))
makeGRangesFromDataFrame(df, ignore.strand=TRUE)
makeGRangesFromDataFrame(df, keep.extra.columns=TRUE,
                         ignore.strand=TRUE)
makeGRangesFromDataFrame(df, seqinfo=paste0("chr", 4:1))
makeGRangesFromDataFrame(df, seqinfo=c(chrM=NA, chr1=500, chrX=100))
makeGRangesFromDataFrame(df, seqinfo=Seqinfo(paste0("chr", 4:1)))
## ---------------------------------------------------------------------
## ABOUT AUTOMATIC DETECTION OF THE seqnames/start/end/strand COLUMNS
## ---------------------------------------------------------------------
## Automatic detection of the seqnames/start/end/strand columns is
## case insensitive:
df <- data.frame(ChRoM="chr1", StarT=11:15, stoP=12:16,
                 STRAND=c("+","-","+","*","."), score=1:5)
makeGRangesFromDataFrame(df)
## It also ignores a common prefix between the start and end columns:
df <- data.frame(seqnames="chr1", tx_start=11:15, tx_end=12:16,
                 strand=c("+","-","+","*","."), score=1:5)
makeGRangesFromDataFrame(df)
## The common prefix between the start and end columns is used to
## disambiguate between more than one seqnames column:
df <- data.frame(chrom="chr1", tx_start=11:15, tx_end=12:16,
                 tx_chr="chr2", score=1:5)
makeGRangesFromDataFrame(df)
## ---------------------------------------------------------------------
## 0-BASED VS 1-BASED START POSITIONS
## ---------------------------------------------------------------------
if (require(rtracklayer)) {
  session <- browserSession()
  genome(session) <- "sacCer2"
  query <- ucscTableQuery(session, "Assembly")
  df <- getTable(query)
  head(df)

  ## A common pitfall is to forget that the UCSC Table Browser uses the
  ## "0-based start" convention:
  gr0 <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE,
                                  start.field="chromStart",
                                  end.field="chromEnd")
  head(gr0)

  ## The start positions need to be converted into 1-based positions,
  ## to adhere to the convention used in Bioconductor:
  gr1 <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE,
                                  start.field="chromStart",
                                  end.field="chromEnd",
                                  starts.in.df.are.0based=TRUE)
  head(gr1)
}
chr start end strand score
1 chr1 11 12 + 1
2 chr1 12 13 - 2
3 chr1 13 14 + 3
4 chr1 14 15 * 4
5 chr1 15 16 . 5
GRanges object with 5 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 11-12 +
[2] chr1 12-13 -
[3] chr1 13-14 +
[4] chr1 14-15 *
[5] chr1 15-16 *
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
GRanges object with 5 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 11-12 *
[2] chr1 12-13 *
[3] chr1 13-14 *
[4] chr1 14-15 *
[5] chr1 15-16 *
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
GRanges object with 5 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 11-12 *
[2] chr1 12-13 *
[3] chr1 13-14 *
[4] chr1 14-15 *
[5] chr1 15-16 *
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
GRanges object with 5 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <integer>
[1] chr1 11-12 * | 1
[2] chr1 12-13 * | 2
[3] chr1 13-14 * | 3
[4] chr1 14-15 * | 4
[5] chr1 15-16 * | 5
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
GRanges object with 5 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 11-12 *
[2] chr1 12-13 *
[3] chr1 13-14 *
[4] chr1 14-15 *
[5] chr1 15-16 *
-------
seqinfo: 4 sequences from an unspecified genome; no seqlengths
GRanges object with 5 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 11-12 *
[2] chr1 12-13 *
[3] chr1 13-14 *
[4] chr1 14-15 *
[5] chr1 15-16 *
-------
seqinfo: 3 sequences from an unspecified genome
GRanges object with 5 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 11-12 *
[2] chr1 12-13 *
[3] chr1 13-14 *
[4] chr1 14-15 *
[5] chr1 15-16 *
-------
seqinfo: 4 sequences from an unspecified genome; no seqlengths
GRanges object with 5 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 11-12 +
[2] chr1 12-13 -
[3] chr1 13-14 +
[4] chr1 14-15 *
[5] chr1 15-16 *
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
GRanges object with 5 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 11-12 +
[2] chr1 12-13 -
[3] chr1 13-14 +
[4] chr1 14-15 *
[5] chr1 15-16 *
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
GRanges object with 5 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 11-12 *
[2] chr2 12-13 *
[3] chr2 13-14 *
[4] chr2 14-15 *
[5] chr2 15-16 *
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
Loading required package: rtracklayer
Error: package or namespace load failed for 'rtracklayer':
objects 'DataFrame', 'RangedDataList', 'Rle', 'isSingleString', 'recycleIntegerArg', 'recycleNumericArg', 'isSingleStringOrNA', 'isTRUEorFALSE', 'isSingleNumberOrNA' are not exported by 'namespace:IRanges'
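Because rtracklayer failed to load in the run above, the 0-based example produced no output. The effect of starts.in.df.are.0based can still be sketched without a UCSC session. The snippet below assumes only that the GenomicRanges package is installed; the bed_like data frame is made up, with column names that mimic UCSC tables.

```r
library(GenomicRanges)

## Hypothetical table using the UCSC-style "0-based start" convention.
bed_like <- data.frame(chrom="chr1",
                       chromStart=c(10L, 20L),
                       chromEnd=c(20L, 30L))

## Taken at face value, the starts are one position too far to the left:
gr0 <- makeGRangesFromDataFrame(bed_like,
                                start.field="chromStart",
                                end.field="chromEnd")

## Declaring the starts as 0-based shifts them by one; ends are unchanged:
gr1 <- makeGRangesFromDataFrame(bed_like,
                                start.field="chromStart",
                                end.field="chromEnd",
                                starts.in.df.are.0based=TRUE)

start(gr0)
start(gr1)
width(gr1)
```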