predORF | R Documentation |
Predicts open reading frames (ORFs) and coding sequences (CDSs) in DNA sequences provided as DNAString or DNAStringSet objects.
predORF(x, n = 1, type = "grl", mode = "orf", strand = "sense", longest_disjoint=FALSE, startcodon = "ATG", stopcodon = c("TAA", "TAG", "TGA"))
x |
DNA query sequence(s) provided as |
n |
Defines the maximum number of ORFs to return for each input sequence. The identified ORFs are sorted by decreasing length. For instance, |
type |
One of three options provided as character values: |
mode |
The setting |
strand |
One of three options passed on as character vector of length one: |
longest_disjoint |
If set to |
startcodon |
Defines the start codon(s) for ORF predictions. The default is set to the standard start codon 'ATG'. Any custom set of triplet DNA sequences can be assigned here. |
stopcodon |
Defines the stop codon(s) for ORF predictions. The default is set to the three standard stop codons 'TAA', 'TAG' and 'TGA'. Any custom set of triplet DNA sequences can be assigned here. |
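As a rough illustration of how the startcodon, stopcodon and n arguments interact, the following base R sketch scans the three sense-strand frames of a single sequence and returns the hits sorted by decreasing width. The helper find_orfs_sketch is hypothetical and is not the systemPipeR implementation. It assumes that an ORF runs from the first start codon in a frame through the next in-frame stop codon (stop codon included), and it ignores the antisense strand.

```r
## Hypothetical helper for illustration only (not part of systemPipeR).
## Assumption: an ORF spans the first start codon through the next
## in-frame stop codon, inclusive; only the sense strand is scanned.
find_orfs_sketch <- function(dna, n = 1, startcodon = "ATG",
                             stopcodon = c("TAA", "TAG", "TGA")) {
  dna <- toupper(dna)
  len <- nchar(dna)
  hits <- list()
  for (frame in 1:3) {
    if (len - frame + 1 < 3) next            # no complete codon in this frame
    starts <- seq(frame, len - 2, by = 3)    # codon start positions
    codons <- substring(dna, starts, starts + 2)
    open <- NA                               # start position of the open ORF
    for (i in seq_along(codons)) {
      if (is.na(open) && codons[i] %in% startcodon) {
        open <- starts[i]
      } else if (!is.na(open) && codons[i] %in% stopcodon) {
        hits[[length(hits) + 1]] <- data.frame(start = open, end = starts[i] + 2)
        open <- NA
      }
    }
  }
  if (length(hits) == 0) {
    return(data.frame(start = numeric(0), end = numeric(0), width = numeric(0)))
  }
  orfs <- do.call(rbind, hits)
  orfs$width <- orfs$end - orfs$start + 1
  orfs <- orfs[order(orfs$width, decreasing = TRUE), ]  # longest first
  rownames(orfs) <- NULL
  if (identical(n, "all")) orfs else head(orfs, n)
}

## Toy sequence with one ORF in frame 1 and one in frame 3
orfs <- find_orfs_sketch("CCATGAAATAGGATGCCCGGGTTTTAACC", n = "all")
orfs
```

Passing a different character vector to startcodon or stopcodon in this sketch changes which triplets open and close an ORF, which mirrors the intent of the corresponding predORF arguments.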
Returns ORF/CDS ranges identified in query sequences as a GRanges or data.frame object. The type argument defines which one of them will be returned. The objects contain the following columns:
seqnames: names of query sequences
subject_id: identified ORF/CDS ranges numbered by query
start/end: start and end positions of ORF/CDS ranges
strand: strand of query sequence used for prediction
width: length of subject range in bases
inframe2end: frame of identified ORF/CDS relative to the 3' end of the query sequence. This can be important if the query sequence was extracted directly upstream of an ORF (e.g. 5' UTR upstream of main ORF). The value 1 stands for in-frame with the downstream ORF, while 2 or 3 indicates a shift of one or two bases, respectively.
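The meaning of inframe2end can be illustrated with a small base R calculation. This is a sketch under the assumption that the value is derived from the number of bases between the ORF end and the 3' end of the query on the sense strand. The helper inframe2end_sketch is hypothetical and may differ from how systemPipeR computes the column internally.

```r
## Hypothetical helper for illustration only (not part of systemPipeR).
## Assumption: the frame is derived from the number of bases remaining
## between the end of the ORF and the 3' end of the query sequence.
inframe2end_sketch <- function(query_width, orf_end) {
  ((query_width - orf_end) %% 3) + 1
}

## Query of 30 bases with ORFs ending at positions 30, 29, 28 and 27
frames <- inframe2end_sketch(query_width = 30, orf_end = c(30, 29, 28, 27))
frames
```

Under this assumption, an ORF ending 0 or 3 bases before the 3' end gets the value 1 (in-frame), while ORFs ending 1 or 2 bases before it get 2 or 3.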
Thomas Girke
scaleRanges
## Load DNA sample data set from Biostrings package
file <- system.file("extdata", "someORF.fa", package="Biostrings")
dna <- readDNAStringSet(file)
## Predict longest ORF for sense strand in each query sequence
(orf <- predORF(dna[1:4], n=1, type="gr", mode="orf", strand="sense"))
## Not run:
## Usage for more complex example
library(txdbmaker); library(systemPipeRdata)
gff <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=gff, format="gff3", organism="Arabidopsis")
futr <- fiveUTRsByTranscript(txdb, use.names=TRUE)
genome <- system.file("extdata/annotation", "tair10.fasta", package="systemPipeRdata")
dna <- extractTranscriptSeqs(FaFile(genome), futr)
uorf <- predORF(dna, n="all", mode="orf", longest_disjoint=TRUE, strand="sense")
grl_scaled <- scaleRanges(subject=futr, query=uorf, type="uORF", verbose=TRUE)
export.gff3(unlist(grl_scaled), "uorf.gff")
## End(Not run)