findORFs | R Documentation |
Find all Open Reading Frames (ORFs) on the simple input sequences
in ONLY 5'- 3' direction (+), but within all three possible reading frames.
Do not use findORFs for mapping to full chromosomes,
then use findMapORFs
!
For each sequence of the input vector IRanges
with START and
STOP positions (inclusive) will be returned as
IRangesList
. Returned coordinates are relative to the
input sequences.
findORFs(
seqs,
startCodon = startDefinition(1),
stopCodon = stopDefinition(1),
longestORF = TRUE,
minimumLength = 0
)
seqs |
(DNAStringSet or character vector) - DNA/RNA sequences to search
for Open Reading Frames. Can be both uppercase or lowercase. Easiest call to
get seqs if you want only regions from a fasta/fasta index pair is:
seqs = ORFik:::txSeqsFromFa(grl, faFile), where grl is a GRanges/List of
search regions and faFile is a |
startCodon |
(character vector) Possible START codons to search for.
Check |
stopCodon |
(character vector) Possible STOP codons to search for.
Check |
longestORF |
(logical) Default TRUE. Keep only the longest ORF per
unique stopcodon: (seqname, strand, stopcodon) combination, Note: Not longest
per transcript! You can also use function
|
minimumLength |
(integer) Default is 0. Which is START + STOP = 6 bp. Minimum length of ORF, without counting 3bps for START and STOP codons. For example minimumLength = 8 will result in size of ORFs to be at least START + 8*3 (bp) + STOP = 30 bases. Use this param to restrict search. |
If you want antisence strand too, do:
#positive strands
pos <- findORFs(seqs)
#negative strands (DNAStringSet only if character)
neg <- findORFs(reverseComplement(DNAStringSet(seqs)))
relist(c(GRanges(pos, strand = "+"), GRanges(neg, strand = "-")),
skeleton = merge(pos, neg))
(IRangesList) of ORFs locations by START and STOP sites grouped by input sequences. In a list of sequences, only the indices of the sequences that had ORFs will be returned, e.g. 3 sequences where only 1 and 3 has ORFs, will return size 2 IRangesList with names c("1", "3"). If there are a total of 0 ORFs, an empty IRangesList will be returned.
Other findORFs:
findMapORFs()
,
findORFsFasta()
,
findUORFs()
,
startDefinition()
,
stopDefinition()
## Simple examples
findORFs("ATGTAA")
findORFs("ATGTTAA") # not in frame anymore
findORFs("ATGATGTAA") # only longest of two above
findORFs("ATGATGTAA", longestORF = FALSE) # two ORFs
findORFs(c("ATGTAA", "ATGATGTAA")) # 1 ORF per transcript
## Get DNA sequences from ORFs
seq <- DNAStringSet(c("ATGTAA", "AAA", "ATGATGTAA"))
names(seq) <- c("tx1", "tx2", "tx3")
orfs <- findORFs(seq, longestORF = FALSE)
# you can get sequences like this:
gr <- unlist(orfs, use.names = TRUE)
gr <- GRanges(seqnames = names(seq)[as.integer(names(gr))],
ranges = gr, strand = "+")
# Give them some proper names:
names(gr) <- paste0("ORF_", seq.int(length(gr)), "_", seqnames(gr))
orf_seqs <- getSeq(seq, gr)
orf_seqs
# Save as .fasta (orf_seqs must be of type DNAStringSet)
# writeXStringSet(orf_seqs, "orfs.fasta")
## Reading from file and find ORFs
#findORFs(readDNAStringSet("path/to/transcripts.fasta"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.