Predicts the start and stop positions of protein coding genes in a genome.
myDNAStringSet
A
geneticCode
A named character vector defining the translation from codons to amino acids. Optionally, an
minGeneLength
Integer specifying the minimum length of genes to find in the genome.
allowEdges
Logical determining whether to allow genes that run off the edge of the sequences. If
allScores
Logical indicating whether to return information about all possible open reading frames or only the predicted genes (the default).
showPlot
Logical determining whether a plot is displayed with the distribution of gene lengths and scores. (See details section below.)
verbose
Logical indicating whether to print information about the predictions on each iteration. (See details section below.)
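As a rough illustration of how the arguments above combine, the sketch below calls FindGenes with a few non-default settings. The argument names come from the list above; the specific values (a minimum length of 90, returning all scores, and silencing progress output) are illustrative assumptions rather than recommendations, and the test genome is the one shipped with the package.

```r
library(DECIPHER)

# import the test genome bundled with DECIPHER
fas <- system.file("extdata",
	"Chlamydia_trachomatis_NC_000117.fas.gz",
	package="DECIPHER")
genome <- readDNAStringSet(fas)

# illustrative, non-default argument values (assumptions for this sketch)
orfs <- FindGenes(genome,
	minGeneLength=90, # ignore candidates shorter than 90 nucleotides
	allScores=TRUE, # keep every open reading frame, not only predicted genes
	verbose=FALSE) # suppress the per-iteration progress table
orfs
```

With allScores set to TRUE, the result would be expected to hold more rows than a default call, since it covers all candidate open reading frames rather than only the predicted genes.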
Protein coding genes are identified by learning their characteristic signature directly from the genome, i.e., ab initio prediction. Gene signatures are derived from the content of the open reading frame and surrounding signals that indicate the presence of a gene. Genes are assumed to not contain introns or frame shifts, making the function best suited for prokaryotic genomes.
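To make the notion of an open reading frame concrete, here is a minimal base R sketch that scans the forward strand of a sequence for start-to-stop stretches without introns or frame shifts. It is not the method used by FindGenes, which also learns and scores gene signatures; the helper name find_orfs, the restriction to ATG starts, the standard stop codons, and the toy sequence are all assumptions made for illustration.

```r
# Toy open reading frame (ORF) scan on the forward strand only.
# Assumptions (not taken from FindGenes): ATG is the only start codon,
# TAA/TAG/TGA are the stop codons, and find_orfs() is a hypothetical helper.
find_orfs <- function(dna, min_length = 60L) {
	stops <- c("TAA", "TAG", "TGA")
	n <- nchar(dna)
	out <- list()
	for (frame in 1:3) {
		if (frame > n - 2) next # no complete codon in this frame
		starts <- seq(frame, n - 2, by = 3)
		codons <- substring(dna, starts, starts + 2)
		open <- NA # first start codon seen since the last stop
		for (i in seq_along(codons)) {
			if (is.na(open) && codons[i] == "ATG") {
				open <- starts[i]
			} else if (!is.na(open) && codons[i] %in% stops) {
				end <- starts[i] + 2 # last base of the stop codon
				if (end - open + 1 >= min_length)
					out[[length(out) + 1]] <- c(Begin = open, End = end, Frame = frame)
				open <- NA
			}
		}
	}
	do.call(rbind, out) # NULL when nothing passes the length filter
}

dna <- "CCATGAAATTTGGGTAACC" # made-up sequence with one short ORF
orfs <- find_orfs(dna, min_length = 9L)
orfs
```

The min_length argument plays a role loosely analogous to minGeneLength: candidates shorter than the threshold are discarded before any further consideration.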
If showPlot is TRUE, then a plot is displayed with four panels. The upper left panel shows the fitted distribution of background open reading frame lengths. The upper right panel shows this distribution on top of the fitted distribution of predicted gene lengths. The lower left panel shows the fitted distribution of scores for the intergenic spacing between genes on the same and opposite genome strands. The lower right panel shows the total score of open reading frames and predicted genes by length.
If verbose is TRUE, information is shown about the predictions at each iteration of gene finding. The mean score difference between genes and non-genes is updated at each iteration, unless it is negative, in which case the score is dropped and a "-" is displayed. The columns denote the number of iterations ("Iter"), number of codon scoring models ("Models"), start codon scores ("Start"), upstream k-mer motif scores ("Motif"), mRNA folding scores ("Fold"), initial codon bias scores ("Init"), upstream nucleotide bias scores ("UpsNt"), termination codon bias scores ("Term"), ribosome binding site scores ("RBS"), codon autocorrelation scores ("Auto"), stop codon scores ("Stop"), and number of predicted genes ("Genes").
An object of class Genes, which is stored as a matrix with information corresponding to each open reading frame.
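Because the result is stored as a matrix, it can be inspected with ordinary matrix tools. The sketch below is one way to look at the underlying structure, using the test genome shipped with the package; it discovers the column names at run time rather than assuming them, and the variable name m is arbitrary.

```r
library(DECIPHER)

# import the test genome bundled with DECIPHER
fas <- system.file("extdata",
	"Chlamydia_trachomatis_NC_000117.fas.gz",
	package="DECIPHER")
genome <- readDNAStringSet(fas)
x <- FindGenes(genome, verbose=FALSE)

m <- unclass(x) # drop the Genes class to view the underlying matrix
dim(m) # one row per reported open reading frame
colnames(m) # names of the stored fields
head(m) # first few rows
```

The Genes-class help page describes what each column holds.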
Erik Wright eswright@pitt.edu
ExtractGenes, Genes-class, WriteGenes
library(DECIPHER)

# import a test genome
fas <- system.file("extdata",
	"Chlamydia_trachomatis_NC_000117.fas.gz",
	package="DECIPHER")
genome <- readDNAStringSet(fas)

# predict genes and print a summary of the result
x <- FindGenes(genome)
x

# extract the predicted genes as nucleotide and protein sequences
genes <- ExtractGenes(x, genome)
proteins <- ExtractGenes(x, genome, type="AAStringSet")