This function reads a GTF file and extracts the transcript ID and
corresponding gene ID. This function assumes that the GTF file is properly
formatted. See http://mblab.wustl.edu/GTF2.html for a detailed
description of proper GTF format. Note that GFF3 files have a somewhat
different and more complicated format in the attribute field, which this
function does not support. See http://gmod.org/wiki/GFF3 for a detailed
description of proper GFF3 format. To extract transcript and gene information
from GFF3 files, see the function tr2g_gff3 in this package.
file | Path to a GTF file to be read. The file can remain gzipped.
type_use | Character vector, the values taken by the
transcript_id | Character vector of length 1. Tag in
gene_id | Character vector of length 1. Tag in
gene_name | Character vector of length 1. Tag in
transcript_version | Character vector of length 1. Tag in
gene_version | Character vector of length 1. Tag in
version_sep | Character to separate the main ID from the version number. Defaults to ".", as in Ensembl.
verbose | Whether to display progress.
Transcript and gene versions may not be present in all GTF files, so these
arguments are optional. This function has arguments for transcript and gene
version numbers because Ensembl IDs have version numbers. For Ensembl IDs, we
recommend including the version number, since a change in version number
signals a change in the entity referred to by the ID after reannotation. If a
version is used, then it will be appended to the ID, separated by version_sep.
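As a rough illustration of the appending step described above, here is a minimal base R sketch. The helper name append_version is hypothetical and is not part of this package; it only mimics the documented behavior of joining the main ID and the version with version_sep, and of leaving the ID unchanged when no version is supplied.

```r
# Hypothetical helper (not part of the package): append a version number to an
# ID, separated by version_sep. If no version is given, return the ID as is.
append_version <- function(id, version = NULL, version_sep = ".") {
  if (is.null(version)) {
    return(id)
  }
  paste(id, version, sep = version_sep)
}

# Illustrative Ensembl-style values.
print(append_version("ENST00000456328", "2"))
print(append_version("ENSG00000223972", "5", version_sep = "_"))
print(append_version("ENSG00000223972"))
```

Keeping the separator as an argument mirrors the version_sep argument of this function, so IDs from sources that use a different convention can still be matched.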
The attribute field (the last field) of GTF files can be complicated and
inconsistent across different sources. Please check the attribute tags in
your GTF file and consider the arguments of this function carefully. The
defaults are set according to Ensembl GTF files; defaults may not work for
files from other sources. Due to the general lack of standards for the
attribute field, you may need to further clean up the output of this function.
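To make the role of the attribute tags more concrete, the following base R sketch pulls tag values out of GTF-style attribute strings and assembles them into a data frame with gene, transcript, and gene_name columns. It is only an illustration of the idea, not this function's actual implementation: the helper extract_tag and the example attribute strings are assumptions made up for this sketch, and a tag that is absent from a record simply yields NA.

```r
# Hypothetical helper (not part of the package): return the quoted value that
# follows `tag` in each GTF attribute string, or NA if the tag is absent.
extract_tag <- function(attribute, tag) {
  # A tag sits at the start of the field or after a ";", followed by
  # whitespace and a double-quoted value.
  pattern <- paste0('(^|;)[[:space:]]*', tag, '[[:space:]]+"([^"]*)"')
  matches <- regmatches(attribute, regexec(pattern, attribute))
  vapply(matches,
         function(m) if (length(m) == 3) m[3] else NA_character_,
         character(1))
}

# Made-up attribute strings: the first follows Ensembl-style tags, the second
# lacks gene_name, as files from other sources sometimes do.
attribute <- c(
  paste('gene_id "ENSG00000223972"; gene_version "5";',
        'transcript_id "ENST00000456328"; transcript_version "2";',
        'gene_name "DDX11L1";'),
  'gene_id "GENE_B"; transcript_id "TX_B1";'
)

tr2g <- data.frame(
  gene = extract_tag(attribute, "gene_id"),
  transcript = extract_tag(attribute, "transcript_id"),
  gene_name = extract_tag(attribute, "gene_name"),
  stringsAsFactors = FALSE
)
print(tr2g)
```

Inspecting a few attribute strings this way before calling the function is a quick check that the tag names passed as arguments actually occur in the file.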
A data frame with at least 2 columns: gene for gene ID, transcript for
transcript ID, and optionally, gene_name for gene names.
Other functions to retrieve transcript and gene info: sort_tr2g, tr2g_EnsDb,
tr2g_TxDb, tr2g_ensembl, tr2g_fasta, tr2g_gff3, transcript2gene