get_pairs | R Documentation |
This function processes the input data and retrieves information from Ensembl and UniProt to generate a data frame containing the gene names, transcript IDs, APPRIS annotations, and protein sequences for each pair of primary and alternative transcripts. The function also creates a fasta file listing each transcript ID followed by its amino acid sequence, for all input transcripts and their associated primary transcripts. The file is organized so that all transcripts from a gene are next to each other. Finally, the function produces a table in csv form containing the gene names, transcript IDs, APPRIS annotations, and amino acid sequences for each transcript.
get_pairs(data_file, if_aa = FALSE, organism = "human", temp = FALSE)
data_file |
Path to the input file |
if_aa |
Boolean indicating whether the input file contains amino acid sequences: TRUE if sequences are present, FALSE if only IDs are present |
organism |
String indicating whether the transcripts are from human ("human") or mouse ("mouse") |
temp |
Boolean indicating whether the fasta file should be deleted after the function finishes running. Recommended to always be set to FALSE. |
A data frame containing the gene names, transcript IDs, APPRIS annotations, and protein sequences for each pair of primary and alternative transcripts.
This function also creates a fasta file containing the transcript IDs and associated amino acid sequences in the root directory. In addition to the fasta file, a csv file containing the returned dataframe is saved to the working directory.
tmhmm_folder_name <- "~/TMHMM2.0c"
if (check_tmhmm_install(tmhmm_folder_name)) {
  currwd <- getwd()
  AA_seq <- get_pairs(system.file("extdata", "crb1_example.csv", package = "surfaltr"), TRUE, "mouse", TRUE)
  setwd(currwd)
}
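The fasta layout described for this function (a transcript ID header followed by its amino acid sequence, with all transcripts of a gene kept next to each other) can be illustrated with a small base R sketch. This is not the package's implementation: the toy data frame, its column names (gene_name, transcript_id, appris, aa_seq), the transcript IDs, and the sequences are all placeholders assumed for illustration, and the real output of get_pairs may use different names.

```r
# Toy stand-in for a table of transcripts. Column names, IDs, and sequences
# are hypothetical placeholders, not the actual output of get_pairs().
pairs <- data.frame(
  gene_name     = c("Crb1", "Abc1", "Crb1"),
  transcript_id = c("TX_A1", "TX_B1", "TX_A2"),
  appris        = c("PRINCIPAL", "PRINCIPAL", "ALTERNATIVE"),
  aa_seq        = c("MKLV", "MSTP", "MKLQ"),
  stringsAsFactors = FALSE
)

# Sort by gene so that transcripts from the same gene are adjacent.
# order() keeps tied rows in their original relative order.
pairs <- pairs[order(pairs$gene_name), ]

# Interleave header lines (">" + transcript ID) with sequence lines.
# rbind() stacks the two vectors as rows; as.vector() reads column by column.
fasta_lines <- as.vector(rbind(paste0(">", pairs$transcript_id), pairs$aa_seq))

# Write to a temporary file so the sketch does not touch the working directory.
fasta_path <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, fasta_path)
```

Reading the file back with readLines(fasta_path) shows the two Crb1 transcripts on consecutive records, which is the grouping property the description refers to.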