View source: R/analyze_external_sequence_analysis.R
analyzeSignalP | R Documentation |
Allows for easy integration of the results of SignalP (external sequence analysis of signal peptides) into the IsoformSwitchAnalyzeR workflow. Please note that due to the 'removeNoncodinORFs' option in analyzeCPAT and analyzeCPC2, we recommend running analyzeCPC2/analyzeCPAT before analyzeSignalP, analyzeNetSurfP2 and analyzePFAM if you have predicted the ORFs with analyzeORF.
analyzeSignalP(
    switchAnalyzeRlist,
    pathToSignalPresultFile,
    minSignalPeptideProbability = 0.5,
    ignoreAfterBar = TRUE,
    ignoreAfterSpace = TRUE,
    ignoreAfterPeriod = FALSE,
    quiet = FALSE
)
switchAnalyzeRlist |
A |
pathToSignalPresultFile |
A string indicating the full path to the summary SignalP result file(s). If multiple result files were created (multiple web-server runs) just supply all the paths as a vector of strings. See |
minSignalPeptideProbability |
A numeric between 0 and 1 indicating the minimum probability for calling a signal peptide. Default is 0.5. |
ignoreAfterBar |
A logical indicating whether to subset the isoform ids by ignoring everything after the first bar ("|"). Useful for analysis of GENCODE data. Default is TRUE. |
ignoreAfterSpace |
A logical indicating whether to subset the isoform ids by ignoring everything after the first space (" "). Useful for analysis of gffutils generated GTF files. Default is TRUE. |
ignoreAfterPeriod |
A logical indicating whether to subset the gene/isoform ids by ignoring everything after the first period ("."). Should be used with care. Default is FALSE. |
quiet |
A logical indicating whether to avoid printing progress messages (incl. progress bar). Default is FALSE. |
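The effect of the three ignoreAfter* arguments on isoform ids can be illustrated with a minimal sketch. The helper trimIsoformIds() below is hypothetical and not part of IsoformSwitchAnalyzeR; the order in which the trimming steps are applied is an assumption, and the example ids are made up for illustration.

```r
# Hypothetical helper (not part of IsoformSwitchAnalyzeR) mimicking the
# documented behaviour of the ignoreAfter* arguments with base R only.
trimIsoformIds <- function(
    ids,
    ignoreAfterBar = TRUE,
    ignoreAfterSpace = TRUE,
    ignoreAfterPeriod = FALSE
) {
    # Each sub() call removes everything from the first occurrence
    # of the separator to the end of the string.
    if (ignoreAfterBar)    ids <- sub("\\|.*$", "", ids)
    if (ignoreAfterSpace)  ids <- sub(" .*$", "", ids)
    if (ignoreAfterPeriod) ids <- sub("\\..*$", "", ids)
    ids
}

# Made-up ids in a GENCODE-like and a gffutils-like style
ids <- c(
    "ENST00000456328.2|ENSG00000223972.5|DDX11L1",
    "TCONS_00000007 gene=XLOC_000001"
)

trimIsoformIds(ids)
# "ENST00000456328.2" "TCONS_00000007"

trimIsoformIds(ids, ignoreAfterPeriod = TRUE)
# "ENST00000456328" "TCONS_00000007"
```

Dropping the version suffix with ignoreAfterPeriod only helps if the ids in the switchAnalyzeRlist are also stored without versions, which is one reason that argument should be used with care.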
A signal peptide is a short peptide sequence which indicates that a protein is destined for the secretory pathway.
The SignalP web-server is less stringent than PFAM with regard to the number of sequences in the uploaded files, so we suggest trying the combined fasta file first and, if that does not work, trying the files containing subsets. See extractSequence for info on how to split the amino acid fasta files.
Notes on how to run the external tools: If using the web-server (http://www.cbs.dtu.dk/services/SignalP/), SignalP should be run with the parameter "Short output (no figures)" under "Output format", and the appropriate "Organism group" should be selected. When using a stand-alone version, SignalP should be run with the '-f summary' option. If using the web-server, the results can be downloaded using the "Downloads" button in the top-right corner, where the user should select "Prediction summary" and supply the path to the resulting file to the pathToSignalPresultFile argument. If a stand-alone version was used, just supply the path to the summary result file.
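When the fasta file was split and SignalP was run several times, the resulting summary files can be collected into a single character vector before calling analyzeSignalP. The following is a minimal sketch: collectSignalPFiles() is a hypothetical helper, and the directory name and the object mySwitchList in the commented usage are assumptions for illustration.

```r
# Hypothetical helper (not part of IsoformSwitchAnalyzeR): gather all
# SignalP summary files stored in one directory as a vector of full paths.
collectSignalPFiles <- function(resultDir, pattern = "\\.txt$") {
    paths <- list.files(resultDir, pattern = pattern, full.names = TRUE)
    if (length(paths) == 0) {
        stop("No SignalP result files found in: ", resultDir)
    }
    normalizePath(paths)
}

# Usage (assumed directory name; 'mySwitchList' is an existing switchAnalyzeRlist):
# signalpFiles <- collectSignalPFiles("signalp_results")
# mySwitchList <- analyzeSignalP(
#     switchAnalyzeRlist      = mySwitchList,
#     pathToSignalPresultFile = signalpFiles
# )
```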
Please note that the analyzeSignalP() function will automatically import only the SignalP results for the isoforms stored in the switchAnalyzeRlist, even if many more are stored in the result file.
Also note that analyzeSignalP automatically subsets the SignalP results to only contain predictions with an annotated cleavage site (CS pos), and that "Probable protein fragment" results are also removed.
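The filtering described above can be sketched on a toy table. This is an illustration only and not the package's implementation: the column names (probability, cs_pos), the toy values, the use of ">=" for the probability cutoff, and the reading of the first cleavage-site number as the number of amino acids removed are all assumptions.

```r
# Toy stand-in for an imported SignalP summary (hypothetical columns and values)
signalpToy <- data.frame(
    isoform_id  = c("iso1", "iso2", "iso3", "iso4", "iso5"),
    probability = c(0.95, 0.80, 0.30, 0.90, 0.99),
    cs_pos      = c(
        "CS pos: 22-23",
        "",
        "CS pos: 18-19",
        "CS pos: 30-31",
        "CS pos: 25-26. Probable protein fragment"
    ),
    stringsAsFactors = FALSE
)

# Isoforms assumed to be present in the switchAnalyzeRlist
isoformsInSwitchList <- c("iso1", "iso2", "iso3", "iso5")
minSignalPeptideProbability <- 0.5

keep <-
    signalpToy$isoform_id %in% isoformsInSwitchList &            # only isoforms in the list
    signalpToy$probability >= minSignalPeptideProbability &      # probability cutoff (assumed >=)
    grepl("CS pos", signalpToy$cs_pos, fixed = TRUE) &           # annotated cleavage site
    !grepl("Probable protein fragment", signalpToy$cs_pos, fixed = TRUE)

filtered <- signalpToy[keep, ]

# Assumption: the first number of "CS pos: X-Y" is the last residue of the signal peptide
filtered$aa_removed <- as.integer(
    sub("^CS pos: ([0-9]+)-.*$", "\\1", filtered$cs_pos)
)

filtered$isoform_id   # "iso1"
filtered$aa_removed   # 22
```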
A column called 'signal_peptide_identified' is added to isoformFeatures containing a binary indication (yes/no) of whether a transcript contains a signal peptide or not. Furthermore, the data.frame 'signalPeptideAnalysis' is added to the switchAnalyzeRlist containing the details of the signal peptide analysis.
The data.frame added has one row per isoform and contains 6 columns:
isoform_id: The name of the isoform analyzed. Matches the 'isoform_id' entry in the 'isoformFeatures' entry of the switchAnalyzeRlist.
has_signal_peptide: A text string indicating whether there is a signal peptide or not. Can be yes or no.
network_used: A text string indicating whether SignalP used the Neural Network (NN) optimized for proteins with trans-membrane sections (string='TM') or proteins without trans-membrane sections (string='noTM'). By default, SignalP 4.1 uses the NN with TM as a preprocessor to determine whether to use TM or noTM in the final prediction (if 4 or more positions are predicted to be in a transmembrane state, TM is used, otherwise SignalP-noTM). Reference: http://www.cbs.dtu.dk/services/SignalP/instructions.php
aa_removed: An integer giving the number of amino acids removed when the signal peptide is cleaved off.
transcriptClevageAfter: The transcript position of the last nucleotide in the isoform which is removed when the signal peptide is cleaved off.
genomicClevageAfter: The genomic position of the last nucleotide in the isoform which is removed when the signal peptide is cleaved off.
Kristoffer Vitting-Seerup
This function: Vitting-Seerup et al. The Landscape of Isoform Switches in Human Cancers. Mol. Cancer Res. (2017).
SignalP: Almagro Armenteros et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. (2019).
createSwitchAnalyzeRlist
extractSequence
analyzePFAM
analyzeNetSurfP3
analyzeCPAT
analyzeSwitchConsequences
### Load example data
data("exampleSwitchListIntermediary")
exampleSwitchListIntermediary

### Add SignalP analysis
exampleSwitchListAnalyzed <- analyzeSignalP(
    switchAnalyzeRlist = exampleSwitchListIntermediary,
    pathToSignalPresultFile = system.file(
        "extdata/signalP_results.txt",
        package = "IsoformSwitchAnalyzeR"
    )
)
exampleSwitchListAnalyzed
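To check what was added, the two entries described above ('signal_peptide_identified' in isoformFeatures and the 'signalPeptideAnalysis' data.frame) can be summarised. The helper summarizeSignalP() below is a hypothetical convenience function and not part of the package; it only assumes a list-like object with those two entries, and the mock list exists purely so that the sketch runs without the package.

```r
# Hypothetical helper: summarise the SignalP annotation stored in a
# switchAnalyzeRlist-like object (any list with the two entries used below).
summarizeSignalP <- function(switchList) {
    # isoformFeatures may hold one row per isoform per comparison,
    # so reduce to unique isoforms before counting.
    feat <- unique(
        switchList$isoformFeatures[, c("isoform_id", "signal_peptide_identified")]
    )
    list(
        counts  = table(feat$signal_peptide_identified, useNA = "ifany"),
        details = head(switchList$signalPeptideAnalysis)
    )
}

# Mock object with made-up values, used only to make the sketch self-contained
mockList <- list(
    isoformFeatures = data.frame(
        isoform_id = c("iso1", "iso1", "iso2", "iso3"),
        signal_peptide_identified = c("yes", "yes", "no", "yes"),
        stringsAsFactors = FALSE
    ),
    signalPeptideAnalysis = data.frame(
        isoform_id = c("iso1", "iso3"),
        has_signal_peptide = c("yes", "yes"),
        aa_removed = c(22L, 18L),
        stringsAsFactors = FALSE
    )
)

res <- summarizeSignalP(mockList)
res$counts
res$details
```

After running the example above, the same helper could be applied as summarizeSignalP(exampleSwitchListAnalyzed).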