This function removes genes and isoforms from a switchAnalyzeRlist to reduce processing time and to make the results more trustworthy.
preFilter(
    switchAnalyzeRlist,
    geneExpressionCutoff = 1,
    isoformExpressionCutoff = 0,
    IFcutoff = 0.01,
    acceptedGeneBiotype = NULL,
    acceptedIsoformClassCode = NULL,
    removeSingleIsoformGenes = TRUE,
    reduceToSwitchingGenes = FALSE,
    reduceFurtherToGenesWithConsequencePotential = FALSE,
    onlySigIsoforms = FALSE,
    keepIsoformInAllConditions = FALSE,
    alpha = 0.05,
    dIFcutoff = 0.1,
    quiet = FALSE
)
switchAnalyzeRlist: A switchAnalyzeRlist object.
geneExpressionCutoff: The expression cutoff (most likely in TPM/RPKM/FPKM) that the average gene expression must be higher than in BOTH conditions. NULL disables the filter (not recommended). Default is 1 (TPM/RPKM/FPKM).
isoformExpressionCutoff: The expression cutoff (most likely in RPKM/FPKM) that isoforms must be expressed above in at least one condition of a comparison. NULL disables the filter. Default is 0 (which removes completely unused isoforms).
IFcutoff: The cutoff on isoform usage (measured as Isoform Fraction, see Details) that isoforms must be used above in at least one condition of a comparison. NULL disables the filter. Default is 0.01 (which removes non-contributing isoforms).
acceptedGeneBiotype: A vector of strings indicating which gene biotypes are accepted (data typically obtained from GTF files). Can be any annotated biotype, the most common being "protein_coding", "lincRNA" and "antisense". Default is NULL.
acceptedIsoformClassCode: A vector of strings indicating which Cufflinks class codes are accepted. Can only be used if the data originate from Cufflinks. For an updated list with full descriptions, see the bottom of this page: http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/#tracking-transfrags-through-multiple-samples-outprefixtracking. Set to NULL to disable. Default is NULL.
removeSingleIsoformGenes: A logical indicating whether to keep only genes containing more than one isoform (in any comparison, after the other filters have been applied). Default is TRUE.
reduceToSwitchingGenes: A logical indicating whether the switchAnalyzeRlist should be reduced to the genes that contain significant switching (as indicated by the …
reduceFurtherToGenesWithConsequencePotential: A logical indicating whether the switchAnalyzeRlist should be reduced to the genes in which isoform switches with predicted consequences could potentially be found. This argument is a stricter version of …
onlySigIsoforms: A logical indicating whether both isoforms in the pairs considered if …
keepIsoformInAllConditions: A logical indicating whether an isoform should be kept in all comparisons even if it only passes the filters in one comparison. Default is FALSE.
alpha: The cutoff that the FDR-corrected p-values must be smaller than for calling significant switches. Only considered if …
dIFcutoff: The cutoff that the absolute change in isoform usage (dIF) must be larger than before an isoform is considered switching. This cutoff removes cases where isoforms with (very) low dIF values are deemed significant and would otherwise be included in the downstream analysis. It is analogous to a log2 fold change cutoff in a standard differential gene expression analysis, which ensures that the genes have a certain effect size. Only considered if …
quiet: A logical indicating whether to avoid printing progress messages. Default is FALSE.
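To make the roles of alpha and dIFcutoff concrete, here is a minimal base-R sketch of the switch call they describe: an isoform counts as switching only if its FDR-corrected p-value is below alpha and its absolute dIF exceeds dIFcutoff. The data frame and its column names (isoform_id, dIF, q_value) are hypothetical and are not the actual switchAnalyzeRlist layout.

```r
# Hypothetical per-isoform test results for a single comparison
# (column names are illustrative, not the real switchAnalyzeRlist columns)
switchRes <- data.frame(
  isoform_id = c("iso_a", "iso_b", "iso_c", "iso_d"),
  dIF        = c(0.25, -0.30, 0.02, 0.15),   # change in isoform fraction
  q_value    = c(0.001, 0.01, 0.0001, 0.20), # FDR-corrected p-values
  stringsAsFactors = FALSE
)

alpha     <- 0.05
dIFcutoff <- 0.1

# Both criteria must hold: statistical significance AND a minimum effect size
isSwitching <- switchRes$q_value < alpha & abs(switchRes$dIF) > dIFcutoff
switchingIsoforms <- switchRes$isoform_id[isSwitching]
print(switchingIsoforms)
```

Here iso_c is highly significant, but its dIF of 0.02 is too small to count, and iso_d has a large enough dIF but is not significant, so only iso_a and iso_b are called as switching.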
The filtering works by first requiring that the average gene and isoform expression and usage across all samples exceed the supplied cutoffs, then filtering the data on isoform class, and lastly removing single-isoform genes.
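The order of these steps can be sketched in base R for a single comparison. This is not the package's implementation: toyPreFilter and the column names are hypothetical, each *_1/*_2 column stands for the average expression in one condition, and the cutoffs follow the argument descriptions (gene expression above the cutoff in both conditions, isoform expression and IF above their cutoffs in at least one condition).

```r
# Hypothetical toy data: one row per isoform, one comparison (condition 1 vs 2).
# Values are average expression per condition; column names are illustrative.
iso <- data.frame(
  isoform_id   = c("g1.a", "g1.b", "g2.a", "g2.b", "g3.a", "g3.b"),
  gene_id      = c("g1",   "g1",   "g2",   "g2",   "g3",   "g3"),
  gene_value_1 = c(10, 10, 0.5, 0.5, 8, 8),
  gene_value_2 = c(12, 12, 0.8, 0.8, 6, 6),
  iso_value_1  = c(7, 3, 0.3, 0.2, 8, 0),
  iso_value_2  = c(9, 3, 0.5, 0.3, 6, 0),
  class_code   = c("=", "j", "=", "j", "=", "u"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the three filtering steps
toyPreFilter <- function(df, geneCut = 1, isoCut = 0, ifCut = 0.01,
                         acceptedClassCode = NULL) {
  # Isoform fraction (IF) per condition: isoform expression / gene expression
  if1 <- df$iso_value_1 / df$gene_value_1
  if2 <- df$iso_value_2 / df$gene_value_2
  # Step 1a: gene expression must exceed the cutoff in BOTH conditions
  keep <- df$gene_value_1 > geneCut & df$gene_value_2 > geneCut
  # Step 1b: isoform expression must exceed the cutoff in at least one condition
  keep <- keep & pmax(df$iso_value_1, df$iso_value_2) > isoCut
  # Step 1c: isoform usage must exceed the IF cutoff in at least one condition
  keep <- keep & pmax(if1, if2) > ifCut
  # Step 2: optional isoform class filter
  if (!is.null(acceptedClassCode)) {
    keep <- keep & df$class_code %in% acceptedClassCode
  }
  df <- df[keep, ]
  # Step 3: drop genes that are left with a single isoform
  nIso <- table(df$gene_id)
  df[df$gene_id %in% names(nIso)[nIso > 1], ]
}

filtered <- toyPreFilter(iso)
print(filtered$isoform_id)
```

For this toy input, g2 fails the gene expression cutoff, and g3.b is never used, so it is dropped by the isoform expression cutoff. That leaves g3 as a single-isoform gene, which the last step removes; only g1.a and g1.b remain.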
The gene expression filter is especially important because of a fundamental problem with IF values (calculated as <isoform_exp> / <gene_exp>): when gene expression is low, the IF measure loses precision. Consider a gene with two isoforms that contribute 73.3% and 26.7% of the gene expression. With 100 RNA-seq reads to describe them, the problem is easy and we recapitulate the 73%/27% ratio. With only 10 reads, the measurement becomes less accurate, since the best possible estimate is now 70%/30%. If the gene is expressed even lower, say 5 reads, the estimates become 80%/20%. Therefore, such lowly expressed genes should be filtered out.
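The arithmetic behind this example can be reproduced in a few lines of base R. The first calculation rounds to whole reads, as in the text. The second adds a standard deviation under a simple binomial read-sampling model, which is a simplifying assumption for illustration only and not how isoform quantification tools estimate abundances.

```r
trueIF <- 0.733          # true usage of the major isoform (73.3%)
reads  <- c(100, 10, 5)  # number of reads available for the gene

# Best case: reads are whole numbers, so the closest achievable estimate
# of the major isoform's IF is round(n * IF) / n
bestCaseIF <- round(reads * trueIF) / reads

# Simplifying assumption: each read comes from the major isoform with
# probability trueIF, so the IF estimate is Binomial(n, trueIF) / n
sdIF <- sqrt(trueIF * (1 - trueIF) / reads)

print(data.frame(reads, bestCaseIF, sdIF = round(sdIF, 3)))
```

bestCaseIF comes out as 0.73, 0.70 and 0.80, matching the text, while the standard deviation grows from about 0.04 at 100 reads to about 0.20 at 5 reads.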
Note that for the exon entry, as well as for all replicate matrix entries (counts, abundances and isoform fractions), every isoform from a gene in which at least one isoform passed the filters is kept.
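This rule for the exon and replicate-matrix entries can be illustrated with a hypothetical isoform-to-gene mapping and count matrix (all names are made up for this sketch, and the real switchAnalyzeRlist entries are structured differently). The retained gene set is derived from the isoforms that passed, and every isoform of those genes stays in the matrix.

```r
# Hypothetical isoform-to-gene mapping
isoGene <- c(g1.a = "g1", g1.b = "g1", g1.c = "g1", g2.a = "g2", g2.b = "g2")

# Isoforms that passed the filters (g1.c and both g2 isoforms failed)
passed <- c("g1.a", "g1.b")

# Hypothetical replicate count matrix: rows = isoforms, columns = samples
counts <- matrix(1:10, nrow = 5,
                 dimnames = list(names(isoGene), c("s1", "s2")))

# Genes with at least one passing isoform
keptGenes <- unique(isoGene[passed])

# Keep ALL isoforms of those genes in the replicate matrix, including g1.c
countsKept <- counts[isoGene[rownames(counts)] %in% keptGenes, , drop = FALSE]
print(rownames(countsKept))
```

g1.c failed the filters but is still present in countsKept because its gene, g1, has passing isoforms; g2 is removed entirely.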
A switchAnalyzeRlist object in which the genes and isoforms not passing the filters have been removed (from all annotated entries).
Kristoffer Vitting-Seerup
Vitting-Seerup et al. The Landscape of Isoform Switches in Human Cancers. Mol. Cancer Res. (2017).
createSwitchAnalyzeRlist
importCufflinksFiles
importRdata
library(IsoformSwitchAnalyzeR)  # provides preFilter() and the exampleSwitchList data
data("exampleSwitchList")
exampleSwitchListFiltered <- preFilter(
    exampleSwitchList,
    geneExpressionCutoff = 1,
    isoformExpressionCutoff = 0,
    removeSingleIsoformGenes = TRUE
)