simplifyGOFromMultipleLists | R Documentation |
Perform simplifyGO analysis with multiple lists of GO IDs
simplifyGOFromMultipleLists(
lt,
go_id_column = NULL,
padj_column = NULL,
padj_cutoff = 0.01,
filter = function(x) any(x < padj_cutoff),
default = 1,
ont = NULL,
db = "org.Hs.eg.db",
measure = "Sim_XGraSM_2013",
heatmap_param = list(NULL),
show_barplot = TRUE,
method = "binary_cut",
control = list(),
min_term = NULL,
verbose = TRUE,
column_title = NULL,
...
)
lt |
A data frame, a list of numeric vectors (e.g. adjusted p-values) where each numeric vector has GO IDs as names, or a list of GO IDs. |
go_id_column |
Column index of GO ID if |
padj_column |
Column index of adjusted p-values if |
padj_cutoff |
Cut off for adjusted p-values. |
filter |
A self-defined function for filtering GO IDs. By default it requires GO IDs should be significant in at least one list. |
default |
The default value for the adjusted p-values. See Details. |
ont |
Pass to |
db |
Pass to |
measure |
Pass to |
heatmap_param |
Parameters for controlling the heatmap, see Details. |
show_barplot |
Whether draw barplots which shows numbers of significant GO terms in clusters. |
method |
Pass to |
control |
Pass to |
min_term |
Pass to |
verbose |
Pass to |
column_title |
Pass to |
... |
Pass to |
The input data can have three types of formats:
A list of numeric vectors of adjusted p-values where each vector has the GO IDs as names.
A data frame. The column of the GO IDs can be specified with go_id_column
argument and the column of the adjusted p-values can be
specified with padj_column
argument. If these columns are not specified, they are automatically identified. The GO ID column
is found by checking whether a column contains all GO IDs. The adjusted p-value column is found by comparing the column names of the
data frame to see whether it might be a column for adjusted p-values. These two columns are used to construct a numeric vector
with GO IDs as names.
A list of character vectors of GO IDs. In this case, each character vector is changed to a numeric vector where all values take 1 and the original GO IDs are used as names of the vector.
Now let's assume there are n
GO lists, we first construct a global matrix where columns correspond to the n
GO lists and rows correspond
to the "union" of all GO IDs in the lists. The value for the ith GO ID and in the jth list are taken from the corresponding numeric vector
in lt
. If the jth vector in lt
does not contain the ith GO ID, the value defined by default
argument is taken there (e.g. in most cases the numeric
values are adjusted p-values, default
is set to 1). Let's call this matrix as M0
.
Next step is to filter M0
so that we only take a subset of GO IDs of interest. We define a proper function via argument filter
to remove
GO IDs that are not important for the analysis. Functions for filter
is applied to every row in M0
and filter
function needs
to return a logical value to decide whether to remove the current GO ID. For example, if the values in lt
are adjusted p-values, the filter
function
can be set as function(x) any(x < padj_cutoff)
so that the GO ID is kept as long as it is signfiicant in at least one list. After the filter, let's call
the filtered matrix M1
.
GO IDs in M1
(row names of M1
) are used for clustering. A heatmap of M1
is attached to the left of the GO similarity heatmap so that
the group-specific (or list-specific) patterns can be easily observed and to corresponded to GO functions.
Argument heatmap_param
controls several parameters for heatmap M1
:
transform
: A self-defined function to transform the data for heatmap visualization. The most typical case is to transform adjusted p-values by -log10(x)
.
breaks
: break values for color interpolation.
col
: The corresponding values for breaks
.
labels
: The corresponding labels.
name
: Legend title.
# perform functional enrichment on the signatures genes from cola anlaysis
require(cola)
data(golub_cola)
res = golub_cola["ATC:skmeans"]
require(hu6800.db)
x = hu6800ENTREZID
mapped_probes = mappedkeys(x)
id_mapping = unlist(as.list(x[mapped_probes]))
lt = functional_enrichment(res, k = 3, id_mapping = id_mapping) # you can check the value of `lt`
# a list of data frames
simplifyGOFromMultipleLists(lt, padj_cutoff = 0.001)
# a list of numeric values
lt2 = lapply(lt, function(x) structure(x$p.adjust, names = x$ID))
simplifyGOFromMultipleLists(lt2, padj_cutoff = 0.001)
# a list of GO IDS
lt3 = lapply(lt, function(x) x$ID[x$p.adjust < 0.001])
simplifyGOFromMultipleLists(lt3)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.