map_data_anndata: Convert AnnData

View source: R/map_data_anndata.R

Description

Convert an AnnData object across species (using gene orthologs) or within a species (using gene synonyms).

Usage

map_data_anndata(
  obj,
  gene_map = NULL,
  input_col = "input_gene",
  output_col = "ortholog_gene",
  standardise_genes = FALSE,
  input_species = NULL,
  output_species = input_species,
  method = c("homologene", "gprofiler", "babelgene"),
  drop_nonorths = TRUE,
  non121_strategy = "drop_both_species",
  agg_fun = "sum",
  mthreshold = Inf,
  as_sparse = TRUE,
  as_delayedarray = FALSE,
  sort_rows = FALSE,
  test_species = NULL,
  chunk_size = NULL,
  verbose = TRUE
)

Arguments

obj

AnnData object to convert.

gene_map

A data.frame that maps the current gene names to new gene names. The function's behaviour adapts to each situation as follows:

  • gene_map=<data.frame> :
    When a data.frame containing the gene key:value columns (specified by input_col and output_col, respectively) is provided, it is used to perform aggregation/expansion.

  • gene_map=NULL and input_species!=output_species :
    A gene_map is automatically generated by map_orthologs to perform inter-species gene aggregation/expansion.

  • gene_map=NULL and input_species==output_species :
    A gene_map is automatically generated by map_genes to perform within-species gene symbol standardization and aggregation/expansion.

input_col

Column name within gene_map with gene names matching the row names of X.

output_col

Column name within gene_map with the gene names that you wish to map the row names of X onto.
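
A minimal base-R sketch of how a gene_map with the default input_col ("input_gene") and output_col ("ortholog_gene") columns relates the row names of X to new names in the simple 1:1 case. This is not scKirby's internal code: the toy matrix, gene symbols, and counts are made up for illustration, and genes are assumed to be the row names of X, as the input_col description states.

```r
# Toy genes x cells matrix with mouse-style symbols (hypothetical data)
X <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
            dimnames = list(c("Cd4", "Cd8a", "Actb"), c("cell1", "cell2")))

# gene_map using the default column names for input_col and output_col
gene_map <- data.frame(
  input_gene    = c("Cd4", "Cd8a", "Actb"),
  ortholog_gene = c("CD4", "CD8A", "ACTB"),
  stringsAsFactors = FALSE
)

# Look up each row name of X in the input column and take the mapped name
idx <- match(rownames(X), gene_map$input_gene)
rownames(X) <- gene_map$ortholog_gene[idx]
print(X)
```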

standardise_genes

If TRUE and gene_output="columns", a new column "input_gene_standard", containing standardised HGNC symbols identified by gorth, will be added to gene_df.

input_species

Name of the input species (e.g. "mouse", "fly"). Use map_species to return a full list of available species.

output_species

Name of the output species (e.g. "human", "chicken"). Use map_species to return a full list of available species.

method

R package to use for gene mapping:

  • "gprofiler" : Slower but more species and genes.

  • "homologene" : Faster but fewer species and genes.

  • "babelgene" : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.

drop_nonorths

Drop genes that don't have an ortholog in the output_species.
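
To illustrate drop_nonorths = TRUE, the base-R sketch below keeps only the rows whose gene has a non-missing ortholog in gene_map. It is a sketch under stated assumptions rather than the package's implementation; "Gm99999" is a hypothetical gene with no ortholog.

```r
# Toy genes x cells matrix; "Gm99999" is a hypothetical gene without an ortholog
X <- matrix(1:8, nrow = 4,
            dimnames = list(c("Cd4", "Cd8a", "Actb", "Gm99999"),
                            c("cell1", "cell2")))

gene_map <- data.frame(
  input_gene    = c("Cd4", "Cd8a", "Actb", "Gm99999"),
  ortholog_gene = c("CD4", "CD8A", "ACTB", NA),
  stringsAsFactors = FALSE
)

# A gene is kept only if gene_map lists it with a non-missing ortholog
mapped <- gene_map$input_gene[!is.na(gene_map$ortholog_gene)]
X_kept <- X[rownames(X) %in% mapped, , drop = FALSE]
print(rownames(X_kept))
```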

non121_strategy

How to handle genes that don't have 1:1 mappings between input_species:output_species. Options include:

  • "drop_both_species" or "dbs" or 1 :
    Drop genes that have duplicate mappings in either the input_species or output_species
    (DEFAULT).

  • "drop_input_species" or "dis" or 2 :
    Only drop genes that have duplicate mappings in the input_species.

  • "drop_output_species" or "dos" or 3 :
    Only drop genes that have duplicate mappings in the output_species.

  • "keep_both_species" or "kbs" or 4 :
    Keep all genes regardless of whether they have duplicate mappings in either species.

  • "keep_popular" or "kp" or 5 :
    Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs.

  • "sum","mean","median","min" or "max" :
    When gene_df is a matrix and gene_output="rownames", these options will aggregate many-to-one gene mappings (input_species-to-output_species) after dropping any duplicate genes in the output_species.
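
One way to read the first three strategies is sketched below in base R: an input gene counts as duplicated if it maps to more than one output gene, and an output gene counts as duplicated if more than one input gene maps to it. This interpretation and the toy gene names (m1 to m5, H1 to H4) are assumptions for illustration, not the package's code.

```r
# Hypothetical gene_map: m2 maps to two output genes (1:many),
# while m4 and m5 both map to H4 (many:1)
gene_map <- data.frame(
  input_gene    = c("m1", "m2",  "m2",  "m3", "m4", "m5"),
  ortholog_gene = c("H1", "H2a", "H2b", "H3", "H4", "H4"),
  stringsAsFactors = FALSE
)

# Flag every row whose input gene, or output gene, appears more than once
dup_in  <- gene_map$input_gene %in%
  gene_map$input_gene[duplicated(gene_map$input_gene)]
dup_out <- gene_map$ortholog_gene %in%
  gene_map$ortholog_gene[duplicated(gene_map$ortholog_gene)]

drop_both   <- gene_map[!dup_in & !dup_out, ]  # cf. "drop_both_species"
drop_input  <- gene_map[!dup_in, ]             # cf. "drop_input_species"
drop_output <- gene_map[!dup_out, ]            # cf. "drop_output_species"
print(drop_both)
```

Under this reading, only m1 and m3 survive "drop_both_species", because each of them takes part in exactly one mapping on both sides.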

agg_fun

Aggregation function used to combine many-to-one gene mappings (default: "sum").
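
For many-to-one mappings, aggregation combines all input rows that share an output gene. The base-R sketch below mimics agg_fun = "sum" with rowsum() and derives "mean" by dividing each sum by the number of contributing rows. The data are hypothetical, and the package may implement aggregation differently.

```r
# Toy genes x cells matrix: rows m1 = (1, 2), m2 = (3, 4), m3 = (5, 6)
X <- matrix(c(1, 3, 5,
              2, 4, 6), nrow = 3,
            dimnames = list(c("m1", "m2", "m3"), c("cell1", "cell2")))

# m1 and m2 both map to H1 (many-to-one); m3 maps to H3
out_gene <- c(m1 = "H1", m2 = "H1", m3 = "H3")[rownames(X)]

# "sum": add up rows that share an output gene (rows come back sorted by gene)
X_sum <- rowsum(X, group = out_gene)

# "mean": divide each summed row by the number of input rows behind it
n_per_gene <- table(out_gene)[rownames(X_sum)]
X_mean <- X_sum / as.vector(n_per_gene)
print(X_mean)
```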

mthreshold

Maximum number of results to show per initial alias. All results are shown by default.

as_sparse

Convert aggregated matrix to sparse matrix.
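
A sketch of the kind of conversion as_sparse refers to, assuming a sparse representation from the Matrix package (which ships with standard R installations). The exact sparse class scKirby returns is not stated on this page, and the toy matrix is hypothetical.

```r
library(Matrix)  # recommended package bundled with standard R installations

# Toy aggregated matrix that is mostly zeros, as single-cell counts often are
X <- matrix(c(0, 0, 3,
              0, 1, 0), nrow = 3,
            dimnames = list(c("H1", "H2", "H3"), c("cell1", "cell2")))

# Build a sparse Matrix object that stores only the non-zero entries
X_sparse <- Matrix(X, sparse = TRUE)
print(X_sparse)
```

The dense and sparse forms hold the same values; the sparse form simply avoids storing the zeros, which matters for large, mostly-empty expression matrices.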

sort_rows

Sort gene_df rows alphanumerically.

verbose

Print messages.

Value

AnnData


bschilder/scKirby documentation built on Oct. 2, 2024, 10:16 p.m.