flowjo_to_gatingset: Parse a flowJo Workspace

View source: R/flowJoWorkspace_Methods.R

parseWorkspace    R Documentation

Parse a flowJo Workspace

Description

Function to parse a flowJo workspace and generate a GatingHierarchy or GatingSet object along with the associated flowCore gates. With execute = FALSE, the data are not loaded or acted upon until an explicit call to recompute() is made on the GatingHierarchy objects in the GatingSet; with the default execute = TRUE, gating is computed during import.

Usage

parseWorkspace(obj, ...)

## S4 method for signature 'flowjo_workspace'
parseWorkspace(obj, ...)

flowjo_to_gatingset(
  ws,
  name = NULL,
  subset = list(),
  execute = TRUE,
  path = "",
  cytoset = NULL,
  backend_dir = tempdir(),
  backend = get_default_backend(),
  includeGates = TRUE,
  additional.keys = "$TOT",
  additional.sampleID = FALSE,
  keywords = character(),
  keywords.source = "XML",
  keyword.ignore.case = FALSE,
  extend_val = 0,
  extend_to = -4000,
  channel.ignore.case = FALSE,
  leaf.bool = TRUE,
  include_empty_tree = FALSE,
  skip_faulty_gate = FALSE,
  compensation = NULL,
  transform = TRUE,
  fcs_file_extension = ".fcs",
  greedy_match = FALSE,
  mc.cores = 1,
  ...
)

Arguments

obj

flowjo_workspace

...

Additional arguments to be passed to FCS parser

ws

A flowjo_workspace to be parsed.

name

numeric or character. The name or index of the group of samples to be imported. If NULL, the groups are printed to the screen and one can be selected interactively. Usually, multiple groups are defined in the flowJo workspace file.

subset

A numeric vector specifying the subset of samples in a group to import, a character vector specifying the FCS filenames to be imported, or an expression passed to the 'subset' function to filter samples by 'pData'. (Columns referred to by the expression must also be listed explicitly in the 'keywords' argument.)

execute

logical (TRUE or FALSE) specifying whether the gates, transformations, and compensation should be calculated immediately after the flowJo workspace has been imported. TRUE by default.

path

a character scalar giving the path to the FCS files that are to be imported. The code searches recursively, so you can point it to a location above the files.

cytoset

a cytoset object that provides an alternative data source to the FCS files. It is sometimes useful to preprocess the raw FCS files (e.g. standardize channels using the cytoqc package) and then use them directly for flowJo parsing. When cytoset is provided, the path argument is ignored.

includeGates

logical: should gates be imported, or just the data with compensation and transformation?

additional.keys

character vector: the keywords (parsed from the FCS header) to be combined (concatenated with "_") with the FCS filename to uniquely identify samples. Default is '$TOT' (total number of cells); more keywords can be added to build this GUID.

additional.sampleID

boolean: A boolean specifying whether to include the flowJo sample ID in a GUID to uniquely identify samples. This can be helpful when the filename or other keywords are not enough to differentiate between samples. Default is FALSE.

keywords

character vector specifying the keywords to be extracted as pData of GatingSet

keywords.source

character: where the keywords are extracted from, either "XML" or "FCS".

keyword.ignore.case

a logical flag indicating whether keyword matching should ignore case. Default is FALSE (case-sensitive matching).

extend_val

numeric: the threshold that determines whether gates need to be extended. Default is 0. Extension is triggered when gate coordinates are below this value.

extend_to

numeric: the value that gate coordinates are extended to. Default is -4000. Usually this value is detected automatically from the real data range, but when gates need to be extended without loading the raw data (i.e. execute is set to FALSE), this hard-coded value is used.

channel.ignore.case

a logical flag indicating whether channel name (colnames) matching should ignore case (e.g. for compensation and gating). Default is FALSE (case-sensitive matching).

leaf.bool

a logical indicating whether to compute the leaf boolean gates. Default is TRUE. Turning it off speeds up parsing when the statistics of these leaf boolean gates are not important for the analysis (e.g. the COMPASS package calculates them itself). If needed, they can be calculated later by calling the recompute method.

include_empty_tree

a logical indicating whether to include samples that don't have gates.

skip_faulty_gate

a logical indicating whether to skip faulty gates so that the parser can still process the rest of the gating tree.

compensation

a compensation object, matrix, or data.frame, or a list of these objects, that allows a customized compensation to be used instead of the one specified in the flowJo workspace or FCS file. When it is a list, its names are expected to match the sample GUIDs (by default the FCS filename suffixed by $TOT; see the "additional.keys" argument for details of GUIDs). For samples without a matching external compensation, the parser falls back to the flowJo XML or FCS file for the compensation matrix.

transform

logical to enable or disable transformation of gates and data. Default is TRUE. It is mainly for debugging (when the raw gates need to be parsed) and is only valid when execute is FALSE.

fcs_file_extension

character: the file extension of the FCS files. Default is ".fcs".

greedy_match

logical: By default, if flowjo_to_gatingset finds multiple FCS files matching a sample by total event count as well as sampleID and/or keywords specified by additional.keys and additional.sampleID, it will return an error listing the duplicate files. If greedy_match is TRUE, the method will simply take the first file with either filename or $FIL keyword matching the sample name and having the correct number of events.

mc.cores

numeric the number of threads to pass to the C++ parser to run in parallel

h5_dir

the path to write h5 data
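
The sample identifiers (GUIDs) that additional.keys and a compensation list rely on can be illustrated in base R. This is a minimal sketch, not CytoML's internal code: the helper make_guid(), the file names, event counts, channel names, and spillover values are all hypothetical. It assumes only what is stated above, namely that keyword values are joined to the FCS filename with "_".

```r
# Hypothetical helper: join an FCS filename with keyword values using "_",
# mirroring the GUID rule described for additional.keys (not CytoML internals).
make_guid <- function(filename, keyword_values) {
  paste(c(filename, keyword_values), collapse = "_")
}

# Made-up samples: filename plus the value of the $TOT keyword (event count).
samples <- data.frame(
  filename = c("sampleA.fcs", "sampleB.fcs"),
  tot = c("10000", "25000"),
  stringsAsFactors = FALSE
)
guids <- mapply(make_guid, samples$filename, samples$tot, USE.NAMES = FALSE)

# Made-up spillover matrix; a real one would come from the instrument or FCS file.
channels <- c("FL1-A", "FL2-A")
spill <- matrix(c(1, 0.1, 0.05, 1), nrow = 2, byrow = TRUE,
                dimnames = list(channels, channels))

# A list of compensation matrices named by GUID, the shape described for the
# compensation argument when a list is supplied.
comps <- setNames(list(spill, spill), guids)
print(names(comps))
```

A list built this way could then be passed as compensation = comps. Samples whose GUID is absent from names(comps) would fall back to the workspace or FCS compensation, as described for the compensation argument.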

Details

A flowjo_workspace is generated with a call to open_flowjo_xml(), passing the name of the XML workspace file. This returns a flowjo_workspace, which can be parsed using the flowjo_to_gatingset() method. The function can be called non-interactively by passing the index or name of the group of samples to be imported via flowjo_to_gatingset(ws, name = x), where x is either the numeric index or the name. The subset argument allows one to select a set of files from the chosen sample group. The routine takes the intersection of the files in the sample group, the files specified in subset, and the files available on disk, and imports them.
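
The file selection rule described above can be pictured with a small base R sketch. It only illustrates the three-way intersection; the file names are made up, and the variable names are hypothetical rather than anything used inside CytoML.

```r
# Made-up file names standing in for the three sources described above.
in_group  <- c("a.fcs", "b.fcs", "c.fcs")  # files in the chosen sample group
in_subset <- c("b.fcs", "c.fcs", "d.fcs")  # files requested via subset
on_disk   <- c("a.fcs", "c.fcs", "d.fcs")  # files found under path

# Successive intersection keeps only files present in all three sets.
to_import <- Reduce(intersect, list(in_group, in_subset, on_disk))
print(to_import)
```

Under this reading, a file named in subset but missing from disk or from the sample group is not imported.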

Value

a GatingSet, which is a wrapper around a list of GatingHierarchy objects, each representing a single sample in the workspace. The GatingHierarchy objects contain graphNEL trees that represent the gating hierarchy of each sample. Each node in the GatingHierarchy has associated data, including the population counts from flowJo, the parent population counts, and the flowCore gates generated from the flowJo workspace gate definitions. When execute = FALSE, data are not yet loaded or acted upon at this stage, and a call to execute() must be made on each GatingHierarchy object in the GatingSet to execute the gating of each data file. With the default execute = TRUE this is done automatically, and there is no longer a reason to set the execute argument to FALSE.

See Also

fj_ws_get_sample_groups, GatingSet

Examples

## Not run: 
# f is the file name of a flowJo XML workspace
ws <- open_flowjo_xml(f)

# Parse the second group; assumes the FCS files are in the same folder as the workspace
gs <- flowjo_to_gatingset(ws, name = 2)

# Specify the FCS path and restrict parsing to a single FCS file
gs <- flowjo_to_gatingset(ws, name = 4,
                          path = dataDir,
                          subset = "CytoTrol_CytoTrol_1.fcs")

# Extract keywords as pData without running the gating
gs <- flowjo_to_gatingset(ws, path = dataDir, name = 4,
                          # keywords to extract as pData
                          keywords = c("PATIENT ID", "SAMPLE ID", "$TOT", "EXPERIMENT NAME"),
                          # read keywords from the XML workspace (alternatively "FCS")
                          keywords.source = "XML",
                          # combine this keyword with the FCS filename to uniquely identify samples
                          additional.keys = c("PATIENT ID"),
                          # parse without gating; saves time when only the XML info is needed
                          execute = FALSE)

# Subset by pData (extracted from keywords)
gs <- flowjo_to_gatingset(ws, path = dataDir, name = 4,
                          subset = `TUBE NAME` %in% c("CytoTrol_1", "CytoTrol_2"),
                          keywords = "TUBE NAME")

# Override the default compensation defined in the XML with custom compensations;
# comps is either a compensation object or a list of compensation objects
gs <- flowjo_to_gatingset(ws, name = 2, compensation = comps)

## End(Not run)

RGLab/CytoML documentation built on Jan. 4, 2025, 3:40 a.m.