setMarFilter: Filter out markers

setMarFilter R Documentation

Filter out markers

Description

Search for markers that do not meet the criteria and label them as "invalid".

Usage

setMarFilter(
  object,
  id = NA_integer_,
  missing = 1,
  het = c(0, 1),
  mac = 0,
  maf = 0,
  ad_ref = c(0, Inf),
  ad_alt = c(0, Inf),
  dp = c(0, Inf),
  mean_ref = c(0, Inf),
  mean_alt = c(0, Inf),
  sd_ref = Inf,
  sd_alt = Inf,
  ...
)

## S4 method for signature 'GbsrGenotypeData'
setMarFilter(
  object,
  id,
  missing,
  het,
  mac,
  maf,
  ad_ref,
  ad_alt,
  dp,
  mean_ref,
  mean_alt,
  sd_ref,
  sd_alt
)

Arguments

object

A GbsrGenotypeData object.

id

A vector of integers matching SNP marker IDs, which can be retrieved with getMarID(). Markers with the specified IDs will be filtered out.

missing

A numeric value in [0, 1] specifying the maximum missing genotype call rate per marker.

het

A numeric vector of length two, with values in [0, 1], specifying the minimum and maximum heterozygous genotype call rates per marker.

mac

An integer value specifying the minimum minor allele count per marker.

maf

A numeric value to specify the minimum minor allele frequency per marker.

ad_ref

A numeric vector of length two specifying the lower and upper limits of reference read counts per marker.

ad_alt

A numeric vector of length two specifying the lower and upper limits of alternative read counts per marker.

dp

A numeric vector of length two specifying the lower and upper limits of total read counts per marker.

mean_ref

A numeric vector of length two specifying the lower and upper limits of the mean reference read count per marker.

mean_alt

A numeric vector of length two specifying the lower and upper limits of the mean alternative read count per marker.

sd_ref

A numeric value specifying the upper limit of the standard deviation of reference read counts per marker.

sd_alt

A numeric value specifying the upper limit of the standard deviation of alternative read counts per marker.

...

Unused.
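
The genotype-based thresholds (missing, het, mac, and maf) can be illustrated with a minimal base-R sketch on a toy matrix. It is not the package's implementation. It assumes diploid genotypes coded as alternative-allele counts (0, 1, 2) with NA for missing calls, computes the heterozygous rate over non-missing calls only, and treats all limits as inclusive. The object names geno, missing_rate, het_rate, mac_obs, maf_obs, and valid are hypothetical.

```r
# Toy genotype matrix (hypothetical data): rows are samples, columns are markers.
# Values are alternative-allele counts (0, 1, 2); NA marks a missing call.
geno <- matrix(
  c(0, 0, 1, 2,      # marker1
    0, NA, NA, NA,   # marker2: mostly missing
    1, 1, 1, 1),     # marker3: all heterozygous
  nrow = 4,
  dimnames = list(paste0("sample", 1:4), paste0("marker", 1:3))
)

# Thresholds named after the setMarFilter() arguments.
missing <- 0.2
het <- c(0, 0.9)
mac <- 1
maf <- 0.05

# Per-marker summary statistics (assumed definitions, see the text above).
missing_rate <- colMeans(is.na(geno))
het_rate <- colMeans(geno == 1, na.rm = TRUE)
alt_count <- colSums(geno, na.rm = TRUE)
allele_total <- 2 * colSums(!is.na(geno))
mac_obs <- pmin(alt_count, allele_total - alt_count)
maf_obs <- mac_obs / allele_total

# A marker stays valid only if it passes every criterion.
valid <- missing_rate <= missing &
  het_rate >= het[1] & het_rate <= het[2] &
  mac_obs >= mac &
  maf_obs >= maf
print(valid)
```

In this toy data, marker2 fails the missing-rate limit and marker3 fails the upper heterozygosity limit, so only marker1 remains valid.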

Details

For mean_ref, mean_alt, sd_ref, and sd_alt, this function calculates the mean and standard deviation of the read counts across samples at each SNP marker. If the mean read count of a marker is smaller than the specified lower limit or larger than the upper limit, the function labels the marker as "invalid". For sd_ref and sd_alt, the standard deviation of read counts at each marker is checked, and markers with a standard deviation larger than the specified limit are labeled as "invalid". To check which markers are valid or invalid, run validMar().
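
The mean and standard deviation checks described above can be sketched in base R on a toy read-count matrix. This is an illustration only, not the package's code. It assumes a plain mean and sample standard deviation over all samples and inclusive limits, and GBScleanR may treat missing or zero counts differently. The names ref_reads, marker_mean, marker_sd, and valid are hypothetical.

```r
# Toy reference read counts (hypothetical data): rows are samples, columns are markers.
ref_reads <- matrix(
  c(10, 12, 11,   # marker1: moderate depth, low spread
     0, 50,  1,   # marker2: highly variable depth
     2,  3,  1),  # marker3: low depth
  nrow = 3,
  dimnames = list(paste0("sample", 1:3), paste0("marker", 1:3))
)

# Limits named after the setMarFilter() arguments.
mean_ref <- c(5, 20)  # lower and upper limits of the mean read count
sd_ref <- 10          # upper limit of the standard deviation

# Per-marker mean and standard deviation across samples.
marker_mean <- colMeans(ref_reads)
marker_sd <- apply(ref_reads, 2, sd)

# A marker is valid if its mean lies within the limits and its
# standard deviation does not exceed the upper limit.
valid <- marker_mean >= mean_ref[1] &
  marker_mean <= mean_ref[2] &
  marker_sd <= sd_ref
print(valid)
```

Here marker2 is rejected for its large standard deviation and marker3 for its low mean, while marker1 passes both checks.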

Value

A GbsrGenotypeData object with filters on markers.

Examples

# Load data from the GDS file and instantiate a GbsrGenotypeData object.
gds_fn <- system.file("extdata", "sample.gds", package = "GBScleanR")
gds <- loadGDS(gds_fn)

# Summarize the information needed for filtering.
gds <- countGenotype(gds)
gds <- countRead(gds)

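# Filter out the first 100 markers by ID, plus any marker with a missing
# genotype call rate above 0.2 or a total read count below 5.
gds <- setMarFilter(gds,
                    id = getMarID(gds)[1:100],
                    missing = 0.2,
                    dp = c(5, Inf))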

# Close the connection to the GDS file.
closeGDS(gds)


tomoyukif/GBScleanR documentation built on Oct. 31, 2024, 2:43 a.m.