estGeno: Genotype estimation using a hiden Morkov model

estGenoR Documentation

Genotype estimation using a hiden Morkov model

Description

Clean up genotype data by error correction based on genotype estimation using a hidden Markov model.

Usage

estGeno(
  object,
  node = "raw",
  recomb_rate = 0.04,
  error_rate = 0.0025,
  call_threshold = 0.9,
  het_parent = FALSE,
  optim = TRUE,
  iter = 2,
  n_threads = 1,
  dummy_reads = 5,
  ...
)

## S4 method for signature 'GbsrGenotypeData'
estGeno(
  object,
  node,
  recomb_rate,
  error_rate,
  call_threshold,
  het_parent,
  optim,
  iter,
  n_threads,
  dummy_reads
)

Arguments

object

A GbsrGenotypeData object.

node

Either "raw" or "filt" to indicate whether raw or filtered read counts are used for genotype estimation. See setCallFilter() for the details of filtered read counts.

recomb_rate

A numeric value to indicate the expected recombination frequency per chromosome per megabase pairs.

error_rate

A numeric value of the expected sequence error rate.

call_threshold

A numeric value of the probability threshold to accept estimated genotype calls.

het_parent

A logical value to indicate whether parental samples are outbred or inbred. If FALSE, this function assume all true genotype of markers in parents are homozygotes.

optim

A logical value to specify whether to conduct parameter optimization for error correction.

iter

An integer value to specify the number of iterative parameter updates.

n_threads

An integer value to specify the number of threads used for the calculation. The default is 1 and if n_threads = NULL, automatically set half the number of available threads on the computer.

dummy_reads

An integer to specify the number of dummy reads to assign to dummy parental samples for genotype estimation. See details.

...

Unused.

Details

If you have not set parental samples by setParents() and initialized the scheme object using initScheme(), you have the scheme object without explicit parental information that is assumed to be a bi-parental population. In this case, estGeno() will run in the parentless mode. In the parentless mode, the algorithm assumes that the given population is a bi-parental population. The number of reference allele reads and the number of alternative allele reads of the dummy parents are set based on dummy_reads, respectively. Dummy parent 1 has dummy_reads of the reference allele reads and 0 alternative allele reads at all markers, while dummy parent 2 has 0 and dummy_reads of reference and alternative allele reads at all markers. If the parents of your population were outbred lines or you cannot assume one of the parents has completely reference homozygotes and another has laternative homozygotes at all markers, Set dummy_reads = 0 to leave uncertainty to estimate parental genotypes based on the offspring genotypes. Nevertheless, the parentless mode is less accurate and has more chance to get a genotype estimate randomly selected from the equally likely genotype estimates.

Value

A GbsrGenotypeData object in which the "estimated.haplotype", "corrected.genotype" and "parents.genotype" nodes were added.

Examples

# Load data in the GDS file and instantiate a [GbsrGenotypeData] object.
gds_fn <- system.file("extdata", "sample.gds", package = "GBScleanR")
gds <- loadGDS(gds_fn)

# Find the IDs of parental samples.
parents <- grep("Founder", getSamID(gds), value = TRUE)

# Set the parents and flip allele information
# if the reference sample (Founder1 in our case) has homozygous
# alternative genotype at some markers of which alleles will
# be swapped to make the reference sample have homozygous
# reference genotype.
gds <- setParents(gds, parents = parents)

# Initialize a scheme object stored in the slot of the GbsrGenotypeData.
# We chose `crosstype = "pair"` because two inbred founders were mated
# in our breeding scheme.
# We also need to specify the mating matrix which has two rows and
# one column with integers 1 and 2 indicating a sample (founder)
# with the memberID 1 and a sample (founder) with the memberID 2
# were mated.
gds <- initScheme(gds, mating = cbind(c(1:2)))

# Add information of the next cross conducted in our scheme.
# We chose 'crosstype = "selfing"', which do not require a
# mating matrix.
gds <- addScheme(gds, crosstype = "selfing")

# Execute error correction by estimating genotype and haplotype of
# founders and offspring.
gds <- estGeno(gds)

# Close the connection to the GDS file.
closeGDS(gds)


tomoyukif/GBScleanR documentation built on Oct. 31, 2024, 2:43 a.m.