knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
suppressPackageStartupMessages({
  library(comapr)
  library(SummarizedExperiment)
  })

Introduction

In the previous vignette, we demonstrated how to calculate genetic distances using genotyped markers from a group of samples.

Genetic distance calculate from genotype shifts of markers

In this document, we will focus on building individualized genetic maps from output files of sscocaller which is available at here.

Locate file path

The comapr package includes a list of toy example output files from the sscocaller command line tool. The follow code will get the file path that we will use later.

demo_path <-paste0(system.file("extdata",package = "comapr"),"/")

We can see that we have two samples (individual donors) and each of them has haplotype states inferred for chr1 to chr5.

list.files(demo_path)

File information

Diagnostic functions

comapr provides quality-check functions for examining SNP coverage per chr and per cell, chromosome segregation pattern checks, and summary statics for filtering low confidence crossovers.

perCellChrQC

pcQC <- perCellChrQC(sampleName="s1",chroms=c("chr1"),
                     path=paste0(demo_path,"/"),
                     barcodeFile=NULL)

Input parsing

Input-parsing functions are implemented in comapr to construct RangedSummarizedExpriment object that parses files generated from sscocaller and filter out low-confidence COs at the same time. For the demo dataset, these filters do not have any effects:

minSNP = 0, 
minlogllRatio = 50,
bpDist = 100,
maxRawCO=10,
minCellSNP = 1

Construct RangedSummarizedExpriment object

We first construct RangedSummarizedExpriment object from parsing output files from sscocaller and filter out low-confidence crossovers.

s1_rse_state <- readHapState("s1",chroms=c("chr1","chr2"),
                             path=demo_path,barcodeFile=NULL,
                             minSNP = 0,
                             minlogllRatio = 50,
                             bpDist = 100,
                             maxRawCO=10,
                             minCellSNP = 1)

s2_rse_state <- readHapState("s2",chroms=c("chr1","chr2"),
                             path=demo_path,
                             barcodeFile=NULL,
                             minSNP = 0,
                             minlogllRatio = 50,
                             bpDist = 100,
                             maxRawCO=10,
                             minCellSNP = 1)
s1_rse_state

The Viterbi states for SNP markers are stored in the "assay" slot:

assay(s1_rse_state)

The rowRanges contains the SNP's positions:

rowRanges(s1_rse_state)

Formate sample group factor

We have read in the Viterbi states for cells from two samples: s1 and s2. We now combine them into one data object for downstream analysis.

Add sample group factor

colData(s1_rse_state)

colData(s1_rse_state)$sampleGroup <- "s1"

colData(s2_rse_state)$sampleGroup <- "s2"

Combine two groups

We now call combineHapState function to combine sample s1 and sample s2 into one RangedSummarizedExperiment object.

twoSample <- combineHapState(s1_rse_state,
                             s2_rse_state)

Now the assay data slot contains the Viterbi states across SNP positions for the combined samples.

twoSample <- combineHapState(s1_rse_state,s2_rse_state)

Now the twoSample object contains the cells from both samples with assay slot contains the Viterbi states and rowRanges contains position annotaitons for the list SNP markers.

assay(twoSample)

Count crossovers

The countCOs function can then find out the crossover intervals according to the Viterbi states for each cell and the number of crossovers per cell is then calculated.

Count crossovers for SNP intervals

countCOs function will find the crossover intervals for each samples and attribute the observed crossovers from each sample to the corresponding intervals.

cocounts <- countCOs(twoSample)

The rowRanges from the resulting data object now denotes the crossover interval and the assay slot now contains the number of crossovers in each cell.

Now rowRanges contains the intervals

rowRanges(cocounts)

The colData slot still contains the annotations of each cell.

colData(cocounts)

Calculate genetic distances

The genetic distances can be calculated by using mapping functions such as the Kosambi or Haldane \cite{} and assay slot contains the number of crossovers found in each sample across these intervals.

assay(cocounts)

Calculate genetic distances

calGeneticDist function will convert the raw crossover frequencies to genetic distances via selected mapping function (ie. kosambi or haldane).

dist_gr <- calGeneticDist(co_count = cocounts, mapping_fun = "k")
dist_gr

The genetic distances for each interval are stored in rowData.

rowData(dist_gr)

The above genetic distances have been calculated using all samples. We can also specify the group factor so that we can calculate genetic distances for different sample groups:

## sampleGroup is a column in the colData slot
dist_gr <- calGeneticDist(co_count = cocounts,
                          group_by  = "sampleGroup",
                          mapping_fun = "k")

Now the group/Sample specific distances are stored in rowData

rowData(dist_gr)$kosambi

Plot whole genome genetic distances

We construct a GRange object from the dist_gr first.

p_gr <- granges(dist_gr)
mcols(p_gr) <- rowData(dist_gr)$kosambi

We can plot whole-genome genetic distances

plotWholeGenome(p_gr)

We can also do per chromosome plot

plotGeneticDist(p_gr,chr = "chr1",cumulative = TRUE)

Group differences

bootstrapDist function generates the sampling distribution of the difference in total genetic distances for the two groups under comparisons.

set.seed(100)
bootsDiff <- bootstrapDist(co_gr = cocounts,B=10,
                           group_by = "sampleGroup")
hist(bootsDiff)

From the bootsDiff data object, we can find a 95% confidence interval to test whether the difference is statistically significant.

quantile(bootsDiff,c(0.025,0.975),na.rm =TRUE)

An alternative re-sampling method, permuteDist, can be applied to generate a null distribution for the group difference by reshuffling the group labels across the samples.

set.seed(100)
perms <- permuteDist(cocounts,B=1000,group_by = "sampleGroup")

A p-value can be calculated using the statmod::permp function [@Phipson2010-xi].

statmod::permp(x = sum(perms$permutes>=perms$observed_diff,
                       na.rm = TRUE),
               nperm = sum(!is.na(perms$permutes)),
               n1 = perms$nSample[1],
               n2 = perms$nSample[2])

Session info

sessionInfo()


ruqianl/comapr documentation built on Dec. 24, 2024, 9:35 p.m.