Chi-Yun Wu, Zhang Lab, University of Pennsylvania
For scDNA-seq data, Alleloscope enables allele-specific copy number profiling at the single cell level to 1. detect complex multi-allelic copy number alterations (including copy neutral loss-of-heterozygosity and mirrored subclones that have the same total copy number) and 2. reconstruct tumor lineages based on the multi-allelic copy number profile.
For more information about the method, please check out the github and the paper.
The following are the input files for different steps.
SNPs are recommended to be called from the bam file of the matched normal samples. Without matched normal samples, our results show that calling SNPs from the tumor/cellline sample itself can also work.
A tsv file with all cell barcodes. EXAMPLE
The "barcodes.tsv" files are the standard outputs of the Cell Ranger software.
SNP by cell (sparse) matrices for both reference allele and alternative alleles. EXAMPLE
The information for each SNP should be in the vcf file, the labeling for each cell should be in the barcodes.tsv file (with the same order).
Bin by cell (sparse) matrices for tumor and normal samples. EXAMPLE
library(Alleloscope) # load the library
setwd("~/Alleloscope/") # set path to the github folder
dir_path <- "./samples/P5931/scDNA/output/"; dir.create(dir_path) # set up output directory
data(centromere.GRCh38)
data(telomere.GRCh38)
size=read.table("data-raw/sizes.cellranger-GRCh38-1.0.0.txt", stringsAsFactors = F)
# SNP by cell matrices for ref and alt alleles
barcodes=read.table("data-raw/P5931/scDNA/barcodes_sub.tsv", sep='\t', stringsAsFactors = F, header=F)
alt_all=readMM("data-raw/P5931/scDNA/alt_all_sub.mtx")
ref_all=readMM("data-raw/P5931/scDNA/ref_all_sub.mtx")
var_all=read.table("data-raw/P5931/scDNA/var_all_sub.vcf", header = F, sep='\t', stringsAsFactors = F)
# bin by cell matrices for tumor and normal for segmentation
raw_counts=read.table("data-raw/P5931/scDNA/tumor_sub.txt", sep='\t', header=T, row.names = 1,stringsAsFactors = F)
ref_counts=read.table("data-raw/P5931/scDNA/normal_sub.txt", sep='\t', header=T, row.names = 1,stringsAsFactors = F)
Obj=Createobj(alt_all =alt_all, ref_all = ref_all, var_all = var_all ,samplename='P5931', genome_assembly="GRCh38", dir_path=dir_path, barcodes=barcodes, size=size, assay='scDNAseq')
Obj_filtered=Matrix_filter(Obj=Obj, cell_filter=10, SNP_filter=10, min_vaf = 0, max_vaf = 1, centro=centromere.GRCh38, telo=telomere.GRCh38)
set.seed(2021)
Obj_filtered=Est_regions(Obj_filtered = Obj_filtered, max_nSNP = 30000, plot_stat = T, cont=T)
# Recommend max_nSNP <50000
# Regions without allelic imbalence do not coverge (Reach the max number of iterations.)
Obj_filtered=Select_normal(Obj_filtered = Obj_filtered, raw_counts=raw_counts, plot_theta = TRUE)
# add "select_normal" list to the Obj_filtered object.
# The list includes barcodes for the normal cells and some candidate "normal regions"
print(Obj_filtered$select_normal$region_normal)
Obj_filtered$ref=Obj_filtered$select_normal$region_normal[1] # choose one normal region
Obj_filtered=Genotype_value(Obj_filtered = Obj_filtered, type='tumor', raw_counts=raw_counts) # for tumor
Obj_filtered=Genotype(Obj_filtered = Obj_filtered )
The output genotying results for the five regions are shown below.
linplot=Lineage_plot(Obj_filtered = Obj_filtered, nSNP = 2000, nclust = 3)
In the scatter plot of chr3 (Step5) above, there are few cells located near but not at the canonical point (1.5, 0.66). To improve the estimation, Alleloscope is able to perform second-round estimation only on these few cells with the example codes shown below.
Select target cells
sub_cells=rownames(Obj_filtered$genotypes[which(Obj_filtered$genotypes[,2]!=4),]) #select abnormal cells (4 represents diploid)
Obj_filtered=Est_regions(Obj_filtered = Obj_filtered, max_nSNP = 50000, plot_stat = T, sub_cells =sub_cells, sub_region = '3', max_iter = 100 )
# theta_hat values are updated in the rds_list for the selected region.
# Estimate genotype values
Obj_filtered=Genotype_value(Obj_filtered = Obj_filtered, type='tumor', raw_counts=raw_counts)
# Genotype each cell
Obj_filtered=Genotype(Obj_filtered = Obj_filtered, plot_path="./samples/P5931/scDNA/output/plots/gtype_scatter_updated.pdf")
The updated genotying results for chr3 are shown below.
Wu, C.-Y. et al. Integrative single-cell analysis of allele-specific copy number alterations and chromatin accessibility in cancer. Nature Biotechnology (2021): https://doi.org/10.1038/s41587-021-00911-w
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.