Chi-Yun Wu, Zhang Lab, University of Pennsylvania
With matched scDNA-seq and scATAC-seq data, Alleloscope is able to integrate allele-specific copy number alterations (CNAs) (genomics) and chromatin accessibility (epigenetics). Based on allele-specific CNAs profiled in scDNA-seq data, a tumor lineage structure can be reconstructed with several subclones identified. Then each cells in scATAC-seq data can be confidently assigned to the detected subclones by matching multiple allele-specific CNAs detected independently from either scDNA-seq data or scATAC-seq data. This will facilitate dissection of the contributions of chromosomal instability and chromatin remodeling in tumor evolution.
For more information about the method, please check out the github and the paper.
The following are the input files for different steps.
SNPs are recommended to be called from the bam file of the matched normal samples. Without matched normal samples, our results show that calling SNPs from the tumor/cellline sample itself can also work.
A tsv file with all cell barcodes. EXAMPLE
The "barcodes.tsv" files are the standard outputs of the Cell Ranger software.
SNP by cell (sparse) matrices for both reference allele and alternative alleles. EXAMPLE
The information for each SNP should be in the vcf file, the labeling for each cell should be in the barcodes.tsv file (with the same order).
Bin by cell (sparse) matrices for tumor samples. EXAMPLE
For scATAC-seq data, peak by cell matrix can be converted to bin by cell matrix by summing up the signals or using standard fragments.tsv files (Cell Ranger output) (Example script).
An alleloscope object generated from matched scDNA-seq data. Tutorial for generating scDNA-seq object can be found here.
library(Alleloscope) # load the library
setwd("~/Alleloscope/") # set path to the github folder
dir_path <- "./samples/SNU601/scATAC/output/"; dir.create(dir_path) # set up output directory
data(centromere.GRCh38)
data(telomere.GRCh38)
size=read.table("data-raw/sizes.cellranger-GRCh38-1.0.0.txt", stringsAsFactors = F)
# SNP by cell matrices for ref and alt alleles
barcodes=read.table("data-raw/SNU601/scATAC/barcodes.tsv", sep='\t', stringsAsFactors = F, header=F)
alt_all=readMM("data-raw/SNU601/scATAC/alt_all.mtx")
ref_all=readMM("data-raw/SNU601/scATAC/ref_all.mtx")
var_all=read.table("data-raw/SNU601/scATAC/var_all.vcf", header = F, sep='\t', stringsAsFactors = F)
# bin by cell matrices for tumor and normal for segmentation
raw_counts=read.table('data-raw/SNU601/scATAC/chr200k_fragments_sub.txt', sep='\t', header=T, row.names = 1,stringsAsFactors = F)
# Without paired normal sample, use matched scDNA-seq result to help normalize coverge for scATAC-seq data.
Obj_scDNA=readRDS("data-raw/SNU601/scATAC/SNU601_dna.rds")
Obj=Createobj(alt_all =alt_all, ref_all = ref_all, var_all = var_all ,samplename='Sample', genome_assembly="GRCh38", dir_path=dir_path, barcodes=barcodes, size=size, assay='scATACseq')
Obj_filtered=Matrix_filter(Obj=Obj, cell_filter=5, SNP_filter=5, centro=centromere.GRCh38, telo=telomere.GRCh38)
# Since phasing information is estimated in the matched scDNA-seq dataset,
# loose filter: cell_filter=5 and SNP_filter=5 can be used.
# No further filter for extreme VAF values is needed.
Obj_filtered$seg_table_filtered=Obj_scDNA$seg_table_filtered
Obj_filtered = Est_regions(Obj_filtered = Obj_filtered, max_nSNP = 30000, min_cell = 20, phases = Obj_scDNA$rds_list, plot_stat = T, cont = TRUE)
# The phases for each SNP estimated from DNA sequencing data can help estimate the major haplotype proportion for each cell in scATAC-seq data.
# Recommend max_nSNP <50000
# Regions without allelic imbalence do not coverge (Reach the max number of iterations.)
Obj_filtered$ref=Obj_scDNA$ref # choose one normal region
Estimate cell-specific (rho_hat, theta_hat) values for each region.
Set ref_gv = "genotype_values" (from scDNA-seq) to help with rho_hat estimation for scATAC-seq data.
Obj_filtered=Genotype_value(Obj_filtered = Obj_filtered, type='cellline', raw_counts=raw_counts, cov_adj =1 ,ref_gtv = Obj_scDNA$genotype_values)
Genotype all cells for each region and generate a genotype plot
Set ref_gt = "genotypes" (from scDNA-seq) to help estimate haplotype profiles for each cell in scATAC-seq data.
Obj_filtered=Genotype(Obj_filtered = Obj_filtered, ref_gt = Obj_scDNA$genotypes,xmax=4)
The genotying results for the 10 marker regions are shown below.
More explanation about the colors can be found here.
clone.genotypes=readRDS("./data-raw/SNU601/scATAC/clone.genotypes.rds")
Obj_filtered=AssignClones_ref(Obj_filtered=Obj_filtered, clone.genotypes=clone.genotypes)
umap_peak=readRDS("./data-raw/SNU601/scATAC/peak_umap.rds")
Clone=Obj_filtered$cloneAssign$cloneAssign[match(rownames(umap_peak), names(Obj_filtered$cloneAssign$cloneAssign))]
umap_peak=cbind(umap_peak, Clone)
The two signals can be visualized simultaneously for each cell in the scATAC-seq data.
Wu, C.-Y. et al. Integrative single-cell analysis of allele-specific copy number alterations and chromatin accessibility in cancer. Nature Biotechnology (2021): https://doi.org/10.1038/s41587-021-00911-w
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.