CNproScan is an R package for CNV detection in bacterial genomes. It applies the Generalized Extreme Studentized Deviate (GESD) test for outliers to read-depth data to detect CNVs, and uses discordant-read detection to annotate the CNV type.
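To illustrate the outlier test named above, here is a minimal, generic sketch of Rosner's generalized ESD procedure applied to a vector of read depths. This is not CNproScan's internal code: the function name `gesd_outliers`, its parameters and the toy `depth` values are hypothetical. The package's own windowing, iteration limit and significance level may differ.

```r
# Generic generalized ESD test (Rosner, 1983); a sketch, not CNproScan's code.
# Returns the sorted positions in x that are flagged as outliers.
gesd_outliers <- function(x, max_outliers, alpha = 0.05) {
  n <- length(x)
  stopifnot(max_outliers >= 1, max_outliers <= n - 2)
  idx <- seq_len(n)                 # positions of values still in the sample
  removed <- integer(0)             # positions removed, in removal order
  R <- numeric(max_outliers)        # test statistics R_i
  lambda <- numeric(max_outliers)   # critical values lambda_i
  for (i in seq_len(max_outliers)) {
    vals <- x[idx]
    dev <- abs(vals - mean(vals))
    j <- which.max(dev)             # most extreme remaining value
    R[i] <- dev[j] / sd(vals)
    p <- 1 - alpha / (2 * (n - i + 1))
    tq <- qt(p, df = n - i - 1)     # Student t quantile
    lambda[i] <- (n - i) * tq / sqrt((n - i - 1 + tq^2) * (n - i + 1))
    removed <- c(removed, idx[j])
    idx <- idx[-j]
  }
  # Number of outliers = largest i for which R_i exceeds lambda_i
  k <- if (any(R > lambda)) max(which(R > lambda)) else 0
  sort(removed[seq_len(k)])
}

# Toy read depths (hypothetical): one high and one low position
depth <- c(98, 101, 99, 100, 102, 97, 100, 103, 99,
           101, 100, 98, 250, 101, 99, 100, 40, 102)
gesd_outliers(depth, max_outliers = 3)   # returns 13 17
```

Unlike a single Grubbs test, GESD only needs an upper bound on the number of outliers. It removes the most extreme value at each step, so one large outlier cannot mask another.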
The Matlab version, which is no longer updated, is available here: https://github.com/robinjugas/CNproScanMatlab
This is the latest version, v1.0. For previous versions, see Tags/Releases.
The package was tested on R 4.x and depends on the following packages: parallel, foreach, doParallel, seqinr, Rsamtools, GenomicRanges, IRanges and data.table (Rsamtools, GenomicRanges and IRanges are distributed through Bioconductor).
# install.packages("devtools")  # only if devtools is not yet installed
devtools::install_github("robinjugas/CNproScan")
Several input files are necessary. They can be prepared from the reference FASTA and the paired-end reads as follows:
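bwa index -a is reference.fasta                       # BWA index of the reference
samtools faidx reference.fasta                        # FASTA index (.fai)
bwa mem reference.fasta read1.fq read2.fq > file.sam  # map paired-end reads
samtools view -b -F 4 file.sam > unsorted.bam         # mapped reads only
samtools sort -o file.bam unsorted.bam                # coordinate-sorted BAM
samtools index file.bam                               # BAM index (.bai)
samtools depth -a file.bam > file.coverage            # depth at every position, including zero-depth ones
genmap index -F reference.fasta -I mapp_index         # GenMap index (needed for mappability normalization)
genmap map -K 30 -E 2 -I mapp_index -O mapp_genmap -t -w -bg  # writes mapp_genmap.bedgraph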
R script:
library("CNproScan")
# Working directory with files
setwd("workdir")
# File paths
fasta_file <- "reference.fasta"
bam_file <- "file.bam"
coverage_file <- "file.coverage"
bedgraph_file <- "mapp_genmap.bedgraph"
# GC normalization only
DF <- CNproScanCNV(coverage_file, bam_file, fasta_file,
GCnorm=TRUE, MAPnorm=FALSE, ORICnorm=FALSE, cores=4)
# Without any normalization
DF <- CNproScanCNV(coverage_file, bam_file, fasta_file,
GCnorm=FALSE, MAPnorm=FALSE, ORICnorm=FALSE, cores=4)
# Both GC normalization and mappability normalization
DF <- CNproScanCNV(coverage_file, bam_file, fasta_file,
GCnorm=TRUE, MAPnorm=TRUE, ORICnorm=FALSE, bedgraph_file, cores=4)
# GC normalization, mappability normalization and oriC normalization
DF <- CNproScanCNV(coverage_file, bam_file, fasta_file,
GCnorm=TRUE, MAPnorm=TRUE, ORICnorm=TRUE, bedgraph_file, oriCposition=1, cores=4)
# or with multiple oriC positions
DF <- CNproScanCNV(coverage_file, bam_file, fasta_file,
GCnorm=TRUE, MAPnorm=TRUE, ORICnorm=TRUE, bedgraph_file, oriCposition=c(10,5000), cores=4)
Caution: oriC normalization works only in single-chromosome mode!
# Write a VCF file (writeVCF is an additional function provided by the package)
writeVCF(DF, "fileName.vcf")
# Write a tab-separated file (optional)
write.table(DF, file = "TSVfile.tsv", row.names=FALSE, col.names = TRUE, sep="\t")
BWA ignores the part of a FASTA header that follows the first whitespace. CNproScan expects the sequence names to be identical across all inputs: the FASTA headers, the BAM RNAME names and the contig names in the samtools coverage file must refer to the same contigs/chromosomes. The package reads the reference with seqinr::read.fasta and whole.header = FALSE, which crops each header at the first whitespace. If this behaviour causes problems, please open an issue on GitHub.
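Before running CNproScan, you may want to check that the names agree. The sketch below assumes only base R. `crop_header` and `names_consistent` are hypothetical helpers, not part of the package. They crop FASTA header lines at the first whitespace, mirroring the BWA/seqinr behaviour described above, and compare the result with the contig names from the coverage file.

```r
# Hypothetical helpers (not part of CNproScan) for checking that sequence
# names agree across inputs after cropping FASTA headers as BWA does.
crop_header <- function(header) {
  header <- sub("^>", "", header)        # drop the leading '>'
  sub("[[:space:]].*$", "", header)      # keep only the text before the first whitespace
}

names_consistent <- function(fasta_headers, coverage_contigs) {
  # TRUE when both inputs contain exactly the same set of sequence names
  setequal(crop_header(fasta_headers), unique(coverage_contigs))
}

# Example with toy names
names_consistent(c(">chr1 main chromosome", ">pl1 plasmid"),
                 c("chr1", "chr1", "pl1"))   # TRUE
```

With real files, the inputs could be obtained as `grep("^>", readLines("reference.fasta"), value = TRUE)` and `read.table("file.coverage", sep = "\t")[[1]]`, since the first column of `samtools depth` output is the contig name. `data.table::fread` is faster for large coverage files. The BAM names are available as `names(Rsamtools::scanBamHeader("file.bam")[[1]]$targets)`.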
Robin Jugas, Karel Sedlar, Martin Vitek, Marketa Nykrynova, Vojtech Barton, Matej Bezdicek, Martina Lengerova, Helena Skutkova, CNproScan: Hybrid CNV detection for bacterial genomes, Genomics, Volume 113, Issue 5, 2021, Pages 3103-3111, ISSN 0888-7543, https://doi.org/10.1016/j.ygeno.2021.06.040. (https://www.sciencedirect.com/science/article/pii/S0888754321002779)