CNproScan is an R package developed for CNV detection in bacterial genomes. It employs the Generalized Extreme Studentized Deviate (GESD) test for outliers to detect CNVs in read-depth data, and uses discordant read detection to annotate the CNVs. It was tested and proven to be able to detect short CNVs. The following text is a workflow showcase. CNproScan consists of a single function, CNproScanCNV, which carries out the whole procedure. The steps necessary to get the input files are also explained at the GitHub repository.
CNproScan GitHub repository and issue reporting: https://github.com/robinjugas/CNproScan
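To give a feel for the outlier test named in the introduction, here is a minimal sketch of the textbook Generalized ESD test (Rosner, 1983) applied to a toy read-depth vector. It is not CNproScan's internal code: the function name `gesd_outliers`, its arguments, and the toy depth values are assumptions made for illustration only.

```r
# Textbook Generalized ESD test on a numeric vector (illustrative sketch,
# not the CNproScan implementation).
# Returns the positions of the values flagged as outliers.
gesd_outliers <- function(x, max_outliers, alpha = 0.05) {
  n <- length(x)
  idx <- seq_len(n)       # original positions of the values still in play
  removed <- integer(0)   # positions removed so far, in removal order
  n_out <- 0L             # largest i for which R_i exceeds lambda_i
  for (i in seq_len(max_outliers)) {
    dev <- abs(x[idx] - mean(x[idx]))
    k <- which.max(dev)                     # most extreme remaining value
    R_i <- dev[k] / sd(x[idx])              # test statistic
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)
    lambda_i <- (n - i) * t_crit /
      sqrt((n - i - 1 + t_crit^2) * (n - i + 1))  # critical value
    removed <- c(removed, idx[k])
    idx <- idx[-k]
    if (R_i > lambda_i) n_out <- i
  }
  removed[seq_len(n_out)]
}

# Toy depth profile (hypothetical): flat coverage near 50 with a short
# three-position spike standing in for a gain.
depth <- rep(c(48, 50, 52, 49, 51), 40)
depth[c(60, 61, 62)] <- 150
flagged <- gesd_outliers(depth, max_outliers = 10)
print(sort(flagged))
```

The test removes the most extreme value at each step and compares its studentized deviation with a critical value, so the number of outliers does not have to be known in advance; only an upper bound (`max_outliers`) is needed.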
Install CNproScan
from Bioconductor:
## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("CNproScan")
or install the most current release from GitHub:
install.packages("devtools")
library(devtools)
install_github("robinjugas/CNproScan")
Apply the following steps to get the files needed for CNV detection.
# Prerequisites: reference FASTA file, samtools, bwa aligner
# Alignment
bwa index -a is reference.fasta
samtools faidx reference.fasta
bwa mem reference.fasta read1.fq read2.fq > file.sam
# BAM processing
samtools view -b -F 4 file.sam > file1.bam # mapped reads only
samtools sort -o file.bam file1.bam
samtools index file.bam
# Calculate coverage with zero coverage reported
samtools depth -a file.bam > file.coverage
# Genome mappability file by GENMAP (https://github.com/cpockrandt/genmap)
# only for mappability normalization
genmap index -F reference.fasta -I mapp_index
genmap map -K 30 -E 2 -I mapp_index -O mapp_genmap -t -w -bg
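Before running the detection, it can help to confirm that the coverage file looks as expected. The sketch below assumes the usual `samtools depth` output for a single BAM file: tab-separated, no header, with reference name, 1-based position, and depth. The column names and the toy file contents are chosen here for illustration; in practice the path would point to `file.coverage`.

```r
# Toy stand-in for file.coverage (hypothetical values) so the example
# runs on its own; replace toy_file with "file.coverage" in a real run.
toy_file <- tempfile(fileext = ".coverage")
writeLines(c("chr1\t1\t12", "chr1\t2\t15", "chr1\t3\t0", "chr1\t4\t14"),
           toy_file)

# The file has no header, so column names are assigned here.
coverage <- read.table(toy_file, sep = "\t", header = FALSE,
                       col.names = c("chrom", "pos", "depth"),
                       stringsAsFactors = FALSE)

# Positions with zero coverage are present because of the -a flag.
zero_positions <- coverage$pos[coverage$depth == 0]
print(summary(coverage$depth))
print(zero_positions)
```

Every reference position should appear exactly once per sequence; gaps in `pos` would suggest the file was produced without `-a`.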
library("CNproScan")
# Working directory with files
setwd("workdir")
# File paths
fasta_file <- "reference.fasta"
bam_file <- "file.bam"
coverage_file <- "file.coverage"
bedgraph_file <- "mapp_genmap.bedgraph"
# For only GC normalization
DF <- CNproScanCNV(coverage_file, bam_file, fasta_file, GCnorm=TRUE, MAPnorm=FALSE, cores=4)
# Without any normalization
DF <- CNproScanCNV(coverage_file, bam_file, fasta_file, GCnorm=FALSE, MAPnorm=FALSE, cores=4)
# Both GC normalization and mappability normalization
DF <- CNproScanCNV(coverage_file, bam_file, fasta_file, GCnorm=TRUE, MAPnorm=TRUE, bedgraph_file, cores=4)
The output is a data.frame with several descriptive columns. In addition, a VCF file is written to the working directory.
sessionInfo()