normalize: Normalize calls




appreci8R combines and filters the output of different variant calling tools according to the 'appreci8' algorithm. In the second analysis step, all calls are normalized with respect to the reporting of indels, MNVs, multiple alternate alleles and complex indels. A GRanges object with all normalized calls is returned.


normalize(output_folder, caller_name, target_calls, caller_indels_pm,



output_folder: The folder to write the output files into. If an empty string is provided, no files are written out.


caller_name: Name of the variant calling tool (only necessary if an output folder is provided).


target_calls: List of data.frames, with one list element per sample. The output of filterTarget() can be used directly as input.


caller_indels_pm: Deletions are reported with a “minus” (e.g. C > -A), insertions are reported with a “plus” (e.g. C > +A) (TRUE or FALSE).


caller_mnvs: MNVs are reported (e.g. CA > GT instead of C > G and A > T; or CGAG > TGAT instead of C > T and G > T) (TRUE or FALSE).


The function normalize covers two to four normalization steps:

1) Check alternative bases: Calls containing a “comma” are split up. A call like C > A,G is converted to C > A and an additional C > G call. This allows the output of different callers to be compared when, for example, caller1 reports C > A,G, caller2 reports only C > A and caller3 reports only C > G. This normalization step is always performed.

2) Find string differences: Calls are checked for un-mutated bases. The shortest representation of the variant at its left-most position is chosen. For example, CAAAC > CAAC is converted to CA > C. This normalization step is always performed.

3) Convert indels: If deletions are reported with a “minus” and insertions are reported with a “plus”, these are converted. A deletion like C > -G is converted to CG > C, while an insertion like C > +G is converted to C > CG. This normalization step is only performed if caller_indels_pm is TRUE.

4) Convert MNVs: If MNVs are reported, these are split into single-base calls. This allows the output of different callers to be compared when not all callers report every mutation that is part of an MNV. A call like CA > GT is split into a C > G and an A > T variant. Likewise, a call like CGAG > TGAT is split into C > T and G > T (G > G and A > A are not reported, as they do not pass the normalization step “Find string differences”). This normalization step is only performed if caller_mnvs is TRUE.
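
The four steps above can be illustrated with a minimal base R sketch. This is not the code used inside normalize(); the helper names (split_alt_alleles, trim_shared_bases, convert_pm_indels, split_mnv) are hypothetical and only reproduce the conversions described in the text, assuming character columns Pos, Ref and Alt as in the input data.frames.

```r
# Hypothetical helpers illustrating the described steps; not appreci8R internals.

# Step 1: one row per alternate allele (C > A,G becomes C > A and C > G).
split_alt_alleles <- function(calls) {
  alts <- strsplit(as.character(calls$Alt), ",", fixed = TRUE)
  out <- calls[rep(seq_len(nrow(calls)), lengths(alts)), , drop = FALSE]
  out$Alt <- as.character(unlist(alts))
  rownames(out) <- NULL
  out
}

# Step 2: remove un-mutated bases, keeping the variant at its left-most position.
trim_shared_bases <- function(pos, ref, alt) {
  # Shared trailing bases are dropped first, so the position does not move.
  while (nchar(ref) > 1 && nchar(alt) > 1 &&
         substring(ref, nchar(ref)) == substring(alt, nchar(alt))) {
    ref <- substr(ref, 1, nchar(ref) - 1)
    alt <- substr(alt, 1, nchar(alt) - 1)
  }
  # Remaining shared leading bases are dropped and the position is shifted.
  while (nchar(ref) > 1 && nchar(alt) > 1 &&
         substr(ref, 1, 1) == substr(alt, 1, 1)) {
    ref <- substring(ref, 2)
    alt <- substring(alt, 2)
    pos <- pos + 1
  }
  list(Pos = pos, Ref = ref, Alt = alt)
}

# Step 3: convert "minus"/"plus" notation (C > -G to CG > C, C > +G to C > CG).
convert_pm_indels <- function(calls) {
  ref <- as.character(calls$Ref)
  alt <- as.character(calls$Alt)
  is_del <- startsWith(alt, "-")
  is_ins <- startsWith(alt, "+")
  bases <- substring(alt, 2)  # the deleted or inserted bases without the sign
  calls$Ref <- ifelse(is_del, paste0(ref, bases), ref)
  calls$Alt <- ifelse(is_del, ref, ifelse(is_ins, paste0(ref, bases), alt))
  calls
}

# Step 4: split an MNV into single-base calls, skipping unchanged bases.
split_mnv <- function(pos, ref, alt) {
  r <- strsplit(ref, "")[[1]]
  a <- strsplit(alt, "")[[1]]
  keep <- r != a
  data.frame(Pos = pos + which(keep) - 1, Ref = r[keep], Alt = a[keep],
             stringsAsFactors = FALSE)
}

trim_shared_bases(100, "CAAAC", "CAAC")  # Pos 100, Ref "CA", Alt "C"
split_mnv(100, "CGAG", "TGAT")           # C > T at 100 and G > T at 103
```

Trimming trailing bases before leading bases is what keeps CAAAC > CAAC anchored at its original position; trimming from the left first would shift the call to the right.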


A GRanges object is returned (metadata columns: SampleID, Ref, Alt).

If an output folder is provided, the output is saved as <caller_name>.normalized.txt.
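
As a rough illustration of the return type, the following sketch builds a small GRanges object by hand with the three metadata columns named above and shows how they can be accessed. The coordinates and the width of 1 are assumptions made for illustration; the exact ranges that normalize() assigns are not stated here. It assumes the Bioconductor package GenomicRanges is installed.

```r
library(GenomicRanges)

# Hand-built example object; values are illustrative, not normalize() output.
calls <- GRanges(
  seqnames = c("2", "2"),
  ranges   = IRanges(start = c(25469502, 25469504), width = 1),
  SampleID = c("Sample1", "Sample1"),
  Ref      = c("C", "G"),
  Alt      = c("T", "T")
)

# Metadata columns are accessed with mcols().
colnames(mcols(calls))
mcols(calls)$Alt

# A GRanges object can be flattened to a data.frame for further processing.
calls_df <- as.data.frame(calls)
```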


Sarah Sandmann

See Also

appreci8R, appreci8Rshiny, filterTarget, annotate, combineOutput, evaluateCovAndBQ, determineCharacteristics, finalFiltration


library("appreci8R")

# One data.frame of raw calls per sample
sample1 <- data.frame(SampleID = c("Sample1", "Sample1", "Sample1"),
                      Chr = c("2", "17", "X"),
                      Pos = c(25469502, 7579472, 15838366),
                      Ref = c("CAG", "G", "C"),
                      Alt = c("TAT", "C", "T,A"))
sample2 <- data.frame(SampleID = c("Sample2", "Sample2", "Sample2", "Sample2"),
                      Chr = c("4", "12", "12", "21"),
                      Pos = c(106196951, 12046289, 12046341, 36164405),
                      Ref = c("A", "C", "A", "GGG"),
                      Alt = c("G", "+AAAG", "G", "TGG"))
input <- list(sample1, sample2)

# Empty output folder: no files are written; both optional steps are enabled
normalized <- normalize("", "", input, TRUE, TRUE)

sandmanns/appreci8R documentation built on Dec. 23, 2024, 11:32 a.m.