run_kmer_frequency_normalization: Provide normalized correction factors for kmer content
In eilslabs/YAPSA: Yet Another Package for Signature Analysis

This function is analogous to normalizeMotifs. If an analysis of mutational signatures is performed on, e.g., Whole Exome Sequencing (WES) data, the signatures and exposures have to be adapted to the potentially different kmer (trinucleotide) content of the target capture. The present function takes as arguments the paths to the reference genome and the target capture file used. It then extracts the sequence of the target capture by calling bedtools getfasta through a system command. run_kmer_frequency_normalization then calls a custom Perl script, kmer_frequencies.pl, which is also included in this package, to count the occurrences of the triplets in both the whole reference genome and the extracted target capture sequence. These counts are used for normalization as in normalizeMotifs. Note that kmerFrequency provides a way to approximate kmer frequencies by random sampling. In contrast to that approach, the function described here deterministically counts all occurrences of the kmers in the respective sequences.
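The counting and normalization idea described above can be illustrated with a small base R sketch on toy sequences. This is not the package's Perl implementation: the helper names `count_kmers` and `as_count_vector` are hypothetical, and using the ratio of relative kmer frequencies (target capture over whole genome) as the correction factor is an assumption made for illustration, since the exact formula used by normalizeMotifs is not given here.

```r
# Hypothetical helper: deterministically count all overlapping kmers of
# length k in one sequence, skipping kmers with non-ACGT characters.
count_kmers <- function(dna, k) {
  dna <- toupper(dna)
  n_windows <- max(nchar(dna) - k + 1, 0)
  starts <- seq_len(n_windows)
  kmers <- substring(dna, starts, starts + k - 1)
  kmers <- kmers[grepl("^[ACGT]+$", kmers)]
  table(kmers)
}

# Hypothetical helper: turn a count table into a named numeric vector
# over a fixed set of kmers, filling absent kmers with zero.
as_count_vector <- function(counts, kmers) {
  v <- setNames(numeric(length(kmers)), kmers)
  v[names(counts)] <- as.numeric(counts)
  v
}

# Toy data: the "target capture" is a sub-sequence of the "genome".
genome <- "ACGTACGTTTGACCA"
target <- "ACGTACG"
k <- 3

genome_counts <- count_kmers(genome, k)
target_counts <- count_kmers(target, k)

kmers <- names(genome_counts)
genome_vec <- as_count_vector(genome_counts, kmers)
target_vec <- as_count_vector(target_counts, kmers)

# Assumed normalization: ratio of relative kmer frequencies.
genome_rel <- genome_vec / sum(genome_vec)
target_rel <- target_vec / sum(target_vec)
correction <- target_rel / genome_rel
```

Because every window of the sequence is visited exactly once, the counts are exact rather than sampled, which is the difference from the random-sampling approach mentioned above.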

run_kmer_frequency_normalization(in_ref_genome_fasta, in_target_capture_bed, in_word_length, project_folder, in_verbose = 1)

`in_ref_genome_fasta`	Path to the reference genome fasta file used.
`in_target_capture_bed`	Path to a bed file containing the information on the used target capture. May also be a compressed bed.
`in_word_length`	Integer defining the length of the features or motifs, e.g. 3 for triplets or 5 for pentamers.
`project_folder`	Path where the created files, especially the fasta file with the sequence of the target capture and the count matrices, can be stored.
`in_verbose`	Prints verbose output if `in_verbose=1`.
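A call using the arguments above might look like the following sketch. All paths are hypothetical placeholders, and the checks before the call are an illustrative precaution rather than part of the package: they only verify that the input files exist and that bedtools and perl, which the description says are invoked, can be found on the PATH.

```r
# Hypothetical paths; replace with real files on your system.
ref_genome_fasta <- "path/to/reference_genome.fa"
target_capture_bed <- "path/to/target_capture.bed"
project_folder <- "path/to/project"
word_length <- 3L  # triplets

# Preconditions derived from the description above.
inputs_exist <- all(file.exists(ref_genome_fasta, target_capture_bed))
tools_found <- all(nzchar(Sys.which(c("bedtools", "perl"))))
package_found <- requireNamespace("YAPSA", quietly = TRUE)
ready <- inputs_exist && tools_found && package_found

if (ready) {
  result <- YAPSA::run_kmer_frequency_normalization(
    in_ref_genome_fasta = ref_genome_fasta,
    in_target_capture_bed = target_capture_bed,
    in_word_length = word_length,
    project_folder = project_folder,
    in_verbose = 1
  )
} else {
  message("Skipping call: input files, bedtools, perl or YAPSA not found.")
}
```

With the placeholder paths left unchanged, the call is skipped and only the message is printed.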