This function is analogous to normalizeMotifs. If an analysis of mutational signatures is performed on e.g. Whole Exome Sequencing (WES) data, the signatures and exposures have to be adapted to the potentially different kmer (trinucleotide) content of the target capture. The present function takes as arguments the paths to the reference genome used and to the target capture file. It then extracts the sequence of the target capture by calling bedtools getfasta on the system command prompt. run_kmer_frequency_normalization then calls a custom Perl script, kmer_frequencies.pl, which is also included in this package, to count the occurrences of the kmers (e.g. triplets) in both the whole reference genome and the created target capture sequence. These counts are used for normalization as in normalizeMotifs.
Note that kmerFrequency provides a way to approximate kmer frequencies by random sampling. In contrast to that approach, the function described here deterministically counts all occurrences of the kmers in the respective genome.
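To illustrate the counting and normalization idea, here is a minimal base R sketch. It is not the package's implementation, which relies on bedtools and the Perl script. The helper count_kmers, the toy sequences, the choice to count overlapping windows on one strand only, and in particular the definition of a correction factor as the ratio of relative kmer frequencies (genome over target) are all assumptions made for illustration.

```r
# Hypothetical helper: deterministically count every overlapping kmer
# in a DNA string using a sliding window (single strand, base R only).
count_kmers <- function(sequence, word_length) {
  sequence <- toupper(sequence)
  n_windows <- nchar(sequence) - word_length + 1
  if (n_windows < 1) {
    return(setNames(integer(0), character(0)))
  }
  starts <- seq_len(n_windows)
  kmers <- substring(sequence, starts, starts + word_length - 1)
  # Drop windows containing ambiguous bases such as N.
  kmers <- kmers[!grepl("[^ACGT]", kmers)]
  counts <- table(kmers)
  setNames(as.vector(counts), names(counts))
}

# Toy stand-ins for the reference genome and the target capture sequence.
genome_seq <- "ACGTACGTAC"
target_seq <- "ACGTACG"

genome_counts <- count_kmers(genome_seq, 3)
target_counts <- count_kmers(target_seq, 3)

# Assumed definition: relative frequency in the genome divided by
# relative frequency in the target capture, for kmers seen in both.
shared <- intersect(names(genome_counts), names(target_counts))
genome_freq <- genome_counts[shared] / sum(genome_counts)
target_freq <- target_counts[shared] / sum(target_counts)
correction_factors <- genome_freq / target_freq

print(correction_factors)
```

In this toy case ACG is over-represented in the target relative to the genome, so its factor is below 1, while the remaining triplets receive factors above 1.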
run_kmer_frequency_normalization(in_ref_genome_fasta, in_target_capture_bed,
  in_word_length, project_folder, in_verbose = 1)
in_ref_genome_fasta | Path to the reference genome fasta file used.
in_target_capture_bed | Path to a bed file containing the information on the target capture used. May also be a compressed bed file.
in_word_length | Integer defining the length of the features or motifs, e.g. 3 for triplets or 5 for pentamers.
project_folder | Path where the created files, especially the fasta file with the sequence of the target capture and the count matrices, can be stored.
in_verbose | Verbose if
A numeric vector with correction factors
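A call might look like the following sketch. All file paths are hypothetical placeholders, and the guard conditions are an addition for illustration: the call is only attempted if the function is available in the current session, both input files exist, and bedtools can be found on the system path (the function invokes bedtools getfasta).

```r
# Hypothetical input locations; replace with real paths.
ref_genome_fasta <- "path/to/reference_genome.fa"
target_capture_bed <- "path/to/target_capture.bed"
project_folder <- "path/to/project_folder"

# Only attempt the call when all prerequisites appear to be met.
inputs_ready <-
  exists("run_kmer_frequency_normalization", mode = "function") &&
  file.exists(ref_genome_fasta) &&
  file.exists(target_capture_bed) &&
  nzchar(Sys.which("bedtools"))

if (inputs_ready) {
  correction_factors <- run_kmer_frequency_normalization(
    in_ref_genome_fasta = ref_genome_fasta,
    in_target_capture_bed = target_capture_bed,
    in_word_length = 3,          # triplets
    project_folder = project_folder,
    in_verbose = 1
  )
  print(head(correction_factors))
} else {
  message("Prerequisites missing; skipping the normalization call.")
}
```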