overrep_kmer: Generate overrepresented kmers of length k based on their...


View source: R/overrep_kmer.R

Description

Generate overrepresented kmers of length k based on their observed-to-expected ratio at each position across all sequences in the dataset. The expected proportion of a length-k kmer assumes site independence and is computed as the sum, over the bases in the kmer, of each base's count in the kmer times the probability of observing that base in the dataset, i.e. P(A) * count_in_kmer(A) + P(C) * count_in_kmer(C) + ... The observed-to-expected ratio is computed as log2(obs/exp). Kmers with Obsexp_ratio > 2 (observed more than four times as often as expected) are considered overrepresented and are returned in a data frame along with their position in the sequence.
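
The computation described above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not qckitfastq's implementation: the helpers base_probs(), expected_kmer(), and flag_overrep() are hypothetical, P(b) is assumed to be the frequency of base b among all A/C/G/T calls, and because this page does not say how the observed value is normalized, flag_overrep() takes precomputed obs and exp columns as given. The expected value follows the sum formula stated in the description.

```r
# Hypothetical sketch (not qckitfastq's code) of the steps described above.

# Assumption: P(b) is estimated as the frequency of base b among all
# A/C/G/T calls in the reads; other characters (e.g. N) are ignored.
base_probs <- function(reads) {
  bases <- unlist(strsplit(toupper(reads), ""))
  counts <- table(factor(bases, levels = c("A", "C", "G", "T")))
  setNames(as.numeric(counts) / sum(counts), names(counts))
}

# Expected value as stated in the description:
# P(A) * count_in_kmer(A) + P(C) * count_in_kmer(C) + ...
expected_kmer <- function(kmer, base_prob) {
  bases <- strsplit(toupper(kmer), "")[[1]]
  counts <- table(factor(bases, levels = names(base_prob)))
  sum(base_prob * as.numeric(counts))
}

# Assumption: df has one row per (Position, Kmer) with precomputed obs and
# exp columns. Keeps rows whose log2(obs / exp) exceeds the threshold.
flag_overrep <- function(df, threshold = 2) {
  df$Obsexp_ratio <- log2(df$obs / df$exp)
  out <- df[df$Obsexp_ratio > threshold, c("Position", "Obsexp_ratio", "Kmer")]
  rownames(out) <- NULL
  out
}
```

For example, with all four base probabilities equal to 0.25, expected_kmer("AACG", p) gives 0.25 * 2 + 0.25 * 1 + 0.25 * 1 = 1.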

Usage

overrep_kmer(infile, k, output_file = NA)

Arguments

infile

path to gzipped FASTQ file

k

the kmer length

output_file

File to save plot to. Default NA.

Value

Data frame with columns Position (position in the read), Obsexp_ratio, and Kmer.
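
The returned data frame can be explored with base R. The helper below, top_overrep(), is hypothetical and assumes only the Position, Obsexp_ratio, and Kmer columns listed above; it ranks rows by Obsexp_ratio so the most strongly overrepresented kmers come first.

```r
# Hypothetical helper: rank a data frame shaped like the overrep_kmer()
# result (columns Position, Obsexp_ratio, Kmer) by Obsexp_ratio and keep
# the top n rows.
top_overrep <- function(res, n = 10) {
  ranked <- res[order(res$Obsexp_ratio, decreasing = TRUE), ]
  rownames(ranked) <- NULL
  head(ranked, n)
}
```

For instance, top_overrep(overrep_kmer(infile, k = 4), n = 5) would list the five position/kmer rows with the highest ratios.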

Examples

# Path to the example gzipped FASTQ file shipped with qckitfastq
infile <- system.file("extdata", "test.fq.gz",
    package = "qckitfastq")
overrep_kmer(infile, k = 4)

compbiocore/qckitfastq documentation built on Sept. 20, 2019, 9:30 a.m.