Input: Input Functionalities.
In matdoering/openPrimeR: Multiplex PCR Primer Design and Analysis

Input

R Documentation

Description

read_primers: Reads one or multiple input files with primer sequences. The input can either be in FASTA or in CSV format.
read_templates: Read one or multiple files with template sequences in FASTA or CSV format.
read_settings: Loads primer analysis settings from an XML file.
Templates: The Templates class encapsulates a data frame containing the sequences of the templates, their binding regions, as well as additional information (e.g. template coverage).
Primers: The Primers class encapsulates a data frame representing a set of primers. Objects of this class store all properties associated with a set of primers, for example the results from evaluating the properties of a primer set or from determining its coverage.

Usage

Templates(...)

read_templates(
  fname,
  hdr.structure = NULL,
  delim = NULL,
  id.column = NULL,
  rm.keywords = NULL,
  remove.duplicates = FALSE,
  fw.region = c(1, 30),
  rev.region = c(1, 30),
  gap.char = "-",
  run = NULL
)

Primers(...)

read_primers(
  fname,
  fw.id = "_fw",
  rev.id = "_rev",
  merge.ambig = c("none", "merge", "unmerge"),
  max.degen = 16,
  template.df = NULL,
  adapter.action = c("warn", "rm"),
  sample.name = NULL,
  updateProgress = NULL
)

read_settings(
  filename = list.files(system.file("extdata", "settings", package = "openPrimeR"),
    pattern = "*.xml", full.names = TRUE),
  frontend = FALSE
)

Arguments

`...`	A data frame fulfilling the structural requirements for initializing a `Templates` or `Primers` object.
`fname`	Character vector providing either a single or multiple paths to FASTA or CSV files.
`hdr.structure`	A character vector describing the information contained in the FASTA headers. If the headers of the FASTA files given via `fname` contain template group information, please include the keyword "GROUP" in `hdr.structure`. If `hdr.structure` has fewer elements than the actual header structure, the missing fields are ignored.
`delim`	Delimiter for the information in the FASTA headers.
`id.column`	Field in the header to be used as the identifier of individual template sequences.
`rm.keywords`	A vector of keywords that are used to remove templates whose headers contain any of the keywords.
`remove.duplicates`	Whether duplicate sequences should be removed.
`fw.region`	The positional interval from the template 5' end specifying the binding sites for forward primers. The default `fw.region` is set to the first 30 bases of the templates.
`rev.region`	The positional interval from the template 3' end specifying the binding sites for reverse primers. The default `rev.region` is set to the last 30 bases of the templates.
`gap.char`	The character in the input file representing gaps. Gaps are automatically removed upon input and the default character is "-".
`run`	An identifier for the set of template sequences. By default, `run` is `NULL` and its value is derived from the input file given via `fname`.
`fw.id`	For FASTA input, the identifier for forward primers in the FASTA headers.
`rev.id`	For FASTA input, the identifier for reverse primers in the FASTA headers.
`merge.ambig`	Indicates whether similar primers should be merged ("merge") using IUPAC ambiguity codes or whether primers should be disambiguated ("unmerge"). By default `merge.ambig` is set to "none", leaving primers as they are.
`max.degen`	A scalar numeric providing the maximum allowed degeneracy for merging primers if `merge.ambig` is set to "merge". Degeneracy is defined by the number of disambiguated sequences that are represented by a degenerate primer.
`template.df`	An object of class `Templates`. If `template.df` is provided for `read_primers` then the primers are checked for restriction sites upon input; otherwise they are not checked.
`adapter.action`	The action to be performed when `template.df` is provided for identifying adapter sequences. Either "warn" to issue a warning about adapter sequences or "rm" to remove identified adapter sequences. The default is "warn".
`sample.name`	An identifier for the input primers.
`updateProgress`	A Shiny progress callback function. This is `NULL` by default such that no progress is tracked.
`filename`	Path to a valid XML file containing the primer analysis settings. By default, `filename` is set to all settings that are shipped with openPrimeR and the lexicographically first file is loaded.
`frontend`	Indicates whether settings shall be loaded for the Shiny frontend. In this case no unit conversions for the PCR settings are performed. The default setting is `FALSE` such that the correct units are used.

Details

In the following you can find a description of the most important columns that can be found in an object of class Templates. Note that angle brackets in the column names indicate the existence of multiple possibilities.

ID: The identifiers of the templates.
Identifier: The internal identifiers of the templates.
Group: The identifiers of the groups that the templates belong to.
Allowed_Start_<fw|rev>: The start of the interval in the templates where binding is allowed for forward and reverse primers, respectively.
Allowed_End_<fw|rev>: The end of the interval in the templates where binding is allowed for forward and reverse primers, respectively.
Allowed_<fw|rev>: The template sequence where binding is allowed for forward and reverse primers, respectively.
Run: An identifier for the set of template sequences.
Covered_By_Primers: The identifiers of primers covering the templates, when the template coverage has been annotated.
primer_coverage: The number of primers covering the templates, when the template coverage has been annotated.
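
To make the relation between fw.region, rev.region and the Allowed_Start/Allowed_End columns more concrete, here is a minimal base R sketch. It is not openPrimeR's implementation: the helper allowed_interval and the toy template are hypothetical, and the coordinate convention (1-based, reverse regions counted back from the 3' end) is an assumption made for illustration.

```r
# Hypothetical helper (not part of openPrimeR): translate a binding region
# into 1-based start/end coordinates on a template of a given length.
allowed_interval <- function(seq.length, region, direction = c("fw", "rev")) {
  direction <- match.arg(direction)
  if (direction == "fw") {
    # Forward regions are counted from the template 5' end
    c(start = region[1], end = min(region[2], seq.length))
  } else {
    # Reverse regions are counted from the template 3' end
    c(start = max(seq.length - region[2] + 1, 1),
      end = seq.length - region[1] + 1)
  }
}

template <- paste(rep("ACGT", 25), collapse = "")  # toy template of 100 bases
fw.int <- allowed_interval(nchar(template), c(1, 30), "fw")
rev.int <- allowed_interval(nchar(template), c(1, 30), "rev")
print(fw.int)   # first 30 bases
print(rev.int)  # last 30 bases
# The sequence in which reverse primers would be allowed to bind
print(substr(template, rev.int[["start"]], rev.int[["end"]]))
```

With the default regions c(1, 30), the forward interval covers the first 30 bases and the reverse interval the last 30 bases of the toy template.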

When loading a FASTA file with read_templates, the input arguments hdr.structure, delim, id.column, rm.keywords, remove.duplicates, fw.region, rev.region, gap.char, and run are used. Most importantly, hdr.structure and delim should match the FASTA header structure. To learn more about setting the primer binding regions, consider the assign_binding_regions function. In contrast, when a CSV file is loaded with read_templates, the data are loaded without any modification because the CSV file should represent an object of class Templates, which can be stored using the write_templates function.
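
The following base R sketch illustrates how hdr.structure and delim are meant to line up with a FASTA header. It is not how read_templates is implemented internally; the helper parse_header and the example header are hypothetical and only serve to show the idea of naming delimited header fields.

```r
# Hypothetical helper (not part of openPrimeR): split one FASTA header
# into fields named after hdr.structure.
parse_header <- function(header, hdr.structure, delim) {
  # Drop the leading ">" and split on the literal delimiter
  fields <- strsplit(sub("^>", "", header), delim, fixed = TRUE)[[1]]
  # Header fields beyond the length of hdr.structure are ignored
  n <- min(length(fields), length(hdr.structure))
  setNames(fields[seq_len(n)], hdr.structure[seq_len(n)])
}

# Invented header following an ACCESSION|GROUP|SPECIES|FUNCTION layout,
# with one additional field that hdr.structure does not describe
header <- ">ACC0001|GROUP_A|Homo sapiens|F|extra"
hdr.structure <- c("ACCESSION", "GROUP", "SPECIES", "FUNCTION")
parsed <- parse_header(header, hdr.structure, "|")
print(parsed[["GROUP"]])
```

Splitting with fixed = TRUE matters here because "|" would otherwise be interpreted as a regular expression operator.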

When loading primers via read_primers, the input arguments fw.id, rev.id, merge.ambig, and max.degen are only used for loading primers from a FASTA file. In this case, please ensure that fw.id and rev.id are set according to the keywords indicating the primer directionalities in the FASTA file. When loading primers from a CSV file, the format of the file should adhere to the structure defined by the Primers class.

When loading a settings file with read_settings, if filename is not provided, a default XML settings file is loaded. Please review the function's examples to learn more about the default settings. If you want to load custom settings, you can store a modified DesignSettings object as an XML file using write_settings.

Value

The Templates constructor returns a Templates object, an instance of a data frame.

read_templates returns a single object of class Templates if a single filename was provided or a list of such objects if multiple file names were provided.

The Primers constructor returns an object of class Primers.

read_primers returns a single object of class Primers if a single input file is provided or a list of such objects if multiple files are provided.

read_settings returns an object of class DesignSettings.
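
Because read_templates and read_primers return either a single object or a list of objects, downstream code may want to treat both cases uniformly. The sketch below shows one way to do this in base R. The helper as_object_list is hypothetical, plain data frames stand in for Templates objects, and the approach assumes that the returned objects are recognized by is.data.frame.

```r
# Hypothetical helper (not part of openPrimeR): always return a list of
# data-frame-like objects, whether one object or a list of them was supplied.
as_object_list <- function(x) {
  # A data frame is itself a list, so test for it explicitly
  if (is.data.frame(x)) list(x) else x
}

# Plain data frames standing in for the result of reading one or two files
single <- data.frame(ID = "t1")
multiple <- list(data.frame(ID = "t1"), data.frame(ID = "t2"))

n.single <- length(as_object_list(single))
n.multiple <- length(as_object_list(multiple))
print(c(n.single, n.multiple))
```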

Basic columns

In the following you can find a description of the most important columns that can be found in objects of class Primers. Note that angle brackets indicate the existence of multiple possibilities. The following columns are present when a set of primers is loaded from a FASTA file using read_primers:

ID: The identifiers of the primers.
Identifier: The internal identifiers of the primers.
Forward: The sequences of forward primers.
Reverse: The sequences of reverse primers.
primer_length_<fw|rev>: The lengths of forward and reverse primer sequences, respectively.
Direction: Either 'fw' for forward primers, 'rev' for reverse primers, or 'both' for a primer pair.
Degeneracy_<fw|rev>: The degeneracy (ambiguity) of forward and reverse primers, respectively.
Run: An identifier describing the primer set.
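
Degeneracy, as defined for max.degen, is the number of unambiguous sequences that a primer with IUPAC ambiguity codes represents. The following base R sketch computes this number directly; the helper degeneracy is hypothetical and is not the function openPrimeR uses to fill the Degeneracy columns.

```r
# IUPAC nucleotide codes mapped to the number of bases each one represents
iupac.counts <- c(A = 1, C = 1, G = 1, T = 1,
                  R = 2, Y = 2, S = 2, W = 2, K = 2, M = 2,
                  B = 3, D = 3, H = 3, V = 3, N = 4)

# Hypothetical helper (not part of openPrimeR): the degeneracy is the
# product of the per-position base counts.
degeneracy <- function(primer) {
  bases <- strsplit(toupper(primer), "")[[1]]
  stopifnot(all(bases %in% names(iupac.counts)))
  prod(iupac.counts[bases])
}

print(degeneracy("ACGTACGT"))  # no ambiguity codes
print(degeneracy("ACRYN"))     # R (2) * Y (2) * N (4) = 16
```

A primer such as "ACRYN" therefore sits exactly at the default max.degen of 16.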

Coverage-related columns

The following columns are only available in an object of class Primers after primer coverage has been computed, that is, after check_constraints has been called with the primer_coverage constraint active. Computed coverage values relating solely to string matching are indicated by the prefix Basic_, while columns without this prefix relate to the coverage after applying the constraints formulated via CoverageConstraints. Information on off-target coverage events is indicated by the Off_ prefix, while on-target coverage events do not carry this prefix.

primer_coverage: The number of templates that are covered by the primers. Note that if a primer set contains primers of both directions, a template is only considered covered if it is covered by primers of both directions.
Coverage_Ratio: The ratio of templates that are covered by the primers.
Binding_Position_Start_<fw|rev>: The upstream position in the templates where forward and reverse primers respectively bind.
Binding_Position_End_<fw|rev>: The downstream position in the templates where forward and reverse primers respectively bind.
Relative_<Forward|Reverse>_Binding_Position_<Start|End>_<fw|rev>: The binding upstream (Start) or downstream (End) positions of the primers relative to the forward (Forward) or reverse (Reverse) binding regions, either for forward (fw) or reverse primers (rev).
Binding_Region_Allowed: Whether a coverage event occurred in the target binding region or not. If the allowed off-target ratio was set to 0, only coverage events within the target region are reported.
Nbr_of_mismatches_<fw|rev>: The number of mismatches of forward and reverse primer coverage events, respectively.
Mismatch_pos_<fw|rev>: The position of mismatches for forward and reverse coverage events, respectively. Mismatch positions are reported relative to the 3' end, that is, position 1 indicates a mismatch in the last base of a primer.
primer_specificity: The specificity of a primer as determined by its ratio of off-target binding events.
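
The convention of reporting mismatch positions relative to the 3' end can be illustrated with a small base R sketch. The helper mismatch_pos_3prime is hypothetical and deliberately simplified: it compares a primer with an equally long binding site character by character, without handling ambiguity codes or reverse complements.

```r
# Hypothetical helper (not part of openPrimeR): positions of mismatches
# between a primer and an equally long binding site, counted from the
# primer's 3' end (position 1 is the last base).
mismatch_pos_3prime <- function(primer, site) {
  p <- strsplit(primer, "")[[1]]
  s <- strsplit(site, "")[[1]]
  stopifnot(length(p) == length(s))
  # which() yields positions counted from the 5' end; convert them
  sort(length(p) - which(p != s) + 1)
}

primer <- "ACGTACGTAC"
site   <- "ACGTTCGTAG"  # differs at positions 5 and 10 counted from the 5' end
pos <- mismatch_pos_3prime(primer, site)
print(pos)
```

The mismatch in the last base is reported as position 1, and the mismatch at the fifth base of this 10-mer as position 6.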

Constraint-related columns

Each constraint that is considered when calling check_constraints gives rise to at least one column in the provided Primers object. Due to the large number of possible constraints, we will limit our description to the gc_clamp constraint. Once the GC clamp property has been computed, the gc_clamp_fw column contains the length of the GC clamp for forward primers and gc_clamp_rev the corresponding length for reverse primers. Whether the desired extent of the GC clamp was obtained by a primer is indicated by the EVAL_gc_clamp column. It contains TRUE when the GC clamp constraint was fulfilled and FALSE when it was broken. To identify whether all required constraints were fulfilled by a primer, the constraints_passed column can be used. It contains TRUE if all active.constraints used by check_constraints were fulfilled and FALSE otherwise.
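
As a sketch of how these columns might be used, the following base R code filters a toy data frame by EVAL_gc_clamp and constraints_passed. The data frame only imitates a few columns of an evaluated Primers object; its identifiers and values are invented for illustration.

```r
# Toy data frame imitating a few columns of an evaluated Primers object
primers <- data.frame(
  ID = c("p1_fw", "p2_fw", "p3_fw"),
  gc_clamp_fw = c(2, 0, 5),
  EVAL_gc_clamp = c(TRUE, FALSE, FALSE),
  constraints_passed = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Identifiers of primers that broke the GC clamp constraint
broken.clamp <- primers$ID[!primers$EVAL_gc_clamp]
# Primers that fulfilled every active constraint
passed <- primers[primers$constraints_passed, ]

print(broken.clamp)
print(passed$ID)
```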

Examples


# Load a set of templates from a FASTA file
fasta.file <- system.file("extdata", "IMGT_data", "templates", 
          "Homo_sapiens_IGH_functional_exon.fasta", package = "openPrimeR")
# Fields of the FASTA headers, in the order in which they appear
hdr.structure <- c("ACCESSION", "GROUP", "SPECIES", "FUNCTION")
template.df.fasta <- read_templates(fasta.file, hdr.structure = hdr.structure,
                                    delim = "|", id.column = "GROUP")
# Load multiple FASTA files
fasta.files <- c(fasta.file, fasta.file)
template.df.fastas <- read_templates(fasta.files, hdr.structure, "|", "GROUP")
# Load templates from a previously stored CSV file
csv.file <- system.file("extdata", "IMGT_data", "comparison", 
               "templates", "IGH_templates.csv", package = "openPrimeR")
template.df.csv <- read_templates(csv.file)
# Load multiple CSV files:
csv.files <- c(csv.file, csv.file)
template.df.csvs <- read_templates(csv.files)
# Load a mixture of FASTA/CSV files:
mixed.files <- c(csv.file, fasta.file)
template.data <- read_templates(mixed.files)

# Load a set of primers from a FASTA file
primer.fasta <- system.file("extdata", "IMGT_data", "primers", "IGHV", 
                     "Ippolito2012.fasta", package = "openPrimeR")
# fw.id and rev.id are the keywords marking primer direction in the headers
primer.df <- read_primers(primer.fasta, fw.id = "_fw", rev.id = "_rev")
# Read multiple FASTA files
fasta.files <- list.files(system.file("extdata", "IMGT_data", "primers", 
                 "IGHV", package = "openPrimeR"), pattern = "*\\.fasta",
                 full.names = TRUE)[1:3]
primer.data <- read_primers(fasta.files)
# Read primers from a CSV file
primer.csv <- system.file("extdata", "IMGT_data", "comparison", 
             "primer_sets", "IGL", "IGL_openPrimeR2017.csv",  package = "openPrimeR")
primer.df <- read_primers(primer.csv)
# Read multiple primer CSV files
primer.files <- list.files(path = system.file("extdata", "IMGT_data", "comparison", 
                         "primer_sets", "IGH", package = "openPrimeR"),
                          pattern = "*\\.csv", full.names = TRUE)[1:3]
primer.data <- read_primers(primer.files)
# Read a mixture of FASTA/CSV files:
mixed.primers <- c(primer.fasta, primer.csv)
primer.data <- read_primers(mixed.primers)

# Select available settings
available.settings <- list.files(
     system.file("extdata", "settings", package = "openPrimeR"), 
     pattern = "*.xml", full.names = TRUE)
# Select one of the settings and load them
filename <- available.settings[1]
settings <- read_settings(filename)

matdoering/openPrimeR documentation built on Feb. 11, 2024, 9:22 p.m.