Input: Input Functionalities.

Input    R Documentation

Description

read_primers

Reads one or multiple input files with primer sequences. The input can be in either FASTA or CSV format.

read_templates

Reads one or multiple files with template sequences in FASTA or CSV format.

read_settings

Loads primer analysis settings from an XML file.

Templates

The Templates class encapsulates a data frame containing the sequences of the templates, their binding regions, as well as additional information (e.g. template coverage).

Primers

The Primers class encapsulates a data frame representing a set of primers. Objects of this class store all properties associated with a set of primers, for example the results from evaluating the properties of a primer set or from determining its coverage.

Usage

Templates(...)

read_templates(
  fname,
  hdr.structure = NULL,
  delim = NULL,
  id.column = NULL,
  rm.keywords = NULL,
  remove.duplicates = FALSE,
  fw.region = c(1, 30),
  rev.region = c(1, 30),
  gap.char = "-",
  run = NULL
)

Primers(...)

read_primers(
  fname,
  fw.id = "_fw",
  rev.id = "_rev",
  merge.ambig = c("none", "merge", "unmerge"),
  max.degen = 16,
  template.df = NULL,
  adapter.action = c("warn", "rm"),
  sample.name = NULL,
  updateProgress = NULL
)

read_settings(
  filename = list.files(system.file("extdata", "settings", package = "openPrimeR"),
    pattern = "*.xml", full.names = TRUE),
  frontend = FALSE
)

Arguments

...

A data frame fulfilling the structural requirements for initializing a Templates or Primers object.

fname

Character vector providing either a single or multiple paths to FASTA or CSV files.

hdr.structure

A character vector describing the information contained in the FASTA headers. If the headers of the FASTA file(s) given by fname contain template group information, include the keyword "GROUP" in hdr.structure. If hdr.structure has fewer elements than the actual header structure, the remaining header fields are ignored.

delim

Delimiter for the information in the FASTA headers.

id.column

Field in the header to be used as the identifier of individual template sequences.

rm.keywords

A vector of keywords. Templates whose headers contain any of these keywords are removed.

remove.duplicates

Whether duplicate sequences shall be removed. The default is FALSE.

fw.region

The positional interval from the template 5' end specifying the binding sites for forward primers. The default fw.region is set to the first 30 bases of the templates.

rev.region

The positional interval from the template 3' end specifying the binding sites for reverse primers. The default rev.region is set to the last 30 bases of the templates.

gap.char

The character in the input file representing gaps. Gaps are automatically removed upon input and the default character is "-".

run

An identifier for the set of template sequences. By default, run is NULL and its value is set via the input file (fname).

fw.id

For FASTA input, the identifier for forward primers in the FASTA headers.

rev.id

For FASTA input, the identifier for reverse primers in the FASTA headers.

merge.ambig

Indicates whether similar primers should be merged ("merge") using IUPAC ambiguity codes or whether primers should be disambiguated ("unmerge"). By default merge.ambig is set to "none", leaving primers as they are.

max.degen

A scalar numeric providing the maximum allowed degeneracy for merging primers if merge.ambig is set to "merge". Degeneracy is defined by the number of disambiguated sequences that are represented by a degenerate primer.

template.df

An object of class Templates. If template.df is provided for read_primers, the primers are checked for restriction sites upon input; otherwise they are not checked.

adapter.action

The action to be performed on identified adapter sequences when template.df is provided. Either "warn" to issue a warning about adapter sequences or "rm" to remove identified adapter sequences. The default is "warn".

sample.name

An identifier for the input primers.

updateProgress

A Shiny progress callback function. This is NULL by default such that no progress is tracked.

filename

Path to a valid XML file containing the primer analysis settings. By default, filename lists all settings files shipped with openPrimeR, and the lexicographically first of these files is loaded.

frontend

Indicates whether settings shall be loaded for the Shiny frontend. In this case no unit conversions for the PCR settings are performed. The default setting is FALSE such that the correct units are used.

Details

The most important columns of an object of class Templates are described below. Angle brackets in a column name indicate that multiple variants of the column exist.

ID

The identifiers of the templates.

Identifier

The internal identifiers of the templates.

Group

The identifiers of the groups that the templates belong to.

Allowed_Start_<fw|rev>

The start of the interval in the templates where binding is allowed for forward and reverse primers, respectively.

Allowed_End_<fw|rev>

The end of the interval in the templates where binding is allowed for forward and reverse primers, respectively.

Allowed_<fw|rev>

The template sequence where binding is allowed for forward and reverse primers, respectively.

Run

An identifier for the set of template sequences.

Covered_By_Primers

The identifiers of primers covering the templates, when the template coverage has been annotated.

primer_coverage

The number of primers covering the templates, when the template coverage has been annotated.
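
The Allowed_<fw|rev> columns can be pictured with a small base R sketch. It assumes that fw.region is counted from the template 5' end and rev.region from the template 3' end, as described for the arguments of read_templates. The template string and all variable names are made up for illustration, and the coordinate convention that openPrimeR uses internally may differ.

```r
# Hypothetical 50-base template (not taken from the package data).
template <- paste0(strrep("ACGT", 12), "AC")
fw.region <- c(1, 30)   # interval counted from the 5' end
rev.region <- c(1, 30)  # interval counted from the 3' end

n <- nchar(template)

# Forward binding region: positions 1..30 from the 5' end.
allowed.fw <- substr(template, fw.region[1], fw.region[2])

# Reverse binding region: convert 3'-based positions to 5'-based indices.
rev.start <- n - rev.region[2] + 1
rev.end <- n - rev.region[1] + 1
allowed.rev <- substr(template, rev.start, rev.end)

nchar(allowed.fw)   # 30
nchar(allowed.rev)  # 30
```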

When loading a FASTA file with read_templates, the input arguments hdr.structure, delim, id.column, rm.keywords, remove.duplicates, fw.region, rev.region, gap.char, and run are utilized. Most importantly, hdr.structure and delim should match the FASTA header structure. To learn more about setting the primer binding regions, consider the assign_binding_regions function. In contrast, when a CSV file is loaded with read_templates, the data are loaded without performing any modifications because the CSV file should represent an object of class Templates, which can be stored using the write_templates function.
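
To make the relation between hdr.structure, delim, and a FASTA header more concrete, the following base R sketch splits a made-up header into named fields. It is not the parser used by openPrimeR; the header string is hypothetical and only the argument names are taken from this page.

```r
# Hypothetical FASTA header with four "|"-delimited fields.
header <- ">ACC0001|GROUP_A|Homo sapiens|F"
hdr.structure <- c("ACCESSION", "GROUP", "SPECIES", "FUNCTION")
delim <- "|"

# Drop the leading ">" and split on the literal delimiter
# (fixed = TRUE prevents "|" from being read as a regex alternation).
fields <- strsplit(sub("^>", "", header), delim, fixed = TRUE)[[1]]

# Name only as many fields as hdr.structure describes.
n.fields <- min(length(fields), length(hdr.structure))
parsed <- setNames(fields[seq_len(n.fields)], hdr.structure[seq_len(n.fields)])

parsed[["GROUP"]]  # the field that id.column = "GROUP" would refer to
```

If the header had more fields than hdr.structure, the surplus fields would simply be left unnamed and dropped in this sketch.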

When loading primers via read_primers, the input arguments fw.id, rev.id, merge.ambig, and max.degen are only used for loading primers from a FASTA file. In this case, please ensure that fw.id and rev.id are set according to the keywords indicating the primer directionalities in the FASTA file. When loading primers from a CSV file, the format of the file should adhere to the structure defined by the Primers class.

When loading a settings file with read_settings, if filename is not provided, a default XML settings file is loaded. Please review the function's examples to learn more about the default settings. If you want to load custom settings, you can store a modified DesignSettings object as an XML file using write_settings.

Value

The Templates constructor returns a Templates object, an instance of a data frame.

read_templates returns a single object of class Templates if a single filename was provided or a list of such objects if multiple file names were provided.

The Primers constructor returns an object of class Primers.

read_primers returns a single object of class Primers if a single input file is provided or a list of such objects if multiple files are provided.

read_settings returns an object of class DesignSettings.
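
Because read_templates and read_primers return either a single object or a list of objects, downstream code may want to treat both cases uniformly. The following sketch uses a hypothetical helper, as_result_list, and toy data frames in place of real return values. It assumes that a single Templates or Primers object inherits from data.frame.

```r
# Hypothetical helper (not part of openPrimeR): always return a list of objects.
as_result_list <- function(x) {
  # A data frame is itself a list, so test for data.frame explicitly.
  if (is.data.frame(x)) list(x) else x
}

# Toy stand-ins for the return values of read_templates()/read_primers().
single <- data.frame(ID = "t1", stringsAsFactors = FALSE)
multiple <- list(single, single)

length(as_result_list(single))    # 1
length(as_result_list(multiple))  # 2
```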

Basic columns

The most important columns of an object of class Primers are described below. Angle brackets in a column name indicate that multiple variants of the column exist. The following columns are present when a set of primers is loaded from a FASTA file using read_primers:

ID

The identifiers of the primers.

Identifier

The internal identifiers of the primers.

Forward

The sequences of forward primers.

Reverse

The sequences of reverse primers.

primer_length_<fw|rev>

The lengths of forward and reverse primer sequences, respectively.

Direction

Either 'fw' for forward primers, 'rev' for reverse primers, or 'both' for a primer pair.

Degeneracy_<fw|rev>

The degeneracy (ambiguity) of forward and reverse primers, respectively.

Run

An identifier describing the primer set.
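
The degeneracy reported in Degeneracy_<fw|rev> (and bounded by max.degen when merging) is defined above as the number of disambiguated sequences represented by a degenerate primer. A minimal base R sketch of that definition follows. The helper name degeneracy is hypothetical and this is not the package's implementation; the lookup table uses the standard IUPAC nucleotide codes.

```r
# Number of nucleotides encoded by each IUPAC code.
iupac.counts <- c(A = 1, C = 1, G = 1, T = 1,
                  R = 2, Y = 2, S = 2, W = 2, K = 2, M = 2,
                  B = 3, D = 3, H = 3, V = 3, N = 4)

# Hypothetical helper: degeneracy of a single primer sequence.
degeneracy <- function(primer) {
  codes <- strsplit(toupper(primer), "")[[1]]
  stopifnot(all(codes %in% names(iupac.counts)))
  # Each position multiplies the number of represented sequences.
  prod(iupac.counts[codes])
}

degeneracy("ACGT")   # 1: no ambiguity codes
degeneracy("ACRYN")  # 2 * 2 * 4 = 16
```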

Coverage-related columns

The following columns are only available in an object of class Primers after primer coverage has been computed, that is, after check_constraints has been called with the active primer_coverage constraint. Computed coverage values relating solely to string matching are indicated by the prefix Basic_, while columns without this prefix relate to the coverage after applying the constraints formulated via CoverageConstraints. Information on off-target coverage events is indicated by the Off_ prefix, while on-target coverage events do not carry this prefix.

primer_coverage

The number of templates that are covered by the primers. Note that if a primer set contains primers of both directions, a template is only considered covered if it is covered by primers of both directions.

Coverage_Ratio

The ratio of templates that are covered by the primers.

Binding_Position_Start_<fw|rev>

The upstream position in the templates where forward and reverse primers bind, respectively.

Binding_Position_End_<fw|rev>

The downstream position in the templates where forward and reverse primers bind, respectively.

Relative_<Forward|Reverse>_Binding_Position_<Start|End>_<fw|rev>

The binding upstream (Start) or downstream (End) positions of the primers relative to the forward (Forward) or reverse (Reverse) binding regions, either for forward (fw) or reverse primers (rev).

Binding_Region_Allowed

Whether a coverage event occurred in the target binding region or not. If the allowed off-target ratio was set to 0, only coverage events within the target region are reported.

Nbr_of_mismatches_<fw|rev>

The number of mismatches of forward and reverse primer coverage events, respectively.

Mismatch_pos_<fw|rev>

The position of mismatches for forward and reverse coverage events, respectively. Mismatch positions are reported relative to the 3' end, that is, position 1 indicates a mismatch in the last base of a primer.

primer_specificity

The specificity of a primer as determined by its ratio of off-target binding events.
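
The 3'-relative numbering used by Mismatch_pos_<fw|rev> can be illustrated with a short base R sketch. The helper mismatch_pos_3prime is hypothetical and not part of openPrimeR. It assumes that the primer and the binding site are given in the same orientation and have equal length.

```r
# Hypothetical helper: mismatch positions counted from the primer 3' end.
mismatch_pos_3prime <- function(primer, site) {
  p <- strsplit(primer, "")[[1]]
  s <- strsplit(site, "")[[1]]
  stopifnot(length(p) == length(s))
  # Indices of mismatches counted from the 5' end ...
  idx <- which(p != s)
  # ... converted so that position 1 is the last (3'-terminal) base.
  sort(length(p) - idx + 1)
}

mismatch_pos_3prime("ACGTAC", "ACGTAG")  # 1: mismatch in the last base
mismatch_pos_3prime("ACGTAC", "TCGTAC")  # 6: mismatch in the first base
```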

Constraint-related columns

Each constraint that is considered when calling check_constraints gives rise to at least one column in the provided Primers object. Due to the large number of possible constraints, we will limit our description to the gc_clamp constraint. Once the GC clamp property has been computed, the gc_clamp_fw column contains the length of the GC clamp for forward primers and gc_clamp_rev the corresponding length for reverse primers. Whether the desired extent of the GC clamp was obtained by a primer is indicated by the EVAL_gc_clamp column. It contains TRUE when the GC clamp constraint was fulfilled and FALSE when it was broken. To identify whether all required constraints were fulfilled by a primer, the constraints_passed column can be used. It contains TRUE if all active.constraints used by check_constraints were fulfilled and FALSE otherwise.
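
As a sketch of how these columns might be used, the following base R code filters a toy data frame that mimics a few Primers columns. All values are made up, and the sketch assumes that a Primers object can be subset like an ordinary data frame.

```r
# Toy data frame mimicking a few constraint-related columns (made-up values).
primers <- data.frame(
  ID = c("p1", "p2", "p3"),
  gc_clamp_fw = c(2, 0, 5),
  EVAL_gc_clamp = c(TRUE, FALSE, FALSE),
  constraints_passed = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Primers that fulfilled every active constraint.
passed <- primers[primers$constraints_passed, ]

# Identifiers of primers that broke the GC clamp constraint specifically.
failed.clamp <- primers$ID[!primers$EVAL_gc_clamp]

passed$ID      # "p1"
failed.clamp   # "p2" "p3"
```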

Examples


# Load templates from a FASTA file
fasta.file <- system.file("extdata", "IMGT_data", "templates", 
          "Homo_sapiens_IGH_functional_exon.fasta", package = "openPrimeR")
hdr.structure <- c("ACCESSION", "GROUP", "SPECIES", "FUNCTION")
template.df.fasta <- read_templates(fasta.file, hdr.structure, "|", "GROUP")
# Load multiple FASTA files
fasta.files <- c(fasta.file, fasta.file)
template.df.fastas <- read_templates(fasta.files, hdr.structure, "|", "GROUP")
# Load templates from a previously stored CSV file
csv.file <- system.file("extdata", "IMGT_data", "comparison", 
               "templates", "IGH_templates.csv", package = "openPrimeR")
template.df.csv <- read_templates(csv.file)
# Load multiple CSV files:
csv.files <- c(csv.file, csv.file)
template.df.csvs <- read_templates(csv.files)
# Load a mixture of FASTA/CSV files:
mixed.files <- c(csv.file, fasta.file)
template.data <- read_templates(mixed.files)

# Load a set of primers from a FASTA file
primer.fasta <- system.file("extdata", "IMGT_data", "primers", "IGHV", 
                     "Ippolito2012.fasta", package = "openPrimeR")
primer.df <- read_primers(primer.fasta, "_fw", "_rev")
# Read multiple FASTA files
fasta.files <- list.files(system.file("extdata", "IMGT_data", "primers", 
                 "IGHV", package = "openPrimeR"), pattern = "\\.fasta$",
                 full.names = TRUE)[1:3]
primer.data <- read_primers(fasta.files)
# Read primers from a CSV file
primer.csv <- system.file("extdata", "IMGT_data", "comparison", 
             "primer_sets", "IGL", "IGL_openPrimeR2017.csv",  package = "openPrimeR")
primer.df <- read_primers(primer.csv)
# Read multiple primer CSV files
primer.files <- list.files(path = system.file("extdata", "IMGT_data", "comparison", 
                         "primer_sets", "IGH", package = "openPrimeR"),
                          pattern = "\\.csv$", full.names = TRUE)[1:3]
primer.data <- read_primers(primer.files)
# Read a mixture of FASTA/CSV files:
mixed.primers <- c(primer.fasta, primer.csv)
primer.data <- read_primers(mixed.primers)

# Select available settings
available.settings <- list.files(
     system.file("extdata", "settings", package = "openPrimeR"), 
     pattern = "*.xml", full.names = TRUE)
# Select one of the settings and load them
filename <- available.settings[1]
settings <- read_settings(filename)

matdoering/openPrimeR documentation built on Feb. 11, 2024, 9:22 p.m.