The depth of the network and the number of neurons per layer can be specified. The first layer can be a convolutional neural network (CNN) layer designed to capture codons.
If a path to a folder containing FASTA files is provided, batches are generated by an external generator, which is recommended for large training sets. Alternatively, a dataset holding the preprocessed batches (generated by preprocessSemiRedundant())
can be supplied and kept in RAM. Training on instances with multiple GPUs is also supported and scales linearly with the number of GPUs present.
trainNetwork(
train_type = "lm",
model_path = NULL,
model = NULL,
path = NULL,
path.val = NULL,
dataset = NULL,
checkpoint_path,
validation.split = 0.2,
run.name = "run",
batch.size = 64,
epochs = 10,
max.queue.size = 100,
lr.plateau.factor = 0.9,
patience = 5,
cooldown = 5,
steps.per.epoch = 1000,
step = 1,
randomFiles = FALSE,
initial_epoch = NULL,
vocabulary = c("a", "c", "g", "t"),
tensorboard.log,
save_best_only = TRUE,
compile = TRUE,
learning.rate = NULL,
solver = NULL,
max_iter = 1000,
seed = c(1234, 4321),
shuffleFastaEntries = FALSE,
output = list(none = FALSE, checkpoints = TRUE, tensorboard = TRUE, log = TRUE,
serialize_model = TRUE, full_model = TRUE),
format = "fasta",
fileLog = NULL,
labelVocabulary = NULL,
numberOfFiles = NULL,
reverseComplements = FALSE
)
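For example, a minimal call for language-model training from FASTA files on disk (using the external generator) might look like the following sketch; the folder names are placeholders and my_model is assumed to be a keras model built beforehand:

trainNetwork(
  train_type      = "lm",
  model           = my_model,           # keras model built beforehand (placeholder)
  path            = "train/fasta/",     # placeholder folder with training FASTA files
  path.val        = "validation/fasta/",
  checkpoint_path = "checkpoints/",
  tensorboard.log = "tensorboard/",
  run.name        = "example_run",
  batch.size      = 64,
  epochs          = 10,
  steps.per.epoch = 1000
)

# For class prediction, train_type = "label_header" or "label_folder" can be used
# together with labelVocabulary (see the argument descriptions below).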
train_type
Either "lm" for language model, "label_header" or "label_folder". A language model is trained to predict the next character in a sequence. "label_header" and "label_folder" train the model to predict a corresponding class, given a sequence as input. If "label_header", the class is read from the FASTA headers; if "label_folder", the class is read from the folder, i.e. all FASTA files in one folder must belong to the same class.
model_path
Path to a pretrained model.
model
A keras model.
path
Path to a folder where individual or multiple FASTA files are located for training.
path.val
Path to a folder where individual or multiple FASTA files are located for validation.
dataset
Data frame holding the training samples in RAM instead of using the generator.
checkpoint_path
Path to the checkpoints folder.
validation.split
Fraction of the batches used for validation (relative to the size of the training data).
run.name
Name of the run (without file ending); used to identify the output from callbacks.
batch.size
Number of samples used for one network update.
epochs
Number of epochs to train.
max.queue.size
Maximum size of the queue used by fit_generator().
lr.plateau.factor
Factor by which the learning rate is reduced when a plateau is reached.
patience
Number of epochs to wait for a decrease in loss before reducing the learning rate.
cooldown
Number of epochs without changes to the learning rate after a reduction.
steps.per.epoch
Number of batches needed to finish one epoch.
step
Frequency of sampling steps.
randomFiles
Logical; whether to go through the files sequentially (FALSE) or shuffle them beforehand (TRUE).
initial_epoch
Epoch at which to start training; set to 0 if no pretrained model is used.
vocabulary
Vector of allowed characters; characters outside the vocabulary get encoded as a zero vector.
tensorboard.log
Path to the tensorboard log directory.
save_best_only
Only save the model when it improves on the best val_loss score.
compile
Whether to compile the model after loading.
learning.rate
Learning rate for the optimizer. Only used when a pretrained model is given.
solver
Optimization method; options are "adam", "adagrad", "rmsprop" or "sgd". Only used when a pretrained model is given.
max_iter
Stop if max_iter iterations have failed to produce a new sample.
seed
Sets the seed for the set.seed() function, for reproducible results.
shuffleFastaEntries
Logical; shuffle the entries within a file.
output
List of optional outputs; no output is produced if none is TRUE.
format
File format, "fasta" or "fastq".
fileLog
Write the names of the used files to a CSV file if a path is specified.
labelVocabulary
Character vector of possible targets; targets outside labelVocabulary are ignored.
numberOfFiles
Use only the specified number of files; ignored if greater than the number of files in path.
reverseComplements
Logical; one half of the batch contains the sequences and the other half their reverse complements. The reverse complement is obtained by reversing the order of a sequence and swapping A/T and C/G (see the sketch below).
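As an illustration of the reverse-complement convention described above, here is a small base-R sketch (not a function of this package):

# Illustration only: reverse complement of a DNA string in base R
revcomp <- function(seq) {
  swapped <- chartr("acgtACGT", "tgcaTGCA", seq)           # swap a/t and c/g
  paste(rev(strsplit(swapped, "")[[1]]), collapse = "")    # reverse the order
}
revcomp("aacg")  # returns "cgtt"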