deepG

Overview

deepG is a package for generating LSTM models from genomic text and provides scripts for various common tasks such as the extraction of cell response. It also comes with example datasets of genomic and human-readable languages for testing.

Installation and Usage

Please see our Wiki for further installation instructions. It also covers usage on multi-GPU machines.

See the help files (?deepG) to get started; for questions, consult the FAQ.

Datasets

The library comes with multiple datasets for testing.

Example

Preprocessing

library(deepG)
data("ecoli") # loads the nucleotide sequence of E. coli
preprocessed <- preprocessSemiRedundant(substr(ecoli, 2, 5000), maxlen = 250) # prepares the batches (one-hot encoding)
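To make this step more concrete, the following base-R sketch illustrates the general idea behind semi-redundant windowing with one-hot encoding: overlapping windows of length maxlen are cut from the sequence, and each window is paired with the character that follows it. This is not deepG's implementation. The function name sketch_semi_redundant, the step parameter, the lowercasing of the input, and the layout of the returned list are assumptions made purely for illustration.

```r
# Illustrative sketch only; NOT deepG's internal implementation.
# Function name, `step`, and the output layout are assumptions.
sketch_semi_redundant <- function(sequence, maxlen, step = 1,
                                  vocabulary = c("a", "g", "c", "t")) {
  chars <- strsplit(tolower(sequence), "")[[1]]
  stopifnot(length(chars) > maxlen)
  idx <- match(chars, vocabulary)  # NA for characters outside the vocabulary
  starts <- seq(1, length(chars) - maxlen, by = step)
  n <- length(starts)
  # x: one one-hot encoded window per row; y: one-hot encoded next character
  x <- array(0, dim = c(n, maxlen, length(vocabulary)))
  y <- matrix(0, nrow = n, ncol = length(vocabulary))
  for (i in seq_len(n)) {
    window <- idx[starts[i]:(starts[i] + maxlen - 1)]
    pos <- which(!is.na(window))
    x[cbind(rep(i, length(pos)), pos, window[pos])] <- 1
    target <- idx[starts[i] + maxlen]
    if (!is.na(target)) y[i, target] <- 1
  }
  list(x = x, y = y)
}

toy <- sketch_semi_redundant("acgtacgtac", maxlen = 4)
dim(toy$x)  # windows x positions x vocabulary size
dim(toy$y)  # windows x vocabulary size
```

In this sketch, a sequence of 10 characters with maxlen = 4 and step = 1 yields 6 windows, because every window needs a following character to serve as its target.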

Training a language model on CPU

This generates the binary file example_full_model.hdf5. For more options, see the Wiki page Training of GenomeNet.

trainNetwork(
  dataset = preprocessed,
  batch.size = 500,
  epochs = 5,
  maxlen = 250,
  layers.lstm = 2,
  layer.size = 25,
  use.cudnn = FALSE, # train on CPU
  run.name = "example",
  tensorboard.log = "log",
  path.val = "",
  output = list(none = FALSE, checkpoints = FALSE, tensorboard = FALSE,
                log = FALSE, serialize_model = FALSE, full_model = TRUE)
)

Generation of the states

We can now use the trained model to generate neuron responses (states) for a subset of the E. coli genome. This will generate a binary file named states.h5.

writeStates(model.path = "example_full_model.hdf5", sequence = substr(ecoli, 2, 5000), batch.size = 256, layer.depth = 1, filename = "states", vocabulary = c("a","g","c","t"), step = 1, padding = TRUE)
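Because the call above passes an explicit vocabulary, it can be useful to check beforehand whether the sequence contains characters outside of it (for example, ambiguous bases such as N). The following base-R sketch shows one way to do that. The helper check_vocabulary is hypothetical and not part of deepG, and the lowercasing of the input is an assumption; how deepG itself treats out-of-vocabulary characters is not stated here.

```r
# Hypothetical helper, NOT part of deepG: tabulates the characters of a
# sequence and reports which ones fall outside a given vocabulary.
check_vocabulary <- function(sequence, vocabulary = c("a", "g", "c", "t")) {
  chars <- strsplit(tolower(sequence), "")[[1]]  # lowercasing is an assumption
  counts <- table(chars)
  unknown <- setdiff(names(counts), vocabulary)
  list(counts = counts, unknown = unknown, ok = length(unknown) == 0)
}

res <- check_vocabulary("ACGTNNacgt")
res$unknown  # characters not covered by the vocabulary
res$ok       # TRUE only if every character is in the vocabulary
```

The same check could be applied to substr(ecoli, 2, 5000) before generating states.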

License and Copyright

Copyright 2019 Philipp Münch

hiddengenome/deepG documentation built on April 16, 2020, 1:38 a.m.