eiInit | R Documentation |
Takes the raw compound database in whatever format the given measure supports and creates a "data" directory.
eiInit(inputs, dir = ".", format = "sdf", descriptorType = "ap", append = FALSE,
       conn = defaultConn(dir, create = TRUE), updateByName = FALSE, cl = NULL,
       connSource = NULL, priorityFn = forestSizePriorities, skipPriorities = FALSE)
inputs |
Either a filename of a file in |
dir |
The directory where the "data" directory lives. Defaults to the current directory. |
format |
The format of the data in |
descriptorType |
The format of the descriptor. Currently supported values are "ap" for atom pair, and "fp" for fingerprint. |
append |
If true, the given compounds will be added to an existing database
and the <data-dir>/Main.iddb file will be updated with the new
compound id numbers. This should not normally be used directly, use
|
conn |
Database connection to use. If a connection is given, you must ensure that it has been initialized using
the |
updateByName |
If true, all compounds, both in the existing database and in the given dataset, are assumed to have unique names. The function then avoids re-adding existing, identical compounds, and when a new definition is given under an existing name it updates the existing compound with that definition. If false, duplicate compound names are allowed in the database, but duplicate definitions are not: identical compounds are not re-added, and a new version of an existing compound does not update the existing one but is added as a completely new compound with a new compound id. |
cl |
A SNOW cluster can be given here to run this function in parallel. |
connSource |
A function returning a new database connection. It is not sufficient to return a reference to an existing connection; it must be a distinct, new connection. This is needed because cluster operations that use the database each need to create their own connection. If not given, certain parts of this function will not be parallelized. This function can also be used to set up the environment on the cluster worker nodes, for example to re-load libraries such as RSQLite. |
priorityFn |
A function that takes a list of compound ids and returns a data frame with the compound ids in the column 'compound_id' and their priorities in the column 'priority'. There are two pre-defined functions in ChemmineR: 'randomPriorities' and 'forestSizePriorities' (default). When several compounds map to the same descriptor, a function that needs to go from a descriptor back to a compound faces ambiguity about which compound to select. In that case, it picks the compound with the highest priority. |
skipPriorities |
If this is true, then no priority values will be computed. See option |
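The contract described for priorityFn above (compound ids in, a data frame with 'compound_id' and 'priority' columns out) can be sketched in base R as follows. The function name idOrderPriorities is hypothetical and is not part of ChemmineR or eiR, and ranking by id is only an illustrative assumption. This page does not say whether a larger or a smaller number counts as the "highest" priority, so check that against the pre-defined functions before relying on a custom one.

```r
# Hypothetical priority function (not part of ChemmineR or eiR).
# It accepts a vector or list of compound ids and returns a data frame
# with the two columns named in the priorityFn description.
idOrderPriorities <- function(compoundIds) {
  ids <- unlist(compoundIds)
  data.frame(
    compound_id = ids,
    # Assumption for illustration only: priority is the rank of the id.
    priority = rank(ids, ties.method = "first")
  )
}

# Try it on made-up compound ids
idOrderPriorities(c(203, 101, 305))
```

A function of this shape would be passed as eiInit(..., priorityFn = idOrderPriorities) in place of randomPriorities or forestSizePriorities.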
eiInit can take either an SDFset or a filename. SDF and SMILES are supported
by default.
It might complain if your SDF file does not follow the SDF specification.
If this happens, you can create an SDFset with the read.SDFset command and
then use that instead of the filename.
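The workaround above might look like the following sketch. It assumes ChemmineR and eiR are installed, and it writes the bundled sdfsample data to a temporary file only so that there is a file to read; the file name compounds.sdf and the directory name init_from_sdfset are made up for illustration. The validSDF filter is an optional extra step, and whether reading through read.SDFset resolves a given file's problem depends on what is wrong with that file.

```r
library(ChemmineR)
library(eiR)

# Write the sample compounds to a temporary SD file so the example has
# something to read; in practice this would be your own file.
data(sdfsample)
sdfFile <- file.path(tempdir(), "compounds.sdf")
write.SDF(sdfsample, file = sdfFile)

# Parse the file into an SDFset and keep only the entries that pass
# ChemmineR's validity check.
sdfset <- read.SDFset(sdfFile)
sdfset <- sdfset[validSDF(sdfset)]

# Pass the SDFset to eiInit in place of the filename.
dir <- file.path(tempdir(), "init_from_sdfset")
dir.create(dir)
ids <- eiInit(sdfset, dir = dir, priorityFn = randomPriorities)
```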
eiInit will create a folder called 'data'. Commands should always be
executed in the folder containing this directory (i.e., the parent
directory of "data"); otherwise, specify the location of that directory
with the dir option.
A directory called "data" is created in the directory given by dir (the current working directory by default).
The generated compound ids of the given compounds are returned. These can be used to
reference a compound or set of compounds in other functions, such as eiQuery.
Kevin Horan
eiMakeDb
eiPerformanceTest
eiQuery
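library(eiR)
library(ChemmineR)                      # provides the sdfsample data set
data(sdfsample)
dir <- file.path(tempdir(), "init")     # scratch location for the "data" directory
dir.create(dir)
eiInit(sdfsample, dir = dir, priorityFn = randomPriorities)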