sdfStream | R Documentation |
Streaming function to compute descriptors for large SD Files without consuming much memory. In addition to descriptor values, it returns a line index that defines the position of each molecule in the source SD File. This line index can be used by the read.SDFindex function to retrieve specific compounds of interest from large SD Files without reading the entire file into memory.
sdfStream(input, output, append=FALSE, fct, Nlines = 10000, startline=1, restartNlines=10000, silent = FALSE, ...)
input |
file name of input SD file |
output |
file name of tabular descriptor file |
append |
if |
fct |
Function to select descriptor sets; any combination of descriptors, supported by |
Nlines |
Number of lines to read from input SD File at a time; the memory consumption will be proportional to this value. |
startline |
For restarting sdfStream at specific line assigned to |
restartNlines |
Number of lines to parse when |
silent |
if |
... |
Arguments to be passed to/from other methods. |
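The Nlines argument bounds memory use by reading the input in fixed-size chunks. The base R sketch below illustrates that general idea only; it is not the ChemmineR implementation. The helper name count_records_chunked and the toy file are assumptions made for illustration, and records are assumed to end with the standard "$$$$" delimiter line.

```r
## Hypothetical helper (not part of ChemmineR): read a file in chunks of
## 'Nlines' lines so that only one chunk is held in memory at a time.
count_records_chunked <- function(path, Nlines = 4) {
    con <- file(path, open = "r")
    on.exit(close(con))
    n <- 0L
    repeat {
        chunk <- readLines(con, n = Nlines)
        if (length(chunk) == 0) break      # end of file reached
        n <- n + sum(chunk == "$$$$")      # count record delimiters in this chunk
    }
    n
}

## Toy input with two records, written to a temporary file
tmp <- tempfile(fileext = ".sdf")
writeLines(c("mol1", "M  END", "$$$$", "mol2", "x", "M  END", "$$$$"), tmp)
n_records <- count_records_chunked(tmp, Nlines = 3)
```

A smaller chunk size lowers peak memory at the cost of more read operations, which is the trade-off the Nlines description refers to.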
...
Writes a descriptor matrix to a tabular file. The first and last line number (position index) of each molecule is specified in the first two columns of the tabular output file, respectively.
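To make the notion of a line index concrete, here is a minimal base R sketch that derives the first and last line number of each record from the "$$$$" delimiter lines of an SD file. It is an illustration under stated assumptions, not the code used by sdfStream; the helper name line_index and the toy lines are hypothetical.

```r
## Toy SD-file-like text: each record ends with a "$$$$" delimiter line
sdf_lines <- c("mol1", "  header", "M  END", "$$$$",
               "mol2", "  header", "  extra", "M  END", "$$$$")

## Hypothetical helper (not part of ChemmineR): first and last line of each record
line_index <- function(lines) {
    ends <- which(lines == "$$$$")
    if (length(ends) == 0) {
        return(data.frame(start = integer(0), end = integer(0)))
    }
    starts <- c(1L, head(ends, -1L) + 1L)  # a record starts after the previous delimiter
    data.frame(start = starts, end = ends)
}

idx <- line_index(sdf_lines)

## Retrieve only the second record by its line range
second <- sdf_lines[idx$start[2]:idx$end[2]]
```

With such an index, a single record can be extracted by its line range without parsing the other records.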
Thomas Girke
SDF format definition: http://www.symyx.com/downloads/public/ctfile/ctfile.jsp
Import/export functions: read.AP, read.SDFset, read.SDFstr, write.SDFsplit
## Load sample data
library(ChemmineR)
data(sdfsample); sdfset <- sdfsample
## Not run:
write.SDF(sdfset, "test.sdf")
## Define descriptor set in a simple function
desc <- function(sdfset) {
    cbind(SDFID=sdfid(sdfset),
          # datablock2ma(datablocklist=datablock(sdfset)),
          MW=MW(sdfset),
          groups(sdfset),
          # AP=sdf2ap(sdfset, type="character"),
          rings(sdfset, type="count", upper=6, arom=TRUE)
    )
}
## Run sdfStream with desc function and write results to a file called 'matrix.xls'
sdfStream(input="test.sdf", output="matrix.xls", append=FALSE, fct=desc, Nlines=1000)
## Same as before but starting in SD file at line number 950
sdfStream(input="test.sdf", output="matrix.xls", append=FALSE, fct=desc, Nlines=1000, startline=950)
## Select molecules from SD File using line index from sdfStream
indexDF <- read.delim("matrix.xls", row.names=1)[,1:4]
indexDFsub <- indexDF[indexDF$MW < 400, ] # Selects molecules with MW < 400
sdfset <- read.SDFindex(file="test.sdf", index=indexDFsub, type="SDFset")
## Write result directly to SD file without storing larger numbers of molecules in memory
read.SDFindex(file="test.sdf", index=indexDFsub, type="file", outfile="sub.sdf")
## Read atom pair string representation from file into APset
## (requires an AP column in the file, i.e. the sdf2ap line in desc must be uncommented)
apset <- read.AP(file="matrix.xls", colid="AP")
cid(apset) <- as.character(indexDF$SDFID)
## End(Not run)