View source: R/newCountDataSet.R
Generate two n x p data sets, a training set and a test set, together with outcome vectors y and yte of length n that give the class labels of the training and test observations.
newCountDataSet(n, p, K, param, sdsignal, drate)
n | Number of observations desired.
p | Number of features desired. Note that a proportion drate of the features will differ between classes, though some of those differences may be small.
K | Number of classes desired. Note that the function requires that n be at least equal to 4K, i.e. there must be at least 4 observations per class on average.
param | The dispersion parameter for the negative binomial distribution. The negative binomial distribution is parameterized using "mu" and "size" as in the R function "rnbinom". That is, Y ~ NB(mu, param) means that E(Y) = mu and Var(Y) = mu + mu^2/param. So when param is very large this is essentially a Poisson distribution, and when param is small there is a lot of overdispersion relative to the Poisson distribution.
sdsignal | The extent to which the classes are different. If this equals zero then there are no class differences, and if this is large then the classes are very different.
drate | The proportion of differentially expressed genes.
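The parameterization described for param can be checked directly with base R. The following is a minimal sketch, not part of the package: the values of mu and param and the sample size are arbitrary choices made for illustration, and only the base R function rnbinom is used.

```r
# Sketch: compare empirical moments of rnbinom() draws with
# E(Y) = mu and Var(Y) = mu + mu^2/param.
set.seed(1)        # arbitrary seed, only for reproducibility
mu    <- 5         # illustrative mean (assumed value)
param <- 10        # dispersion, passed to rnbinom() as `size`

y <- rnbinom(1e5, mu = mu, size = param)

emp_mean <- mean(y)             # should be close to mu
emp_var  <- var(y)              # should be close to theo_var
theo_var <- mu + mu^2 / param   # variance implied by the formula above

# With a very large dispersion parameter the variance approaches mu,
# which is the Poisson case.
y_large   <- rnbinom(1e5, mu = mu, size = 1e6)
var_large <- var(y_large)
```

Smaller values of param inflate theo_var relative to mu, which is the overdispersion the argument description refers to.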
list(.) A list of output. "sim_train_data" is the training data, a q*n data matrix. "sim_test_data" is the test data, a q*n data matrix. The column names of these two matrices are the class labels of the n observations. q may be smaller than p because features with 0 total counts are removed; the q features kept are those with >0 total counts in the data set, so q <= p. "truesf" gives the size factors for the training observations. "isDE" represents the differential expression label of each gene.
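The reason q can be smaller than p can be illustrated on a toy matrix. This is only a sketch of the idea of dropping features with zero total count; it is not the package's own code, and the names counts, keep and filtered as well as the matrix contents are hypothetical. Features are placed in rows to match the q*n layout described above.

```r
# Sketch: drop features (rows) whose total count is zero.
set.seed(2)                      # arbitrary seed
p <- 6                           # features requested (illustrative)
n <- 4                           # observations (illustrative)
counts <- matrix(rpois(p * n, lambda = 2), nrow = p, ncol = n)
counts[c(2, 5), ] <- 0           # force two features to have zero total count

keep     <- rowSums(counts) > 0  # TRUE for features with at least one count
filtered <- counts[keep, , drop = FALSE]
q        <- nrow(filtered)       # number of features kept, so q <= p
```

Using drop = FALSE keeps the result a matrix even if only one feature survives the filter.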
dat <- newCountDataSet(n=40, p=500, K=4, param=10, sdsignal=0.1, drate=0.4)