PERFect_perm: Permutation PERFect filtering for microbiome data
In PERFect: Permutation filtration for microbiome data

Permutation filtering of the provided OTU table X at test level alpha. The significance of each set of j taxa is evaluated by fitting a Skew-Normal, Normal, t, or Cauchy distribution to the sampling distribution obtained by permuting taxa labels.

PERFect_perm(X, infocol = NULL, Order = "NP", Order.user = NULL, normalize = "counts",
    algorithm = "fast", center = FALSE, quant = c(0.1, 0.25, 0.5),
    distr = "sn", alpha = 0.1, rollmean = TRUE, direction = "left", pvals_sim = NULL,
    k = 10000, nbins = 30, hist = TRUE, col = "red", fill = "green",
    hist_fill = 0.2, linecol = "blue")

`X`	OTU table, where taxa are columns and samples are rows. It should be in data frame format, with column names corresponding to taxa names.
`infocol`	Index vector of the metadata columns. The input is assumed to contain only a taxa table; if sample metadata are included as columns of the input, this option must be specified.
`Order`	Taxa ordering. The default ordering is the number of occurrences (`"NP"`) of the taxa in all samples. Other orderings are p-value ordering, number of connected taxa, and weighted number of connected taxa, denoted `"pvals"`, `"NC"`, and `"NCw"`, respectively. More details about taxa ordering are described in Smirnova et al. Users can also specify their preferred order with `Order.user`.
`Order.user`	User's taxa ordering. This argument takes a character vector of ordered taxa names.
`normalize`	Normalization of taxa counts. The default option does not normalize the counts, but the user can convert the OTU table into a proportion table using the option `"prop"` or into a presence/absence table using `"pres"`.
`algorithm`	Algorithm speed. The default is `"fast"`, which allows the program to search efficiently for significant taxa without computing all the p-values. The user must set `hist = FALSE` for the fast algorithm. The alternative setting is `"full"`, which computes p-values for all taxa.
`center`	Centering OTU table. The default option does not center the OTU table.
`quant`	Quantile values used to fit the distribution to log DFL values. The number of quantile values corresponds to the number of parameters in the distribution the data is fitted to. Assuming that at least 50% of taxa are not informative, we suggest fitting the log Skew-Normal distribution by matching the 10%, 25% and 50% percentiles of the log-transformed samples to the Skew-Normal distribution.
`distr`	The type of distribution to fit to the log DFL values. We suggest the Skew-Normal distribution, which is the default, but other choices are available: `"sn"`, the Skew-Normal distribution with 3 parameters (location xi, scale omega^2, and shape alpha); `"norm"`, the Normal distribution with 2 parameters (mean and standard deviation sd).
`alpha`	Test level alpha, set to 0.1 by default.
`rollmean`	Binary TRUE/FALSE value. If TRUE, rolling average (moving mean) of p-values will be calculated, with the lag window set to 3 by default.
`direction`	Character specifying whether the index of the result should be left- or right-aligned or centered compared to the rolling window of observations, set to "left" by default.
`pvals_sim`	Object resulting from simultaneous PERFect with taxa abundance ordering, allowing user to perform Simultaneous PERFect with p-values ordering. Be aware that the choice of distribution for both methods must be the same.
`k`	The number of permutations, set to 10000 by default.
`nbins`	Number of bins used to visualize the histogram of log DFL values, set to 30 by default.
`hist`	Binary TRUE/FALSE value. If TRUE, the function builds histograms for each taxon.
`col`	Graphical parameter for color of histogram bars border, set to "red" by default.
`fill`	Graphical parameter for color of histogram fill, set to "green" by default.
`hist_fill`	Graphical parameter for intensity of histogram fill, set to 0.2 by default.
`linecol`	Graphical parameter for the color of the fitted distribution density, set to "blue" by default.
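
As an illustration of how `quant` and `distr` relate, the following base R sketch matches two quantiles to the two parameters of a Normal distribution. It is not the package's fitting code: the simulated `log_dfl` values, the choice of the 10% and 25% levels for the two-parameter case, and the observed value `obs` are all assumptions made for the example.

```r
set.seed(42)

# Stand-in for log DFL values from permutations (simulated, not real data)
log_dfl <- rnorm(5000, mean = -6, sd = 1.5)

# Two quantile levels for the two Normal parameters (hypothetical choice)
probs <- c(0.10, 0.25)
q_obs <- quantile(log_dfl, probs = probs, names = FALSE)

# Solve q_p = mu + sd * qnorm(p) at both levels for mu and sd
z <- qnorm(probs)
sd_hat <- (q_obs[2] - q_obs[1]) / (z[2] - z[1])
mu_hat <- q_obs[1] - sd_hat * z[1]

# Upper-tail probability of a hypothetical observed log DFL value
obs <- -2
p_val <- pnorm(obs, mean = mu_hat, sd = sd_hat, lower.tail = FALSE)
```

Matching only lower quantiles reflects the stated assumption that at least half of the taxa are not informative, so the lower half of the distribution is taken to represent noise.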

Filtering is the process of identifying and removing a subset of taxa according to a particular criterion. In contrast to the simultaneous filtering approach, the permutation approach does not assume that the distributions for all sets of taxa are identical and equal to the distribution used in simultaneous filtering. PERFect_perm() filters the provided OTU table X and outputs a filtered table that contains signal taxa. It calculates the difference in filtering loss (DFL) for each taxon according to the given taxa order. By default, the function fits a Skew-Normal distribution to the log-differences in filtering loss, but Normal, t, or Cauchy distributions can also be used.
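
The permutation idea can be sketched in base R as follows. This is a simplified illustration rather than the package's implementation. It assumes the filtering loss FL(J) = 1 - ||X_-J' X_-J||_F^2 / ||X' X||_F^2, builds the null distribution by randomly reshuffling taxa labels, and replaces the fitted Skew-Normal with an empirical upper-tail proportion. The helper names `filt_loss` and `dfl`, the simulated table, and the small number of permutations are all hypothetical.

```r
set.seed(1)

# Toy count table: 20 samples x 6 taxa (simulated, not real data)
X <- matrix(rpois(20 * 6, lambda = rep(c(0.2, 0.5, 1, 5, 20, 50), each = 20)),
            nrow = 20, dimnames = list(NULL, paste0("taxon", 1:6)))

# Filtering loss from removing the taxa named in `removed` (assumed definition)
filt_loss <- function(X, removed) {
  keep <- setdiff(colnames(X), removed)
  full <- sum(crossprod(X)^2)                        # squared Frobenius norm of X'X
  kept <- sum(crossprod(X[, keep, drop = FALSE])^2)  # same for the retained taxa
  1 - kept / full
}

# Difference in filtering loss when the j-th taxon of `ord` joins the removed set
dfl <- function(X, ord, j) {
  filt_loss(X, ord[seq_len(j)]) - filt_loss(X, ord[seq_len(j - 1)])
}

# Order taxa by number of occurrences (increasing order is an assumption)
ord <- names(sort(colSums(X > 0)))

# Observed DFL for the set of j taxa
j <- 2
obs <- dfl(X, ord, j)

# Null distribution: DFL at position j under random taxa orderings
k <- 200  # far fewer than the default 10000, to keep the sketch fast
null <- replicate(k, dfl(X, sample(colnames(X)), j))

# Empirical upper-tail p-value (stand-in for the fitted-distribution p-value)
p_emp <- mean(null >= obs)
```

Because each j gets its own null distribution here, the sketch mirrors the point above that the permutation approach does not assume one common distribution across all sets of taxa.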

If `algorithm = "full"` is chosen, a list is returned containing:

`filtX`	Filtered OTU table.
`info`	The metadata information.
`pvals`	P-values of the test.
`DFL`	Differences in filtering loss values.
`fit`	Fitted values and further goodness of fit details passed from the `fitdistr()` function.
`hist`	Histogram of log differences in filtering loss.
`est`	Estimated distribution parameters.
`dfl_distr`	Plot of differences in filtering loss values.

If `algorithm = "fast"` is chosen, `fit`, `hist`, `est`, and `dfl_distr` are not returned.
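
To show how the returned `pvals` relate to the `alpha`, `rollmean`, and `direction` arguments, here is a small base R sketch of a left-aligned rolling mean with window 3 followed by a threshold at alpha. The p-values are made up, and the decision rule (retain everything from the first smoothed p-value at or below alpha onward) is an assumption about how the cutoff is applied, not a statement of the package's exact indexing.

```r
# Hypothetical p-values for taxa in their filtering order (made-up numbers)
p <- c(0.90, 0.80, 0.70, 0.30, 0.04, 0.02, 0.01, 0.01)
alpha <- 0.1

# Left-aligned rolling mean with window 3: entry i averages p[i], p[i+1], p[i+2];
# the last two entries have no complete window and are set to NA
window <- 3
p_roll <- c(rowMeans(embed(p, window)), rep(NA, window - 1))

# First position whose smoothed p-value is at or below alpha
first_sig <- which(p_roll <= alpha)[1]

# Assumed decision rule: positions before first_sig are filtered out
keep <- seq_along(p) >= first_sig
```

Smoothing makes the cutoff less sensitive to a single small p-value in the middle of a run of large ones.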

Ekaterina Smirnova

Azzalini, A. (2005). The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics, 32(2), 159-188.

Smirnova, E., Huzurbazar, H., Jafari, F. "PERFect: permutation filtration of microbiome data", to be submitted.

PERFect_sim

data(mock2)

# Proportion data matrix
Prop <- mock2$Prop

# Counts data matrix
Counts <- mock2$Counts

# Perform simultaneous filtering of the data
res_sim <- PERFect_sim(X = Counts)

# Order taxa according to p-values
pvals_sim <- pvals_Order(Counts, res_sim)

## Not run: 
# Obtain permutation PERFect results using the p-value taxa ordering in pvals_sim
res_perm <- PERFect_perm(X = Prop, Order.user = pvals_sim, algorithm = "fast")

# Permutation PERFect p-values colored by FLu values
pvals_Plots(PERFect = res_perm, X = Counts, quantiles = c(0.25, 0.5, 0.8, 0.9), alpha = 0.05)

## End(Not run)