knitr::opts_chunk$set(warning = FALSE, message = FALSE, echo = FALSE)
rm(list=ls()) library(PERFect) library(ggplot2) library(knitr) library(kableExtra) set.seed(12341)
The PERFect software is freely available at https://github.com/katiasmirn/PERFect and can be installed using the following codes:
library(devtools) install_github("katiasmirn/PERFect")
This vignette explains the idea of filtering loss $FL(J)$ and demonstrates the PERFect permutation method on a small subset of the mock community data (Brooks et al. 2015). There are in total $10$ samples ($n = 10$) and $20$ taxa ($p = 20$) which contain $7$ signal taxa. The two tables below shows ordered taxa abundance by columns, where the first table gives $13$ noise taxa and the second table gives $7$ signal taxa. For example, $N_1$ is the least abundant noise taxon and $N_{13}$ is the most abundant noise taxon; $S_1$ is the least abundant signal taxon and $S_7$ is the most abundant signal taxon.
data(mock2) Counts <- mock2$Counts taxa <- c("Bradyrhizobiaceae.cluster49", "Propionibacterium.acnes", "Gemella.OTU86", "Comamonadaceae.cluster57", "Streptococcus.gordonii", "Caulobacter.leidyi", "Finegoldia.magna", "Aerococcus.christensenii", "Agromyces.cluster54", "Mycoplasma.orale_faucium", "Enterobacteriaceae.cluster31", "Lactobacillus.jensenii", "Fusobacterium.cluster48", "Methylophilus.cluster11", "Coriobacteriaceae.OTU27", "Prevotella.cluster2", "Delftia.acidovorans_lacustris_tsuruhatensis", "Hyphomicrobium.methylovorum", "Bifidobacterium.longum_infantis_suis", "Streptococcus.parasanguinis") Ind <- apply(Counts[,taxa], 2, function(x) which(x!=0)) sampleID <- c(31,78, 39, 5, 9, 76, 22, 105, 34, 82) data <- Counts[sampleID,c(taxa, mock2$TrueTaxa)] data <- data[,apply(data, 2, Matrix::nnzero) >0] data<- data[,-3] #sort by abundance data <- data[, NP_Order(data)] #rename columns by noise and signal SignTaxa <- colnames(data) %in% mock2$TrueTaxa colnames(data)[SignTaxa] <-paste0("S~", 1:sum(SignTaxa),"~") colnames(data)[!SignTaxa] <-paste0("N~", 1:(dim(data)[2] -sum(SignTaxa)),"~") rownames(data) <- paste("Sample ", 1:nrow(data)) kable(data[,1:13], format='html', digits=2, row.names=TRUE, escape = FALSE, padding=-3L) %>% kable_styling(font_size = 14, bootstrap_options = "striped", full_width = TRUE, position = "left") kable(data[,14:20], format='html', row.names=TRUE, escape = FALSE, padding=-3L) %>% kable_styling(font_size = 14, bootstrap_options = "striped", full_width = TRUE, position = "left")
The filtering loss measures taxa contribution to the total covariance. We assume that if the set of $J$ taxa is not important, then removing these taxa will not dramatically decrease the OTU table total covariance (defined in section 2.1 of Smirnova et al. 2018). We define the loss due to filtering out the set of $J$ taxa as $$ FL (J_j)= 1- \frac{\text{covariance magnitude after removing $J$ taxa}}{\text{total covariance}}. $$
The filtering loss $FL(J_j)$ is a number between $0$ and $1$, with values close to $0$ if the set of taxa $J$ has small contribution to the total covariance and $1$ otherwise.
To find filtering loss, taxa are first re-arranged in ascending order from left to right according to their number of occurrences in the $n$ samples, i.e. the abundant taxa are to the right of the table. Once taxa are ordered, the filtering loss is calculated sequentially by removing the taxa from left to right.
We evaluate the $j^{th}$ taxon contribution to the signal using the difference in filtering loss statistic $$ DFL(j + 1) = FL(J_{j+1}) - FL(J_j). $$ The values of $DFL(j + 1)$ are close $0$ if the $j^{th}$ taxon is included erroneously, and significantly different from $0$ if this taxon has high contribution to the signal.
To illustrate the ability of this filtering loss to distinguish between noise and signal taxa, the filtering loss values for a noise taxon, a signal taxon and a combination of noise and signal taxa are shown in the table below. Specifically, this table compares the filtering loss values and their corresponding log values due to removing:
1) The $10^{th}$ least abundant noise taxon, $J_1 = {N_{10}}$ 2) The first least abundant signal taxon $J_2 = {S_1}$ 3) All noise taxa, $J_3 = {N_1, \dots, N_{13}}$ 4) All noise taxa and the first least abundant signal taxon, $J_4 = {N_1, \dots, N_{13}, S_1}$.
From the table, one can notice a large difference between log filtering loss value of the $10^{th}$ least abundant noise taxon and that of the first least abundant signal taxon ($-14.786$ v.s. $-6.970$ respectively). This implies that the data loss due to removing $N_{10}$ is minimal and dramatically less than the loss due to removing $S_1$. Moreover, since the loss from omitting all noise taxa is still less the loss from omitting one signal taxon, it shows that all noise taxa cumulatively have less contribution than a single signal taxon.
X <- as.matrix(data) Netw <- t(as.matrix(X))%*%as.matrix(X) den <- psych::tr(t(Netw)%*%Netw) #removing 10th noise taxon X_R <- X[,-which(colnames(X) == "N~10~")] #calculate corresponding norm Netw_R <- t(X_R)%*%X_R num <- psych::tr(t(Netw_R)%*%Netw_R) FL <- 1 - (num/den) #df <- data.frame(num = num, den = den, FL = FL, log_FL = round(log(FL), 3)) df <- data.frame(FL = FL, log_FL = round(log(FL), 3)) #removing first least abundant signal taxon X_R <- X[,-which(colnames(X) == "S~1~")] Netw_R <- t(X_R)%*%X_R num <- psych::tr(t(Netw_R)%*%Netw_R) FL <- 1 - (num/den) #df <- rbind(df, c(num, den, FL, round(log(FL), 3))) df <- rbind(df, c(FL, round(log(FL), 3))) #removing all noise taxa X_R <- X[,-which(substr(colnames(X), start = 1, stop = 2) %in% "N~")] Netw_R <- t(X_R)%*%X_R num <- psych::tr(t(Netw_R)%*%Netw_R) FL <- 1 - (num/den) #df <- rbind(df, c(num, den, FL, round(log(FL), 3))) df <- rbind(df, c(FL, round(log(FL), 3))) #removing all noise + first signal taxon X_R <- X[,-c(which(substr(colnames(X), start = 1, stop = 2) %in% "N~"), which(colnames(X) == "S~1~"))] Netw_R <- t(X_R)%*%X_R num <- psych::tr(t(Netw_R)%*%Netw_R) FL <- 1 - (num/den) #df <- rbind(df, c(num, den, FL, round(log(FL), 3))) df <- rbind(df, c(FL, round(log(FL), 3))) rownames(df) <- c("N~10~", "S~1~", "Noise taxa", "Noise taxa and S~1~") colnames(df) <- c("Filtering Loss","Log Filtering Loss") df[,1] <- format(df[,1], width = 1, digits = 5, format = "g") kable(df, format='html', row.names=TRUE, escape = FALSE, padding=-3L) %>% kable_styling(font_size = 14, bootstrap_options = "striped", full_width = TRUE, position = "left")
#function to output results for each simulation run resSummary <-function(X, filtX, time = NA){ rank_pvals = NULL rank_pres = NULL ntotal <- dim(X)[2] npres <- dim(filtX)[2] #number of taxa left pfilt <- (ntotal - npres)/ntotal #percentage of taxa filtered #combine into results vector res <- c(ntotal, npres, pfilt) names(res) <- c("ntotal", "npres", "pfilt") return(list(res = res, time = time)) }
This algorithm's objective is to remove noise taxa while identifying and retaining signal taxa. It takes an OTU table and a test critical value $\alpha$ as inputs and produces a reduced OTU table with less taxa. The following example illustrates this algorithm.
Input: OTU table above, test critical value $\alpha = 0.1$
The simultaneous PERFect algorithm yields preliminary taxa significance. Taxa can also be ordered by abundance and this step is not necessary; however method evaluation on mock data has shown that ordering taxa by simultaneous PERFect p-values improves permutation PERFect performance.
######################### #quantiles from fit a ########################## start <- Sys.time() res_sim <- PERFect_sim(X=data) end <- Sys.time()-start summary_sim <- resSummary(X = data, filtX = res_sim$filtX, time = end)
The table below shows taxa and their corresponding simultaneous PERFect p-values. Some taxa that were initially less important according to the the abundance ordering gained higher rank according to simultaneous PERFect p-values. For example, the $2^{nd}$ least important taxon $N_2$ in the abundance ordering becomes the $7^{th}$ least important taxon in the simultaneous PERFect p-values ordering.
#add p-values column to the data pvals <- c(1, round(res_sim$pvals, 2)) pvals.sort <- sort.int(pvals, decreasing = TRUE, index.return = TRUE) #sort data by p-values data <- data[, pvals.sort$ix] #save X for illustrating method steps X = as.matrix(data) #add p-values to the data data <- rbind(pvals.sort$x, data) rownames(data)[1] <- "Sim pvalues" data[1,1] <- NA kable(data[1,1:13], format='html', digits=2, row.names=TRUE, escape = FALSE, padding=-3L) %>% kable_styling(font_size = 11, bootstrap_options = "striped", full_width = TRUE, position = "left") kable(data[1,14:20], format='html', row.names=TRUE, escape = FALSE, padding=-3L) %>% kable_styling(font_size = 14, bootstrap_options = "striped", full_width = TRUE, position = "left")
Let $J_j = {1, \dots, j}$
Calculate $DFL(j+1) = FL(J+1) - FL(J)$
end
The test statistic $DFL$ and its corresponding values on the log scale for each taxon are displayed below. The values of $\log(DFL)$ range between $-23.71$ and $-13.98$ for the noise taxa and increase dramatically to $-6.79$ for the value of the first signal taxon $S_6$. The $\log(DFL)$ values for signal taxa range between $-7.5$ and $-0.12$, which is much larger compared to corresponding statistic values for the noise taxa.
Order.vec <- pvals_Order(data, res_sim) Order_Ind <- rep(1:length(Order.vec))#convert to numeric indicator values DFL <- DiffFiltLoss(X = X, Order_Ind, Plot = TRUE, Taxa_Names = Order.vec) #add log DFL values to the data data <- rbind(c(NA, round(log(DFL$DFL), 2)), data) rownames(data)[1] <- "log(DFL)" #rearrange rows to have DFL below sim-pvalues data <- data[c(2,1, 3:nrow(data)),] #data #display DFL and log(DFL) values df <- rbind(c(NA,DFL$DFL),c(NA, log(DFL$DFL))) df[1,] <- format(df[1,], width = 1, digits = 3, format = "g") df[2,] <- round(as.numeric(df[2,]), 2) rownames(df) <- c("DFL", "log(DFL)") colnames(df)[1]<- "N~1~" kable(df[,1:8], format='html', row.names=TRUE, escape = FALSE, padding=-3L) %>% kable_styling(font_size = 14, bootstrap_options = "striped", full_width = TRUE, position = "left") kable(df[,9:13], format='html', row.names=TRUE, escape = FALSE, padding=-3L) %>% kable_styling(font_size = 14, bootstrap_options = "striped", full_width = TRUE, position = "left") kable(df[,14:20], format='html', row.names=TRUE, escape = FALSE, padding=-3L) %>% kable_styling(font_size = 14, bootstrap_options = "striped", full_width = TRUE, position = "left")
For permutation $1, \dots, k$
Randomly select $J^_{j+1} \subset {1, \dots, p}$ with $|J^_{j+1}| =j+1$
Calculate $DFL^(j+1) = FL(J^_{j+1}) - FL(J^*)$
end
end
In this step of the algorithm, we construct the permutation distribution for each set of $j$ taxa to evaluate significance of corresponding $\log(DFL)$ values. To build the distribution of $d\mathcal{F}{j+1}$ and test statistical hypothesis \begin{equation} {\small H_0: d\mathcal{F}{j+1} = 0 \hspace{10mm}\mbox{vs}\hspace{10mm} H_A: d\mathcal{F}{j+1} > 0, } \end{equation} we randomly select $k$ sets $J^_{j+1}$ of taxa labels and calculate the corresponding $DFL^(j+1)$ value for each sample. For example, to obtain 6 permutations ($k = 6$) for $5$ taxa, we draw sets of size $|J^_{j+1}| = 5$. Then, the $DFL^$ for these samples are calculated according to this permutation ordering. In particular, for the first permutation, $$ DFL^(5) = FL^({N_8, N{11}, S_2, S_6, N_4}) - FL^*({N_8, N_{11}, S_2, S_6}). $$
The values of $DFL^*$ for other sets of taxa are calculated similarly.
## Some internal functions ######################################## #Function to calculate j^th DFL loss ######################################## DiffFiltLoss_j <- function(Perm_Order,Netw, j){ J_jp1 <- Perm_Order[seq_len(j+1)] #calculate corresponding norm ratios this is the value of the matrix || DFL <- (2*t(Netw[J_jp1[max(NROW(J_jp1))],-J_jp1])%*%Netw[J_jp1[max(NROW(J_jp1))],-J_jp1]+ Netw[J_jp1[max(NROW(J_jp1))], J_jp1[max(NROW(J_jp1))]]^2) return(DFL) } ################################## #Randomly permute labels n times for a fixed j ################################# Perm_j_s <- function(j, Netw, k,p, p2 = NULL){ #Netw = X'X - data matrix with taxa in columns and samples in rows #k - number of permutations used #create a list of k*p arrangements of orders if(is.null(p2)){p2 <- p} labl <- sapply(seq_len(k),function(x) NULL) labl <- lapply(labl,function(x) sample(seq_len(p),p2)) FL_j <- sapply(labl,DiffFiltLoss_j, Netw = Netw, j=j) return(FL_j) } ################################################### #Obtain sampling distribution using permutations of DFL ################################################### sampl_distr <- function(X, k){ p <- dim(X)[2] Netw <- t(X)%*%X #full_norm <- psych::tr(t(Netw)%*%Netw)#this is full norm value full_norm <- sum(Netw*Netw) #For each taxon j, create a distribution of its DFL's by permuting the labels res_all <- lapply(seq_len(p-1),function(x) x) # Calculate the number of cores no_cores <- parallel::detectCores()-1 # Initiate cluster, start parrallel processing cl <- parallel::makeCluster(no_cores) #load variables for each core parallel::clusterExport(cl,c("DiffFiltLoss_j","Perm_j_s","Netw","k","p"),envir=environment()) #parallel apply FL_j <- parallel::parLapply(cl, res_all, function(x) Perm_j_s(j = x, Netw =Netw, k=k, p =p, p2 = x+1)) #FL_j <- lapply(res_all, function(x) Perm_j_s(j = x, Netw =Netw, k=k, p =p, p2 = x+1)) # End the parallel processing parallel::stopCluster(cl) #divide by the full matrix norm values res_pres <- lapply(FL_j, function(x) {x/full_norm}) return(res_pres) } Perm_j_s_il <- function(j, Netw, k,p, p2 = NULL){ #Netw = X'X - data matrix with taxa in columns and samples in rows #k - number of permutations used #create a list of k*p arraments of orders if(is.null(p2)){p2 <- p} labl <- sapply(1:k,function(x) NULL) labl <- lapply(labl,function(x) sample(1:p,p2)) FL_j <- sapply(labl,DiffFiltLoss_j, Netw = Netw, j=j) return(list(labl = labl, FL_j = FL_j)) }
#we illustrate computation of the first k=6 permutations for each taxon p <- dim(X)[2] Netw <- t(X)%*%X full_norm <- psych::tr(t(Netw)%*%Netw)#this is full norm value #For each taxon j, create a distribution of its DFL's by permuting the labels res_all <- lapply(1:(p-1),function(x) x) FL <- lapply(res_all, function(x) Perm_j_s_il(j = x, Netw =Netw, k=6, p =p, p2 = x+1)) #extract every 2nd subelement for each element of list FL to get FL_j values FL_j <- lapply(FL, "[[", 2) #divide by the full matrix norm values to get DFL*(j+1) values res_pres <- lapply(FL_j, function(x) {x/full_norm}) #extract every 1st subelement for each element of list of selected labels labl <- lapply(FL, "[[", 1) labl<- lapply(labl, function(x) matrix(unlist(x), ncol = 6, byrow = FALSE)) #take sets of 5 taxa T5 <- apply(labl[[4]], 2, function(x) Order.vec[x]) df <- T5 df <- rbind(df,round(log(res_pres[[4]]),2)) rownames(df) <- c(paste0("Taxa ", 1:5),"log(DFL*)") colnames(df) <- paste0("Permutation ",1:6) kable(df, format='html', row.names=TRUE, escape = FALSE, padding=-3L) %>% kable_styling(font_size = 11, bootstrap_options = "striped", full_width = TRUE, position = "left")
Using quantile matching fit the normal distribution to the
logarithm of the sample $DFL^*(j+1), j = 1, \dots, p-1$ to obtain
the null distribution $X_j \sim \mbox{SN}(\widehat\xi_j, \widehat\omega_j^2, \widehat\alpha_j)$
end
Here, after $k$ values of $\log(DFL^)$ for each group of $j$ taxa are obtained, they are fit to a Skew Normal distribution with three parameters $\xi,\omega$ and $\alpha$ using quantile matching method. For example, to estimate the parameters of $5$ taxa distribution, we would use $\log(DFL^)$ values from the $6$ columns in the table above to fit Skew Normal distribution to the sample ${-21.25,-23.71,-4.7,-13.98,-3.95,-6.79}$.
This step is necessary to disentangle the null and alternative distributions contributing to the observed permutation distribution. When we use $k$ permutations to build the reference distribution of filtering loss differences, it always includes both the values from the null and from the alternative hypotheses as illustrated above in the example of step 4 of the algorithm. Therefore, we need to make the assumption that the resulting permutation distribution is a mixture of observed differences under the null and alternative distributions. The p-value is a measure of how extreme a value of statistic (here $DFL(5)$) is relative to the null distribution. If we use fully non-parametric approach and simply construct the p-value as the proportion of times when $\log DFL(5)$ is above $\log DFL^*(5)$, then we use mixture of null and alternative as the null distribution. Under this approach, the p-value would not be appropriately calculated and we may lose more taxa than necessary.
#distribution based on k = 1000 permutations dfl_distr <- sampl_distr(X = X, k=1000) #, sample = FALSE #distribution fit to skew-normal lfl <- lapply(dfl_distr, function(x) log(x[!x==0])) lp <- list(xi = mean(lfl[[1]]), omega = sd(lfl[[1]]), alpha = 1.5) fit <- lapply(lfl, function(x) fitdistrplus::qmedist(x, "sn", probs=c(0.10, 0.25, 0.5), start=lp)) est <- lapply(fit, function(x) x$estimate) #histogram of log(DFL^*) values for the 5th taxon i=4 lfl <- data.frame(log(dfl_distr[[i]][!dfl_distr[[i]]==0])) if(length(dfl_distr[[i]][dfl_distr[[i]]==0])>0){ print(paste("taxon", i, "number of zeroes = ", length(dfl_distr[[i]][dfl_distr[[i]]==0])))} names(lfl) <- c("DFL") #plot histogram x = "DFL" ord_map = aes_string(x = x) #hist <- ggplot(lfl, ord_map) + geom_histogram() hist <- ggplot(lfl, ord_map) + geom_histogram(bins = 30, aes(y=..density..), col = "red", fill = "green", alpha =0.2)+ theme(panel.background = element_rect(fill = "white"), panel.grid.major = element_line(colour = "grey90"), axis.text.x = element_text( size=10))+ ggtitle("") + xlab("log differences in filtering loss") + ylab("Density") #add skew-normal fit hist <- hist + stat_function(fun = dsn, args = list(xi = est[[i]][1], omega = est[[i]][2], alpha = est[[i]][3]), colour="blue") hist
In this example, we increase the number of permutations to $k=1000$, which is necessary to get a reasonable distribution for the values of $DFL^*(j+1)$. The histogram of $5$ taxa distribution is given below with the blue line indicating Skew Normal fit with parameters ($\hat{\xi}, \hat{\omega},\hat{\alpha}$) = (r round(c(est[[i]][1],est[[i]][2],est[[i]][3]),2)
). Since we have only $20$ taxa in this example, the distribution fit is not as accurate as for the larger number of taxa, but it illustrates the idea of the algorithm.
Calculate the p-value $p_j$ for $DFL(j+1), j = 1, \dots, p-1$ as
$p_j := P[X_j > \log{DFL(j+1)}]$
end
The $\log(DFL)$ value for the $5^{th}$ taxon in simultaneous p-values ordering was calculated in step 3 as r round(log(DFL$DFL[[4]]),2)
. Therefore, we calculate the $5$th taxon p-value as $p_{5} = P[X_{5} > -19.72]$ = r round(1 - psn(x=-19.72, xi = -25.26, omega = 10.09, alpha = 34.41),4)
, where $X_{5} \sim \mbox{SN}(\widehat\xi_{5} = -25.26, \widehat\omega_{5}^2 = 10.09, \widehat\alpha_{5} = 34.41)$.
#all p-values for(i in 1:(p-1)){ pvals[i] <- 1 - psn(x=log(DFL$DFL[i]), xi = est[[i]][1], omega = est[[i]][2], alpha = est[[i]][3])} data <- rbind(c(NA, round(pvals,2)), data) rownames(data)[1] <- "Perm pvalues"
The $4$ rows of the example OTU table below combines taxa PERFect simultaneous p-values, their corresponding $\log(DFL)$ values, raw PERFect permutation p-values, and their corresponding averaged values. The p-values of noise taxa are large and the p-values for the signal taxa are small as expected.
pvals_avg <- zoo::rollmean(pvals, k=3, align="left", fill=NA ) #replace na's with original values pvals_avg[is.na(pvals_avg)] <- pvals[is.na(pvals_avg)] data <- rbind(c(NA, round(pvals_avg, 2)), data) rownames(data)[1] <- "Avg Perm pvalues" #rearrange rows data <- data[c(3,4,2,1, 5:nrow(data)),] #nice display kable(data[1:4,1:13], format='html', digits=2, row.names=TRUE, escape = FALSE, padding=-3L) %>% kable_styling(font_size = 10, bootstrap_options = "striped", full_width = TRUE, position = "left") kable(data[1:4,14:20], format='html', row.names=TRUE, escape = FALSE, padding=-3L) %>% kable_styling(font_size = 14, bootstrap_options = "striped", full_width = TRUE, position = "left")
The OTU table above indicates that the first significantly small averaged p-value is $p_{14}= 0.02 \leq 0.1 :=\alpha$, which occurs at the first signal taxon $S_6$. Thus the algorithm has successfully filtered noise taxa and preserved all the true signal taxa.
#select taxa that are kept in the data set at significance level alpha Ind <- min(which(pvals_avg <=0.1)) #if jth DFL is significant, then throw away all taxa 1:j filtX <- X[,-(1:Ind)] # kable(filtX, # format='html', # row.names=TRUE, # escape = FALSE, # padding=-3L) %>% kable_styling(font_size = 12) #
The filtering methods for this package are wrapped into two main functions, PERFect_sim() (performing simultaneous filtering) and PERFect_perm() (performing permutation filtering). First, we load the OTU matrix with 240 samples and 46 taxa.
data(mock2) Counts <- mock2$Counts dim(Counts)
By default, the function PERFect_sim() takes the data table, $X$, as a matrix or data frame, orders it by the taxa abundance, uses 10%, 25% and 50% quantiles for matching the log of DFL to a Skew-Normal distribution and then calculates the p-value for each taxon at the significance level of $\alpha$ = 0.1. The function PERFect_sim() only needs a taxa table as the input, and other parameters are set to default.
res_sim <- PERFect_sim(X = Counts)
The "filtX" object from the result stores the filtered OTU matrix, which consists of 240 samples and 10 taxa in this example. We can identify the remaining taxa by looking into the column of the OTU matrix.
dim(res_sim$filtX) colnames(res_sim$filtX) # signal taxa
The p-values for all taxa can be plot using the function pvals_Plots().
p <- pvals_Plots(PERFect = res_sim, X = Counts, quantiles = c(0.25, 0.5, 0.8, 0.9), alpha=0.05) p$plot + ggtitle("Simultanenous Filtering")
Alternatively, we can use permutation filtering PERFect_perm() which is more robust than simultaneous filtering. By default, this function generates k = 10000 permutations for each taxa, thus it can be computationally expensive for a large OTU matrix. We offer user a fast algorithm which employs an unbalanced binary search that optimally finds the cutoff taxon without building the permutation distribution for all taxa. The codes for these options are shown below.
res_perm <- PERFect_perm(X = Counts, Order = "pvals", pvals_sim = res_sim, algorithm = "full") res_perm2 <- PERFect_perm(X = Counts, Order = "pvals", pvals_sim = res_sim, algorithm = "fast") p1 <- pvals_Plots(res_perm, Counts) p1 <- p1$plot + ggtitle("Full Algorithm") p2 <- pvals_Plots(res_perm2, Counts) p2 <- p2$plot + ggtitle("Fast Algorithm") ggpubr::ggarrange(p1,p2,ncol = 2)
The figure above illustrates the plot of permutation PERFect p-values calculated by the full and fast algorithm for the mock2 dataset. Although both methods achieve the similar cutoff taxon, the fast algorithm only calculate 11 out 46 p-values hence is more computationally efficient.
Smirnova, E., Huzurbazar, S., and Jafari, F.(2018). PERFect: PERmutation Filtering test for microbiome data, Biostatistics, kxy020, pp. 1-17. https://doi.org/10.1093/biostatistics/kxy020.
Brooks, J. P., Edwards, D. J., Harwich, M. D., Rivera, M. C., Fettweis, J. M., Serrano, M. G., Reris, R. A., Sheth, N. U., Huang, B., Gigerd, P. and others. (2015). The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies. BMC Microbiology, 15, pp. 1–14.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.