View source: R/imputeDataMFA.R
Impute the missing rows of data tables using the alternating least squares algorithm used in PCA. This function is called internally by MIMFA and is not usually called directly by a user.
imputeDataMFA(datasets, U, missRows, comp, maxIter=500, tol=1e-10)
datasets |
a list containing the data tables with missing rows. Tables in the list should be arranged in samples x variables, with samples order matching in all data tables. |
U |
the compromise configuration, a matrix with the individuals' coordinates as returned by |
missRows |
a list containing character vectors with the names of the missing individuals (rows) per table. |
comp |
the number of components kept for imputation. |
maxIter |
integer, maximum number of iterations for the iterative algorithm. |
tol |
positive value, the threshold for assessing convergence. |
Since the core of MFA is a PCA of the merged data tables K, the algorithm suggested to estimate MFA axes and impute missing values is inspired by the alternating least squares algorithm used in PCA. This consists of finding matrices F and U which minimize the following criterion:
||K - M - FU'||^2 = ∑_{i}∑_{k}( K_{ik} - M_{ik} - ∑_{d=1}^D F_{id} U_{kd})^2,
where M is a matrix with each row equal to the vector of variable means and D is the number of dimensions kept in the PCA. The solution is obtained by alternating two multiple regressions until convergence, one for estimating the axes (loadings \hat{U}) and one for the components (scores \hat{F}):
\hat{U}' = (\hat{F}'\hat{F})^{-1}\hat{F}'(K - \hat{M})
\hat{F} = (K - \hat{M})\hat{U}(\hat{U}'\hat{U})^{-1}.
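The two regressions above can be sketched in R on a small complete matrix. This is only an illustration of the alternating least squares idea, not code from the package: the toy data, the random starting scores, the fixed iteration count, and the names `Kc`, `Fhat`, `Uhat` and `crit` are all assumptions made for the example.

```r
set.seed(1)

# Hypothetical toy data: 20 samples x 6 variables, no missing values
K <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)
D <- 2  # number of components kept

# M: every row holds the vector of variable means
M  <- matrix(colMeans(K), nrow(K), ncol(K), byrow = TRUE)
Kc <- K - M

# Arbitrary starting scores (n x D)
Fhat <- matrix(rnorm(nrow(K) * D), nrow(K), D)

crit <- numeric(0)
for (iter in 1:100) {
  # Loadings: U' = (F'F)^{-1} F'(K - M), stored as a p x D matrix
  Uhat <- t(solve(crossprod(Fhat), crossprod(Fhat, Kc)))
  # Scores: F = (K - M) U (U'U)^{-1}
  Fhat <- Kc %*% Uhat %*% solve(crossprod(Uhat))
  # Value of the criterion ||K - M - FU'||^2 after this iteration
  crit <- c(crit, sum((Kc - Fhat %*% t(Uhat))^2))
}
```

Each regression is a least squares fit with the other factor held fixed, so the values stored in `crit` should not increase from one iteration to the next.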
The imputeDataMFA algorithm first imputes the missing values in K with initial values (the column means over the non-missing entries), and then computes \hat{M}. The second step of the iterative algorithm calculates \hat{F} = (K - \hat{M})U(U'U)^{-1} on the completed dataset using D components of U. Missing values are then estimated as \hat{K} = \hat{M} + \hat{F}U'. The new imputed dataset K is obtained by replacing the missing values of the original K matrix with the corresponding elements of \hat{K}, while keeping the observed values unaltered. These steps of parameter estimation and missing-value imputation are iterated until convergence. The number D of components used in the algorithm can be estimated by setting the estim.ncp argument to TRUE in the function MIMFA.
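A minimal sketch of this iteration for a single table is given below. It is not the package's source code. The function `impute_rows_sketch`, the toy data, and in particular the orientation are assumptions: the table `X` is kept as samples x variables and `U` (samples x D, centered) plays the role of the individuals' coordinates, so the regression is written as `t(K - M) %*% U %*% solve(crossprod(U))` and the fit as `M + U %*% t(Fhat)`.

```r
set.seed(42)
n <- 12; p <- 4; D <- 2

# Hypothetical centered configuration U (samples x D) and a toy table built from it
U <- matrix(rnorm(n * D), n, D, dimnames = list(paste0("s", 1:n), NULL))
U <- scale(U, center = TRUE, scale = FALSE)
B <- matrix(rnorm(p * D), p, D)
X_full <- U %*% t(B) + matrix(rnorm(n * p, sd = 0.1), n, p)
rownames(X_full) <- rownames(U)

# Two individuals are entirely missing from this table
miss <- c("s3", "s9")
X <- X_full
X[miss, ] <- NA

impute_rows_sketch <- function(X, U, miss, maxIter = 500, tol = 1e-10) {
  # Initial values: column means over the observed rows
  obs_means <- colMeans(X[setdiff(rownames(X), miss), , drop = FALSE])
  K <- X
  K[miss, ] <- matrix(obs_means, length(miss), ncol(X), byrow = TRUE)
  for (iter in seq_len(maxIter)) {
    M    <- matrix(colMeans(K), nrow(K), ncol(K), byrow = TRUE)
    Fhat <- t(K - M) %*% U %*% solve(crossprod(U))  # regression on fixed U
    Khat <- M + U %*% t(Fhat)                       # fitted table
    dimnames(Khat) <- dimnames(K)
    K_new <- K
    K_new[miss, ] <- Khat[miss, ]                   # observed rows stay unaltered
    delta <- sum((K_new - K)^2)
    K <- K_new
    if (delta < tol) break                          # convergence check
  }
  list(imputed = K[miss, , drop = FALSE], completed = K, iterations = iter)
}

res <- impute_rows_sketch(X, U, miss)
```

In this sketch `maxIter` and `tol` mirror the arguments of the same name in the usage line, and the stopping rule (sum of squared changes in the imputed rows) is one possible choice; the package may assess convergence differently.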
A list containing components with the imputed rows for each data table.
Ignacio González