R/909.CellIndexMaps.R

#########################################################################/**
# @RdocDocumentation "9. Advanced - Cell-index maps for reading and writing"
#
# \description{
#   This part defines read and write maps that can be used to remap 
#   cell indices before reading and writing data from and to file,
#   respectively.
#
#   This package provides methods to create read and write (cell-index) 
#   maps from Affymetrix CDF files.  These can be used to store the cell
#   data in an optimal order so that when data is read it is read in
#   contiguous blocks, which is faster.
#
#   In addition to this, read maps may also be used to read CEL files that
#   have been "reshuffled" by other software.  For instance, the dChip 
#   software (\url{http://www.dchip.org/}) rotates Affymetrix Exon,
#   Tiling and Mapping 500K data.  See example below how to read 
#   such data "unrotated".
#
#   For more details how cell indices are defined, see 
#   @see "2. Cell coordinates and cell indices".
# }
#
# \section{Motivation}{
#   When reading data from file, it is faster to read the data in
#   the order that it is stored compared with, say, in a random order.
#   The main reason for this is that the read arm of the hard drive
#   has to move more if data is not read consecutively.  Same applies
#   when writing data to file.  The read and write cache of the file
#   system may compensate a bit for this, but not completely.
#
#   In Affymetrix CEL files, cell data is stored in order of cell indices.
#   Moreover, (except for a few early chip types) Affymetrix randomizes
#   the locations of the cells such that cells in the same unit (probeset)
#   are scattered across the array.  
#   Thus, when reading CEL data arranged by units using for instance 
#   @see "readCelUnits", the order of the cells requested is both random
#   and scattered.  
#   
#   Since CEL data is often queried unit by unit (except for some
#   probe-level normalization methods), one can improve the speed of
#   reading data by saving data such that cells in the same unit are
#   stored together.  A \emph{write map} is used to remap cell indices
#   to file indices.  When later reading that data back, a 
#   \emph{read map} is used to remap file indices to cell indices.
#   Read and write maps are described next.
# }
#
# \section{Definition of read and write maps}{
#   Consider cell indices \eqn{i=1, 2, ..., N*K} and file indices 
#   \eqn{j=1, 2, ..., N*K}.
#   A \emph{read map} is then a \emph{bijective} (one-to-one) function
#   \eqn{h()} such that
#   \deqn{
#     i = h(j),
#   }
#   and the corresponding \emph{write map} is the inverse function
#   \eqn{h^{-1}()} such that
#   \deqn{
#     j = h^{-1}(i).
#   }
#   Since the mapping is required to be bijective, it holds that 
#   \eqn{i = h(h^{-1}(i))} and that \eqn{j = h^{-1}(h(j))}. 
#   For example, consider the "reversing" read map function
#   \eqn{h(j)=N*K-j+1}.  The write map function is \eqn{h^{-1}(i)=N*K-i+1}.
#   To verify the bijective property of this map, we see that
#   \eqn{h(h^{-1}(i)) = h(N*K-i+1) = N*K-(N*K-i+1)+1 = i} as well as
#   \eqn{h^{-1}(h(j)) = h^{-1}(N*K-j+1) = N*K-(N*K-j+1)+1 = j}.
# }
#
# \section{Read and write maps in R}{
#   In this package, read and write maps are represented as @integer 
#   @vectors of length \eqn{N*K} with \emph{unique} elements in
#   \eqn{\{1,2,...,N*K\}}.
#   Consider cell and file indices as in previous section.
#
#   For example, the "reversing" read map in previous section can be
#   represented as
#   \preformatted{
#     readMap <- (N*K):1
#   }
#   Given a @vector \code{j} of file indices, the cell indices are
#   the obtained as \code{i = readMap[j]}. 
#   The corresponding write map is
#   \preformatted{
#     writeMap <- (N*K):1
#   }
#   and given a @vector \code{i} of cell indices, the file indices are
#   the obtained as \code{j = writeMap[i]}.
#
#   Note also that the bijective property holds for this mapping, that is
#   \code{i == readMap[writeMap[i]]} and \code{i == writeMap[readMap[i]]} 
#   are both @TRUE.
#
#   Because the mapping is bijective, the write map can be calculated from
#   the read map by:
#   \preformatted{
#     writeMap <- order(readMap)
#   }
#   and vice versa:
#   \preformatted{
#     readMap <- order(writeMap)
#   }
#   Note, the @see "invertMap" method is much faster than \code{order()}.
#
#   Since most algorithms for Affymetrix data are based on probeset (unit)
#   models, it is natural to read data unit by unit.  Thus, to optimize the
#   speed, cells should be stored in contiguous blocks of units.
#   The methods @see "readCdfUnitsWriteMap" can be used to generate a 
#   \emph{write map} from a CDF file such that if the units are read in
#   order, @see "readCelUnits" will read the cells data in order.  
#   Example:
#   \preformatted{
#     Find any CDF file
#     cdfFile <- findCdf()
#
#     # Get the order of cell indices
#     indices <- readCdfCellIndices(cdfFile)
#     indices <- unlist(indices, use.names=FALSE)
#
#     # Get an optimal write map for the CDF file
#     writeMap <- readCdfUnitsWriteMap(cdfFile)
#
#     # Get the read map
#     readMap <- invertMap(writeMap)
#
#     # Validate correctness
#     indices2 <- readMap[indices]    # == 1, 2, 3, ..., N*K
#   }
#
#   \emph{Warning}, do not misunderstand this example.  It can not be used
#   improve the reading speed of default CEL files.  For this, the data in 
#   the CEL files has to be rearranged (by the corresponding write map).
# }
#
# \section{Reading rotated CEL files}{
#   It might be that a CEL file was rotated by another software, e.g.
#   the dChip software rotates Affymetrix Exon, Tiling and Mapping 500K
#   arrays 90 degrees clockwise, which remains rotated when exported 
#   as CEL files.  To read such data in a non-rotated way, a read
#   map can be used to "unrotate" the data.  The 90-degree clockwise 
#   rotation that dChip effectively uses to store such data is explained by:
#   \preformatted{
#     h <- readCdfHeader(cdfFile)
#     # (x,y) chip layout rotated 90 degrees clockwise
#     nrow <- h$cols
#     ncol <- h$rows
#     y <- (nrow-1):0
#     x <- rep(1:ncol, each=nrow)
#     writeMap <- as.vector(y*ncol + x)
#   }
#
#   Thus, to read this data "unrotated", use the following read map:
#   \preformatted{
#     readMap <- invertMap(writeMap)
#     data <- readCel(celFile, indices=1:10, readMap=readMap)
#   }
# }
#
# @author "HB"
#
# @keyword internal
#*/#########################################################################  
HenrikBengtsson/affxparser documentation built on Feb. 9, 2024, 3:13 a.m.