suppressPackageStartupMessages({
  library(knitr)
  library(grid)
  library(png)
  library(fatemapper)
})
Biological understanding of cell differentiation during development in model organisms is growing but incomplete. Considerable energy is devoted to using genome-scale assays to fill gaps in knowledge.
Schematics of a 'cell fate map' of Drosophila melanogaster (hosted by the Society for Developmental Biology) are provided in Figure \@ref(fig:fig1). Labels on subregions of the map describe anatomical structures that will develop from the cells in the subregion.
im = readPNG(system.file("pngs/fate-map.png", package="fatemapper"))
grid.raster(im)

data(planiTerms)
kable(planiTerms, caption="Terms abbreviated in Figure 1.")
The purpose of this package is to lay the groundwork for integrating Bioconductor data structures for genomic assays and genomic annotation with information on anatomical role and on cell, organ, and organism development, with the aim of refining molecular-level understanding of cell differentiation and organ development. The Berkeley Drosophila Genome Project provides systematic annotation of the stages of fly development. This package makes use of resources from that project, inspired by the PNAS paper of @Wu2016.
@Wu2016 describe how spatially organized data on gene expression patterns can be analyzed to identify "principal patterns" that align meaningfully with fate map regions. Given measures of expression intensity at $M$ locations for $G$ genes, the $M \times G$ expression matrix $X$ is approximated as the product of an $M \times K$ "pattern dictionary" $D \geq 0$ (elementwise) and a $K \times G$ coefficient matrix $A \geq 0$ for which $||X - DA||_F$ is small, where $K$ is the number of principal patterns and $||\cdot||_F$ denotes the Frobenius norm. Two basic problems in applying this idea are (a) obtaining criteria for limiting the complexity of the dictionary $D$, and (b) using the patterns identified in $D$, with feature coefficients estimated in $A$, to enhance biological knowledge. Problem (a) is primarily a problem of statistical model selection, and problem (b) is a problem of integrating model parameter interpretation with biological theory and knowledge.
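To make the factorization concrete, here is a minimal base-R sketch of nonnegative matrix factorization using the standard Lee-Seung multiplicative updates for the Frobenius objective. It is not the fitting procedure of @Wu2016 and is not part of fatemapper; the function name `nmf_sketch`, its arguments, and the toy data are all hypothetical and chosen only for illustration.

```r
# Illustrative NMF via multiplicative updates (assumed, not the authors' method).
# X: nonnegative M x G matrix; K: number of patterns to extract.
nmf_sketch <- function(X, K, n_iter = 200, eps = 1e-9, seed = 1) {
  stopifnot(is.matrix(X), all(X >= 0), K >= 1)
  set.seed(seed)
  M <- nrow(X)
  G <- ncol(X)
  D <- matrix(runif(M * K), M, K)  # pattern dictionary, M x K
  A <- matrix(runif(K * G), K, G)  # coefficient matrix, K x G
  err <- numeric(n_iter)           # Frobenius error after each iteration
  for (i in seq_len(n_iter)) {
    # Each update rescales entries by a nonnegative ratio, so D and A stay >= 0.
    A <- A * (t(D) %*% X) / (t(D) %*% D %*% A + eps)
    D <- D * (X %*% t(A)) / (D %*% A %*% t(A) + eps)
    err[i] <- norm(X - D %*% A, type = "F")
  }
  list(D = D, A = A, err = err)
}

# Toy data: 30 locations, 12 genes, generated from 3 nonnegative patterns.
set.seed(42)
D0 <- matrix(runif(30 * 3), 30, 3)
A0 <- matrix(runif(3 * 12), 3, 12)
X <- D0 %*% A0
fit <- nmf_sketch(X, K = 3)
```

The vector `fit$err` records $||X - DA||_F$ at each iteration, which gives a simple way to watch the approximation improve. Choosing `K` is problem (a) above, and this sketch does not address it.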
The fatemapper package includes a representation of the principal patterns identified by @Wu2016, and a function, 'ggBlast', that renders aspects of this model. See Figure \@ref(fig:doviz1) for a display using landmarks defined in @campos2013embryonic, and placed approximately "by eye" on the digital template for the blastoderm. The landmark symbols used here are decoded in Table \@ref(tab:tab2).
library(ggplot2)  # theme(), element_blank(), xlab(), ylab(), geom_text(), aes()
data(PP)
data(template405)
PPmat = data.matrix(PP)
# helper that suppresses axis ticks and tick labels on both axes
blanken = function() theme(axis.ticks.y=element_blank(), axis.text.y=element_blank(),
    axis.ticks.x=element_blank(), axis.text.x=element_blank())
ggBlast(PPmat, thresh=.5, template=template405) +
    xlab("<-- anterior") + ylab("dorsal -->") + blanken() +
    geom_text(data=DmLandmarks(), aes(x=x, y=y, label=landm))

data(dmMapTerms)
kable(dmMapTerms, caption="Terms abbreviated in Figure 2.")
## ggBlast and CFMexplorer
In this section we review basic components of the fatemapper package leading to Figure \@ref(fig:doviz1).
The gene expression data are derived through registration and digitization of images from the Berkeley Drosophila Genome Project. An Excel spreadsheet is used to create the data.frame 'expressionPatterns' in the fatemapper package, which has 405 rows corresponding to a linearization of the blastoderm ellipse, and 1640 columns representing unevenly replicated data on 701 unique genes exhibiting spatially restricted expression patterns in the blastoderm. A small excerpt:

data(expressionPatterns)
kable(expressionPatterns[1:4,1:5])
Utilities are provided to map the 405 rows to positions in an elliptical template for the blastoderm.
mnum = matit(1:405, tmpl=template405)[16:1,]
gnum = getXY(t(mnum), threshold=0)
plot(gnum[,1], gnum[,2], pch=" ", xlab="<-- anterior", ylab="dorsal -->")
text(gnum[,1], gnum[,2], gnum[,3], cex=.5)

This can be used to verify that the numerical representation of the expression patterns agrees with observed spatial patterns.
```{r dovr,fig=TRUE,fig.cap="Contours of expression intensity for hbn.", fig.height=3}
vizRestriction(expressionPatterns, "hbn", threshold=.1)
```

This can be checked against the [BDGP set of images for hbn](http://insitu.fruitfly.org/cgi-bin/ex/report.pl?ftype=1&ftext=FBgn0008636). For this gene, the qualitative agreement is reasonable.

## The Wu et al. (2016) dictionary

The pattern "dictionary" presented by @Wu2016 has 21 principal patterns that summarize coordinated variation in gene expression over the blastoderm. We can use 'vizRestriction' to sketch the region of the blastoderm occupied by a principal pattern.

```r
vizRestriction(PP, "PP1", threshold=.1)
```
The coefficients estimated for genes contributing to principal patterns are available in 'sPPcoefGenes'. We can visualize the relative magnitudes of contributions from genes to a principal pattern using 'genebar'.
data(sPPcoefGenes)
genebar(1, 20, sPPcoefGenes)
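The idea of ranking genes by their contribution to a pattern can be sketched in base R without the package. The layout of 'sPPcoefGenes' and the internals of 'genebar' are not shown here, so the example below assumes a hypothetical coefficient matrix with one row per principal pattern and one column per gene; the helper `top_genes` is likewise hypothetical.

```r
# Hypothetical coefficient matrix: rows are patterns, columns are genes.
set.seed(1)
A <- matrix(rexp(3 * 8), nrow = 3,
            dimnames = list(paste0("PP", 1:3), paste0("gene", 1:8)))

# Return the n largest coefficients for one pattern, in decreasing order.
top_genes <- function(A, pattern, n = 5) {
  coefs <- sort(A[pattern, ], decreasing = TRUE)
  head(coefs, n)
}

top <- top_genes(A, "PP1", n = 5)
```

The named vector `top` can be passed to `barplot(top, las = 2)` to obtain a display of relative magnitudes similar in spirit to the one described above.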
The function 'CFMexplorer' starts a shiny app that allows alteration in the thresholding leading to subregion formation in the ggBlast visualization. An independently generated NMF analysis of the expression data into 21 principal patterns is available as 'exNmf21'. Additional dictionaries with larger or smaller pattern sets can be generated and viewed easily through this app.
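The effect of changing the threshold can be illustrated with a small base-R sketch. This is an assumption about the general approach, not a description of how 'ggBlast' or 'CFMexplorer' are implemented: each location is assigned to its strongest pattern, but only if that pattern's value exceeds the threshold. The matrix `D` and the helper `assign_regions` are hypothetical.

```r
# Hypothetical dictionary: 10 locations (rows) by 3 patterns (columns), values in [0, 1].
set.seed(2)
D <- matrix(runif(30), nrow = 10, dimnames = list(NULL, paste0("PP", 1:3)))

# Label each location with its strongest pattern, or NA if below the threshold.
assign_regions <- function(D, thresh = 0.5) {
  best <- max.col(D, ties.method = "first")      # column index of the row maximum
  strongest <- D[cbind(seq_len(nrow(D)), best)]  # the row maximum itself
  ifelse(strongest > thresh, colnames(D)[best], NA_character_)
}

# Number of locations assigned to some region at increasing thresholds.
n_assigned <- sapply(c(0.3, 0.5, 0.9),
                     function(th) sum(!is.na(assign_regions(D, th))))
```

Raising the threshold can only shrink the labeled subregions, which is the kind of behavior one would explore interactively when adjusting the threshold in the app.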