teff
is a software package to predict the effect of treating an individual on an outcome given the individual's profile in some feature data. The package focuses on transcriptomic features for which surrogate covariates need to be estimated. The estimation of treatment effects is based on inferences using random causal forest as implemented in the package grf by Tibshirani et al.
Here, we show how to use the package to estimate transcriptomic profiles with positive and negative sexual dimorphism on immune cell count. We use sex as a treatment and therefore search for the transcription profiles for which the effect of sex is high on the amount of immune cells in blood.
We have inferred the amount of T cells in blood in the GTEx individuals and have preselected the transcriptomic features with a differential expression analysis of the interaction between T cell count and sex (more details in Caceres et al. 2021, under preparation).
library(teff)
Sys.setenv(VROOM_CONNECTION_SIZE=5000072)
The
names(tcell) head(tcell$teffdata)
The transcriptomic feature data contains the expression levels of 14 genes, found significant in the differential expression analysis of the interaction between T cell count and sex.
dim(tcell$features) head(tcell$features)
We then use predictteff
to estimate the effect of sex on T cell count in a subsample of test individuals. The function randomly selects 80\% of individuals to grow the forest using the feature data, adjusted by covariates. The estimated effect of sex is the estimated difference of T cell count between sexes when the transcriptomic data across these 14 genes are kept constant. We refer to the estimated effect of sex of an individuals as the individual's associated sexual dimorphism.
The predictor of sexual dimorphism is applied on the \20% of left-out individuals, who are used to estimate the effect of sex on each of them, given their gene expression levels across the genes. For the immune sexual dimorphism, the genes within the features belong to sex chromosomes. For those that are coupled with homologous genes between the sexual chromosomes, we estimate the average levels between homologs and use those as the features in the random forest. The matrix homologous
couples the homologs between rows.
homologous<- matrix(c("DDX3Y","DDX3X","KDM5D","KDM5C", "PRKY","PRKX","RPS4Y1","RPS4X", "TXLNGY", "TXLNG", "USP9Y", "USP9X", "XIST", "XIST", "TSIX", "TSIX"), nrow=2) pred <- predicteff(tcell, featuresinf=homologous, profile=TRUE) pred
We can plot the prediction of the associated sex effect with confidence intervals. Treated refers to females, and not treated to males.
plotPredict(pred, ctrl.plot=list(lb=c("Male", "Female"), wht="topleft", whs = "bottomright"))
For each individual predicteff
predicts a sex effect with 95\% confidence intervals. Therefore, individuals that have a significantly negative effect of sex are those for which their specific transcription levels would result in lower T cell count when observed in females than in males. Those with significantly positive effects of sex are those for which their transcription profiles would result in higher T cell count when observed in females than males.
predicteff
allows for a parameter profile
that when it is set to TRUE
it attempts to create a profile of gene expression levels across all individuals with significantly negative sex effect. It also creates a profile for the group of individuals with significantly positive sex effects.
pred$profile
The profile for positive dimorphism means, for instance, that the group of individuals who at the same time up-regulate XIST and TSIX and down-regulate PRKY-PRKX, TSIX-TSIX, KDM5D-KDM5C, USP9Y-USP9X, RPS4Y1-RPS4X, and DDX3Y-DDX3X presents more T cell counts in females than in men.
The sexual dimorphic profiles of immune cell count can be used to target individuals from other studies and test whether the profiles present different sex-associated risks or survival of different diseases. The following code shows how to set up data for the transcriptomic study in arthritis GSE17755 from GEO.
library(GEOquery) gsm <- getGEO("GSE17755", AnnotGPL = TRUE) gsm <- gsm[[1]] data4teff <- feateff(gsm, tname="gender:ch1", reft=c("male", "female"), effname="disease:ch1", refeff=c("healthy","arthritis"), covnames="age:ch1", covtype="n", sva=TRUE, UsegeneSymbol=TRUE)
The result has been saved in the data structure data4teff
within the teff
package, with only the genes within the sexual dimorphism profiles. We see how the data structure is a list with fields
data(data4teff) names(data4teff) names(data4teff$teffdata)
with variables:
The
res <- target(data4teff, pred, plot=TRUE, effect="positive", featuresinf=homologous, nmcov="age.ch1", model="binomial") res
When the outcome is continuous a plot of the interaction can also be obtained. We now show the targeting of the sexual dimorphism on Tcell count on the entire tcell
dataset.
data(tcell) homologous<- matrix(c("DDX3Y","DDX3X","KDM5D","KDM5C","PRKY","PRKX","RPS4Y1","RPS4X","TXLNGY", "TXLNG", "USP9Y", "USP9X", "XIST", "XIST", "TSIX", "TSIX"), nrow=2) pf <- predicteff(tcell, featuresinf=homologous, profile=TRUE) res <- target(tcell, pf, effect="positiveandnegative", featuresinf=homologous, nmcov="age", model="log2") res
Clearly, for this case, the association of the effect with the interaction between sex and the groups of sexual dimorphism is high because this data and this effect were used to infer the treatment effect groups. The highly significant interaction is illustrated by the interaction plot of mean estimates
plotTarget(res, labs=c("Sexual dimorphism", "T cell count", "Condition", "Male", "Female"))
or by the boxplot of T cell count quartiles
boxPlot(res, labs=c("Sexual dimorphism", "T cell count", "Condition", "Male", "Female"), lg="bottomleft")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.