In vjcitn/CSHstats: discussions and exercises on classical statistics for CSHL 2022

suppressPackageStartupMessages({
suppressMessages({
library(hca)
library(DT)
library(rols)
library(SingleCellExperiment)
library(ontoProc)
})
})

Road map

Bioconductor reads a paper
ontoProc and rols help with ontologies
shiny helps communicate

A paper on the atlas of the prostate

overview

normals

The hca package

Surveying projects of the HCA

library(hca)
p = projects(size = 200); p = dplyr::bind_rows(p, hca_next(p)) # workaround bug upstream to 1.4.0
library(DT)
datatable(as.data.frame(p))

Picking a project; Enumerating and downloading loom files

projectId = "53c53cd4-8127-4e12-bc7f-8fe1610a715c"
file_filter <- filters(
    projectId = list(is = projectId),
    fileFormat = list(is = "loom")
)
pfile = files(file_filter)
pfile$projectTitle[1]
#pfile |> files_download()

Working with loom

Very superficial filtering (to 60000 cells) and development of PCA

library(LoomExperiment)
f1 = import("/home/stvjc/.cache/R/hca/36e582f7c6e_36e582f7c6e.loom")
f1
names(colData(f1))
library(scater)
sf1 = as(f1, "SingleCellExperiment")
sf1
library(scuttle)
assay(sf1[1:4,1:4])
assayNames(sf1) = "counts"
litsf1 = sf1[,1:60000]
z = DelayedArray::rowSums(assay(litsf1))
mean(z==0)
todrop = which(z==0)
litsf2 = litsf1[-todrop,]
assay(litsf2)
litsf2 = logNormCounts(litsf2)
litsf2 = runPCA(litsf2)

This code is blocked until a bucket is made available with the filtered data.

library(SingleCellExperiment)
if (!exists("litsf2")) load("litsf2.rda") # run code above, must have HDF5 in cache
metadata(litsf2)

> str(litsf2) # 22MB on disk (no quantifications)
Formal class 'DelayedMatrix' [package "DelayedArray"] with 1 slot
  ..@ seed:Formal class 'DelayedAperm' [package "DelayedArray"] with 2 slots
  .. .. ..@ perm: int [1:2] 2 1
  .. .. ..@ seed:Formal class 'DelayedSubset' [package "DelayedArray"] with 2 slots
  .. .. .. .. ..@ index:List of 2
  .. .. .. .. .. ..$ : int [1:60000] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. ..$ : int [1:23420] 13 20 22 23 31 33 34 35 36 37 ...
  .. .. .. .. ..@ seed :Formal class 'HDF5ArraySeed' [package "HDF5Array"] with 7 slots
  .. .. .. .. .. .. ..@ filepath : chr "/home/stvjc/.cache/R/hca/36e582f7c6e_36e582f7c6e.loom"
  .. .. .. .. .. .. ..@ name     : chr "/matrix"
  .. .. .. .. .. .. ..@ as_sparse: logi FALSE
  .. .. .. .. .. .. ..@ type     : chr NA
  .. .. .. .. .. .. ..@ dim      : int [1:2] 382197 58347
  .. .. .. .. .. .. ..@ chunkdim : int [1:2] 64 64
  .. .. .. .. .. .. ..@ first_val: int 0

stvjc@stvjc-XPS-13-9300:~/CSAMA_HCA$ ls -tl /home/stvjc/.cache/R/hca/36e582f7c6e_36e582f7c6e.loom
-rw-rw-r-- 1 stvjc stvjc 1206062245 Jun 21 22:37 /home/stvjc/.cache/R/hca/36e582f7c6e_36e582f7c6e.loom

Working with iSEE

Question: where is the "stop/exit" button?
Question: can we embed iSEE (or components) in a vignette? Or is there an iSEE server?

context

fgf2

Upshots

easy to survey HCA with hca package
easy to get experiments, metadata, quantifications for projects of interest
iSEE really accelerates exploration and elaboration of data and claims

Ontologies, EBI OLS, rols (thanks Laurent Gatto!), ontoProc::ctmarks

Definition: from Wikipedia

In computer science and information science, an ontology encompasses a representation, formal naming, and definition of the categories, properties, and relations between the concepts, data, and entities that substantiate one, many, or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of concepts and categories that represent the subject.

Every academic discipline or field creates ontologies to limit complexity and organize data into information and knowledge. Each uses ontological assumptions to frame explicit theories, research and applications. New ontologies may improve problem solving within that domain. Translating research papers within every field is a problem made easier when experts from different countries maintain a controlled vocabulary of jargon between each of their languages.

Applications in genomics

Gene Ontology: Genes and gene products in subdomains of BP, MF, CC -- biological process, molecular function, cellular component
Human Phenotype Ontology
UBERON - cross-species anatomy
Cell ontology
Cell line ontology
EFO - experimental factor ontology

Tags from any of these can be encountered in various annotation resources.

rols: Basic idea

OLS is ontology lookup service
has API
rols package help to interrogate the service
ontologies are everywhere

Learn about 'smooth muscle' with rols

library(rols)
ss = OlsSearch("smooth muscle", rows=100)
ss
tt = olsSearch(ss)
dd = as(tt, "data.frame")
datatable(dd)

ontoProc -- capitalizing on ontologyIndex (thanks Daniel Greene!), Rgraphviz (thanks Kasper Hansen!)

library(ontoProc)
co = getOnto("cellOnto")
head(co$name)

The ctmarks app: walk through linked ontologies such as PR and present additional facets about the concept in focus

Limitation: the OBO representation in use is outdated and out-links are sparse

chk = ctmarks(co)

Projects: - use rols to get more interesting information about terms into the app - update the ontology resources - go beyond OBO ... but not all the way to OWL? Evaluate the UI/UX needed to broaden ontology usage - impacts: data integration, precision of annotation, cognitive efficiency

shinywow2 - check out vjcitn.shinyapps.io/tnt4dn8 but be patient and don't do it while i am doing it ...

vjcitn/CSHstats documentation built on Jan. 9, 2025, 8:37 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

vjcitn/CSHstats
discussions and exercises on classical statistics for CSHL 2022

In vjcitn/CSHstats: discussions and exercises on classical statistics for CSHL 2022

Road map

A paper on the atlas of the prostate

The hca package

Surveying projects of the HCA

Picking a project; Enumerating and downloading loom files

Working with loom

Working with iSEE

Upshots

Ontologies, EBI OLS, rols (thanks Laurent Gatto!), ontoProc::ctmarks

Definition: from Wikipedia

Applications in genomics

rols: Basic idea

Learn about 'smooth muscle' with rols

ontoProc -- capitalizing on ontologyIndex (thanks Daniel Greene!), Rgraphviz (thanks Kasper Hansen!)

shinywow2 - check out vjcitn.shinyapps.io/tnt4dn8 but be patient and don't do it while i am doing it ...

R Package Documentation

Browse R Packages

We want your feedback!

vjcitn/CSHstats discussions and exercises on classical statistics for CSHL 2022

In vjcitn/CSHstats: discussions and exercises on classical statistics for CSHL 2022

Road map

A paper on the atlas of the prostate

The hca package

Surveying projects of the HCA

Picking a project; Enumerating and downloading loom files

Working with loom

Working with iSEE

Upshots

Ontologies, EBI OLS, rols (thanks Laurent Gatto!), ontoProc::ctmarks

Definition: from Wikipedia

Applications in genomics

rols: Basic idea

Learn about 'smooth muscle' with rols

ontoProc -- capitalizing on ontologyIndex (thanks Daniel Greene!), Rgraphviz (thanks Kasper Hansen!)

shinywow2 - check out vjcitn.shinyapps.io/tnt4dn8 but be patient and don't do it while i am doing it ...

R Package Documentation

Browse R Packages

We want your feedback!

vjcitn/CSHstats
discussions and exercises on classical statistics for CSHL 2022