The HiCAGE
(Hi-C Annotation and Graphics Ensemble) package offers users the
ability to annotate and visualize 3C-based genomic data at whole-genome scale.
This document describes the functionalities of the package and provides users
with detailed descriptions on its use.
HiCAGE
has a variety of features designed for efficient annotation, analysis,
and visualization of interacting chromatin regions.
HiCAGE
relies on the following dependencies:
library(HiCAGE) library(readr) library(tidyr) library(dplyr) library(GenomicRanges) library(biomaRt) library(IRanges) library(magrittr) library(circlize) library(shiny) library(grDevices) library(graphics) library(utils) library(stats) library(plotrix) library(UpSetR) library(topGO) library(org.Hs.eg.db) library(org.Mm.eg.db)
HiCAGE
is designed to handle tab-delimited data as input. 3C-based genomic
data, segmentation data, and RNA-seq data can all be input. RNA-seq data,
however, is optional. Files can be in txt, tsv, bed, or other formats.
HiCAGE
is written to require the least amount of data manipulation prior to
loading files by allowing the user to specify the columns containing the
necessary data from each data file. Default column selection is setup to handle
common data layouts.
Example format for initial data input:
hic_chr20 <- system.file("extdata", "hic_chr20.txt", package = "HiCAGE") segment_chr20 <- system.file("extdata", "segment_chr20.bed", package = "HiCAGE") rna_chr20 <- system.file("extdata", "rna_chr20.tsv", package = "HiCAGE") example <- overlap(hicfile = hic_chr20, segmentfile = segment_chr20, rnafile = rna_chr20, bio_mart = "ensembl", martset = "hsapiens_gene_ensembl", webhost = "http://Feb2014.archive.ensembl.org")
The overlap
function defaults to selecting the appropriate columns using the
format found in the example HiCCuPs looplist Hi-C file. If columns do not match
this order or if the Hi-C file contains additional unneeded columns, the
hic.columns
argument in the overlap
function can be used to select the
proper columns.
Hi-C data files need to contain the following columns in order (specified using
hic.columns
argument in the overlap
function):
overlap(hicfile = hic_chr20, segmentfile = segment_chr20, rnafile = rna_chr20, hic.columns = c(1:6, 8))
Chrom1 | Chrom1Start | Chrom1End | Chrom2 | Chrom2Start | Chrom2End | Score ------ | ----------- | --------- | ------ | ----------- | --------- | ----- "V1" | "V2" | "V3" | "V4" | "V5" | "V6" | "V8" 20 | 13300000 | 13310000 | 20 | 13520000 | 13530000 | 35 20 | 17520000 | 17530000 | 20 | 17590000 | 17600000 | 71
Segmentation data files need to contain the columns in the following example:
Chrom | ChromStart | ChromEnd | Mark | Score ----- | ---------- | -------- | ---- | ----- "V1" | "V2" | "V3" | "V4" | "V5" chr20 | 62218 | 62675 | EWR | 0.0000 chr20 | 117995 | 118433 | HET | 781.1476
StateHub/StatePaintR segmentation files all use the above format. However,
users can still select columns containing the necessary information using
segment.column
in the overlap
function:
overlap(hicfile = hic_chr20, segmentfile = segment_chr20, rnafile = rna_chr20, segment.columns = c(1:5))
HiCAGE
uses Segmentation Score found in the segmentation file to prioritze
state calls in the genomic regions defined in the Hi-C file. Alternatively,
users can manually select state prioritization, if scores do not exit or are not
desired, using the manual.priority
argument in the overlap
function. In this
instance, users select columns in the segmentation file containing Chrom,
ChromStart, ChromEnd, and State using the segment.columns
argument. Then define priority by entering state calls in the manual.priority
argument in order of highest priority to lowest priority.
overlap(hicfile = hic_chr20, segmentfile = segment_chr20, rnafile = rna_chr20, segment.columns = c(1:4) manual.priority = c("PAR", "PPR", "EAR", "EPR", "HET"))
HiCAGE
uses Segmentation Score found in the segmentation file to prioritze
state calls in the genomic regions defined in the Hi-C file. However,
occasionally two states in one region may have identical segementation scores.
In this instance, the genomic region will be annontated with both calls in the
final datatable, creating duplicate rows of interacting regions with all the top
state calls. If this is not desired, prune.priority
can be set in overlap
to prioritize states after prioritizing using segmentation scores. A
concatenated list is entered in the form c("PAR", "EAR", "AR", "PPR", "EPR",
"TRS", "HET") ordered from highest priority to lowest priority. If
manual.priority
is set, this argument will have no effect.
overlap(hicfile = hic_chr20, segmentfile = segment_chr20, rnafile = rna_chr20, segment.columns = c(1:4) prune.priority = c("PAR", "PPR", "EAR", "EPR", "HET"))
RNA-seq data files need to contain only Ensembl gene ID and gene expression data. User can decide to use FPKM or TPM at their discretion:
Ensembl ID | FPKM | ---------- | ---- | "V1" | "V7" | ENSG00000101138.7 | 24.69| ENSG00000101162.3 | 2.22 |
User can select columns in the RNA-seq data file using rna.column
in the
overlap
function:
overlap(hicfile = hic_chr20, segmentfile = segment_chr20, rnafile = rna_chr20, rna.columns = c(1, 7))
HiCAGE
enables users to select various biomaRts using the biomaRt
package,
allowing for flexibility in species and genome build for annotating
genomic regions. Default genome selection is "hsapiens_gene_ensembl"
. Genome
selection must match the genome used to compile the 3C-based data file.
The biomaRt can be specified in the overlap
function and will be passed
on to the biomaRt
package
overlap <- function(hicfile, segmentfile, rnafile, bio_mart = "ensembl", martset = "hsapiens_gene_ensembl", webhost = "www.ensembl.org")
dataset | description | version --------- | --------- | -------- hsapiens_gene_ensembl | Human genes | GRCh38.p7 mmusculus_gene_ensembl | Mouse genes | GRCm38.p5 rnorvegicus_gene_ensembl | Rat genes | Rnor_6.0 dmelanogaster_gene_ensembl| Fruitfly genes | BDGP6 scerevisiae_gene_ensembl | Saccharomyces cerevisiae genes | R64-1-1 celegans_gene_ensembl | Caenorhabditis elegans genes | WBcel235 ocuniculus_gene_ensembl | Rabbit genes | OryCun2.0 xtropicalis_gene_ensembl | Xenopus genes | JGI 4.2 drerio_gene_ensembl | Zebrafish genes | GRCz10
A full list of currently available datasets can be found using:
ensembl <- useMart(biomart = "ensembl") listDatasets(ensembl)
dataset | description | version --------- | --------- | ------ mwsbeij_gene_ensembl | Mouse WSBEiJ genes | WSB_EiJ_v1 mc3hhej_gene_ensembl | Mouse C3HHeJ genes | C3H_HeJ_v1 mc57bl6nj_gene_ensembl | Mouse C57BL6NJ genes | C57BL_6NJ_v1 mnzohlltj_gene_ensembl | Mouse NZOHlLtJ genes | NZO_HlLtJ_v1 mpwkphj_gene_ensembl | Mouse PWKPhJ genes | PWK_PhJ_v1 mfvbnj_gene_ensembl | Mouse FVBNJ genes | FVB_NJ_v1 mcbaj_gene_ensembl | Mouse CBAJ genes | CBA_J_v1 mcasteij_gene_ensembl | Mouse CASTEiJ genes | CAST_EiJ_v1 mlpj_gene_ensembl | Mouse LPJ genes | LP_J_v1 makrj_gene_ensembl | Mouse AKRJ genes | AKR_J_v1 mbalbcj_gene_ensembl | Mouse BALBcJ genes | BALB_cJ_v1 mnodshiltj_gene_ensembl | Mouse NODShiLtJ genes | NOD_ShiLtJ_v1 m129s1svimj_gene_ensembl| Mouse 129S1SvImJ genes | 129S1_SvImJ_v1 mspreteij_gene_ensembl | Mouse SPRETEiJ genes | SPRET_EiJ_v1 mdba2j_gene_ensembl | Mouse DBA2J genes | DBA_2J_v1 maj_gene_ensembl | Mouse AJ genes | A_J_v1
A full list of currently available datasets in "ENSEMBL_MART_MOUSE" can be found using:
ensembl <- useMart("ENSEMBL_MART_MOUSE") listDatasets(ensembl)
listEnsemblArchives()
overlap
Data OutputData from the overlap
function is output as a data table
head(example)
gogenelist
Data OutputThe gogenelist
function allows the user to conveniently select a chromatin
mark of interest (proximalmark) interacting with another chromatin mark
(distalmark) and generates an ordered list of all genes and expression data
associated with the proximalmark. HGNC symbols can be included by setting
gene.symbol
argument to TRUE. A gene expression cutoff can be set with
expression_cutoff
argument to filter gene list.
gogenelist(datafile = example, proximalmark = "PAR", distalmark = "EAR", gene.symbol = TRUE, species = "human", bio_mart = "ensembl", martset = "hsapiens_gene_ensembl", webhost = "http://feb2014.archive.ensembl.org", geneOnto = FALSE, expression_cutoff = 1)
If geneOnto
argument is set to TRUE
, gene ontology analysis will be run on
genes found near the proximal mark interacting with the distal mark. GO analysis
is run using the TopGO
package. The background gene set is all genes found in
the overlap
data output file.
go.analysis <- gogenelist(datafile = example, proximalmark = "PAR", distalmark = "EAR", gene.symbol = FALSE, species = "human", bio_mart = "ensembl", martset = "hsapiens_gene_ensembl", webhost = "http://feb2014.archive.ensembl.org", geneOnto = TRUE, expression_cutoff = 0.1) head(go.analysis$GO_Results)
circleplot(datatable = example, display.legend = TRUE)
plotup(datafile = example)
A graphical user interface of HiCAGE
can be launched locally with:
hicageshiny()
Alternatively, a HiCAGE GUI can be accessed online at https://junkdnalab.shinyapps.io/hicage/ without the need to install any software
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.