I'm searching for the most comprehensive source of chromatin interactions to support enhancer targets (data such as Hi-C, ChIA-PET, IM-PET, 5C, 4C, 3C, etc.), and my question can be split into the following two parts:
On the page http://epigenie.com/epigenetic-tools-and-databases/ I found links to two databases, but the link to 4DGenome doesn't work. Does anybody know whether this database, http://4dgenome.int-med.uiowa.edu, was moved somewhere else, or whether it is available only internally to the U Iowa community? (I'm asking about possibly exclusive access because on Tan's lab page, http://www.healthcare.uiowa.edu/labs/tan/software.html, only 4DGenome has a link ending in int-med.uiowa.edu (presumably int-med stands for internal/general medicine, or perhaps internal/intranet access), whereas the links for all other services, such as IM-PET, start with http://www.healthcare.uiowa.edu/labs/tan/.)
Do you perhaps know an even better database of chromatin interactions, covering as many cell lines/tissue types as possible, for the GRCh38 assembly or any other assembly whose coordinates can be easily converted, e.g. with liftOver? I need data only for human and mouse, but I would also be grateful for human-only data.
Thank you in advance for your answer.
This problem was finally solved with help from Prof. Kai Tan. It turned out that his lab moved from the University of Iowa to the Children's Hospital of Philadelphia, and the service moved with it. The working link is now: http://4dgenome.research.chop.edu
Iowa Research Online
Transcriptional enhancers represent the primary basis for differential gene expression. These elements regulate cell type specificity, development, and evolution, with many human diseases resulting from altered enhancer activity. To date, a key gap in our knowledge is how enhancers select specific promoters for activation.
To fill this gap, in this thesis, I first developed an Integrated Method for Predicting Enhancer Targets (IM-PET). Leveraging abundant “omics” data, I devised and characterized multiple genomic features for distinguishing true enhancer-promoter (EP) pairs from non-interacting pairs. I integrated these features into a probabilistic predictor for EP interactions. Multiple validation experiments demonstrated a significant improvement over existing state-of-the-art approaches. Systematic analyses of EP interactions across twelve human cell types reveal global features of EP interactions.
Second, we used a well-established viral infection model to map the dynamic changes of enhancers and super-enhancers during the CD8+ T cell responses. Our analysis illustrated the complexity and dynamics of the underlying EP interactome during cell differentiation. Taking advantage of the predicted EP interactions, we constructed stage-specific transcriptional regulatory networks, which are critical for understanding the regulatory mechanism during CD8+ T cell differentiation.
Third, recent progress in mapping technologies for chromatin interactions has led to a rapid increase in this type of interaction data. However, there is no comprehensive repository for chromatin interactions identified by all major technologies. To address this problem, we have developed the 4DGenome database through comprehensive literature curation of experimentally derived interactions. We envision that a wide range of investigations will benefit from this carefully curated database.
Transcriptional enhancers are arguably the most important class of non-coding regulatory elements in our genome. These elements regulate cell type specificity, development, and evolution, with many human diseases resulting from altered enhancer activity. To date, a key gap in our knowledge is how enhancers select specific promoters for activation.
To fill this gap, I first developed an Integrated Method for Predicting Enhancer Targets (IM-PET), a data integration tool to identify EP pairs. Capitalizing on the wealth of available ENCODE data, I devised multiple genomic features and integrated them probabilistically to make robust predictions of EP pairs. I applied the IM-PET algorithm to generate a comprehensive catalog of the EP interactome across multiple human cell types, and revealed global features of EP interactions.
Second, I applied our tools to explore the EP interactomes of three stages during CD8+ T cell differentiation. The analysis illustrated the complexity and dynamics of the underlying EP interactome during cell differentiation. Taking advantage of the predicted EP interactions, we constructed the transcriptional regulatory networks, which are critical for understanding the regulatory mechanism during CD8+ T cell differentiation.
Finally, I developed the 4DGenome database, a general repository for chromatin interactions. A comprehensive repository for chromatin interactions will help the annotation of EP pairs, and facilitate the investigation of genome structure/function relationships.
Keywords: CD8 T cell, chromatin interaction, computational biology, database, enhancer, transcriptional regulation
4DGenome or another comprehensive database of chromatin interactions - Biology
A (continuously updated) collection of references to Hi-C data. Predominantly human/mouse Hi-C data, with replicates. Please, contribute and get in touch! See MDmisc notes for other programming and genomics-related notes.
3DIV - database of 315 uniformly processed Hi-C datasets, 80 human cell/tissue types. Bait-centric (SNP rsID, gene name, hg19 coordinates) visualization of long-range interactions in the context of epigenomic (histone, enhancers) signals, numerical results. Custom BWA-MEM pipeline; bias and distance effect removed. Coordinates of significant interactions, with annotations, are available for (FTP) download, http://kobic.kr/3div/download
- Yang, Dongchan, Insu Jang, Jinhyuk Choi, Min-Seo Kim, Andrew J Lee, Hyunwoong Kim, Junghyun Eom, Dongsup Kim, Inkyung Jung, and Byungwook Lee. “3DIV: A 3D-Genome Interaction Viewer and Database.” Nucleic Acids Research 46, no. D1 (January 4, 2018)
Chorogenome resource: Processed data (Hi-C, ChIP-seq) for Drosophila, Mouse, Human, http://chorogenome.ie-freiburg.mpg.de/
- Ramírez, Fidel, Vivek Bhardwaj, Laura Arrigoni, Kin Chung Lam, Björn A. Grüning, José Villaveces, Bianca Habermann, Asifa Akhtar, and Thomas Manke. “High-Resolution TADs Reveal DNA Sequences Underlying Genome Organization in Flies.” Nature Communications 9, no. 1 (December 2018).
GITAR: An Open Source Tool for Analysis and Visualization of Hi-C Data - Includes a large collection of standardized processed data from 4D Nucleome. 20 hg38 and 2 mm10 datasets normalized by Yaffe-Tanay method, downloadable, include directionality index, HMM states, TAD analysis results. Text and HDF5 formats. https://www.genomegitar.org/processed-data.html
4DGenome - 3D significant interactions, from different literature sources
All HiC data released by Lieberman-Aiden group. Links to Amazon storage and GEO studies. http://aidenlab.org/data.html
Vian, Laura, Aleksandra Pękowska, Suhas S.P. Rao, Kyong-Rim Kieffer-Kwon, Seolkyoung Jung, Laura Baranello, Su-Chen Huang, et al. “The Energetics and Physiological Impact of Cohesin Extrusion.” Cell 173, no. 5 (May 2018) - Architectural stripes, created by extensive loading of cohesin near CTCF anchors, with the help of Nipbl and Rad21. Little overlap between B cells and ESCs. Architectural stripes are sites for tumor-inducing TOP2beta DNA breaks. ATP is required for loop extrusion and cohesin translocation, but not for maintenance. Replication or transcription is not important for loop extrusion. Zebra algorithm for detecting architectural stripes, image analysis, math in Methods. Human lymphoblastoid cells, mouse ESCs, mouse B-cells activated with LPS, CH12 B lymphoma cells, wild-type, treated with hydroxyurea (blocks DNA replication), flavopiridol (blocks transcription, PolII elongation), oligomycin (blocks ATP). Many other data types (e.g., ChIP-seq, ATAC-seq) GSE82144, GSE98119
Lieberman-Aiden, Erez, Nynke L. van Berkum, Louise Williams, Maxim Imakaev, Tobias Ragoczy, Agnes Telling, Ido Amit, et al. “Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome.” Science (New York, N.Y.) 326, no. 5950 (October 9, 2009) Gm12878, K562 cells. HindIII, NcoI enzymes. Two-three replicates. GSE18199
Rao, Suhas S. P., Miriam H. Huntley, Neva C. Durand, Elena K. Stamenova, Ivan D. Bochkov, James T. Robinson, Adrian L. Sanborn, et al. “A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping.” Cell 159, no. 7 (December 18, 2014) - Human Gm12878, K562, IMR90, NHEK, HeLa cells, Mouse CH12 cells. Different digestion enzymes (HindIII, NcoI, MboI, DpnII), different dilutions. Up to 35 biological replicates for Gm12878. GSE63525, Supplementary Table S1. Hi-C meta-data
Sanborn, Adrian L., Suhas S. P. Rao, Su-Chen Huang, Neva C. Durand, Miriam H. Huntley, Andrew I. Jewett, Ivan D. Bochkov, et al. “Chromatin Extrusion Explains Key Features of Loop and Domain Formation in Wild-Type and Engineered Genomes.” Proceedings of the National Academy of Sciences of the United States of America 112, no. 47 (November 24, 2015). HAP1, derived from chronic myelogenous leukemia cell line. Replicates. GSE74072
Rao, Suhas S.P., Su-Chen Huang, Brian Glenn St Hilaire, Jesse M. Engreitz, Elizabeth M. Perez, Kyong-Rim Kieffer-Kwon, Adrian L. Sanborn, et al. “Cohesin Loss Eliminates All Loop Domains.” Cell 171, no. 2 (2017) - HCT-116 human colorectal carcinoma cells. Timecourse, replicates under different conditions. GSE104334
Data from multiple studies, in one place, in .cool format: ftp://cooler.csail.mit.edu/coolers
Convert to any other format with cooler https://cooler.readthedocs.io/
Depletion of the cohesin-loading factor Nipbl. Three conditions: wild-type, tamoxifen control and deltaNipbl mice liver. TADs disappear, A/B compartments reinforced, minimal nonspecific effect on gene expression. Disappearing TADs unmask finer level of chromatin organization that is better associated with epigenetic landscape. TADs and compartments are independent types of chromosomal organization, but overlapping. Ideas: Excluding low-coverage bins using MAD-max procedure (Methods). Global compartmentalization. Lavaburst - TAD detection using Filippova method. Tethered Hi-C, H3K4me3, H3K27ac, CTCF, Rad21, Smc3 ChIP-seq and RNA-seq data and visualization, GEO GSE93431
- Schwarzer, Wibke, Nezar Abdennur, Anton Goloborodko, Aleksandra Pekowska, Geoffrey Fudenberg, Yann Loe-Mie, Nuno A. Fonseca, et al. “Two Independent Modes of Chromatin Organization Revealed by Cohesin Removal.” Nature, (02 2017)
Raw and normalized chromatin interaction matrices and TADs defined with DomainCaller. Mouse ES, cortex, Human ES, IMR90 fibroblasts. Two replicates per condition. GEO accession: GSE35156, GSE43070
3D variability between 20 humans, lymphoblastoid cell lines, associated with variation in gene expression, histone modifications, transcription factor binding. Genetic variation (SNPs) is associated with loop strength, contact insulation, directionality, density of local contacts, SNPs in CTCF binding sites - QTLs for these. WASP approach to address allelic mapping biases, HiCNorm normalization to remove GC, mappability, fragment length biases, BNBC quantile normalization across samples. 40kb data, detecting A/B compartments (PC1), directionality index (DI), insulation score (INS), frequently interacting regions (FIRE score). Variability detected using limma:eBayes function. IWH for multiple testing correction. Power calculation for QTL detection in Hi-C data. Data and code: Hi-C BAM files, matrices, full QTL results, 3D variable regions, SNPs at http://renlab.sdsc.edu/renlab_website/download/iqtl/, http://renlab.sdsc.edu/iQTL/
- Gorkin, David U., Yunjiang Qiu, Ming Hu, Kipper Fletez-Brant, Tristin Liu, Anthony D. Schmitt, Amina Noor, et al. “Common DNA Sequence Variation Influences 3-Dimensional Conformation of the Human Genome.” Preprint. Genomics, March 30, 2019.
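The directionality index (DI) mentioned above can be illustrated on a toy matrix. This is a minimal sketch following the definition from Dixon et al. 2012 as commonly stated (upstream contacts A, downstream contacts B, expected E = (A + B) / 2); it is not the pipeline used in the study. The function name, the window size, and the toy matrix are assumptions made for illustration only.

```python
def directionality_index(matrix, window):
    """Return one DI value per bin of a symmetric contact matrix (list of lists)."""
    n = len(matrix)
    di = []
    for i in range(n):
        # A: contacts between bin i and up to `window` upstream bins
        upstream = sum(matrix[i][j] for j in range(max(0, i - window), i))
        # B: contacts between bin i and up to `window` downstream bins
        downstream = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
        expected = (upstream + downstream) / 2
        if expected == 0 or upstream == downstream:
            di.append(0.0)  # no contacts or no bias
            continue
        sign = 1.0 if downstream > upstream else -1.0
        chi = ((upstream - expected) ** 2 + (downstream - expected) ** 2) / expected
        di.append(sign * chi)
    return di


# Toy 6x6 matrix (invented): two 3-bin domains, 10 contacts inside a domain, 1 between.
toy = [[10 if (i < 3) == (j < 3) else 1 for j in range(6)] for i in range(6)]
di = directionality_index(toy, window=2)
print([round(x, 2) for x in di])
```

In this toy case DI flips from negative at bin 2 to positive at bin 3, which is the kind of sign change used to place a domain boundary. The large values at the first and last bins are edge effects of the truncated window.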
Normal human cells, brain (dorsolateral prefrontal cortex, hippocampus), adrenal, bladder, lung, ovary, pancreas, etc. 21 human cell lines and primary tissues. Some replicates. GSE87112. Used in HiCDB paper
- Schmitt, Anthony D., Ming Hu, Inkyung Jung, Zheng Xu, Yunjiang Qiu, Catherine L. Tan, Yun Li, et al. “A Compendium of Chromatin Contact Maps Reveals Spatially Active Regions in the Human Genome.” Cell Reports 17, no. 8 (November 2016)
Dixon, Jesse R., Siddarth Selvaraj, Feng Yue, Audrey Kim, Yan Li, Yin Shen, Ming Hu, Jun S. Liu, and Bing Ren. “Topological Domains in Mammalian Genomes Identified by Analysis of Chromatin Interactions.” Nature 485, no. 7398 (April 11, 2012)
Jin, Fulai, Yan Li, Jesse R. Dixon, Siddarth Selvaraj, Zhen Ye, Ah Young Lee, Chia-An Yen, Anthony D. Schmitt, Celso A. Espinoza, and Bing Ren. “A High-Resolution Map of the Three-Dimensional Chromatin Interactome in Human Cells.” Nature 503, no. 7475 (November 14, 2013)
- - Details of the 50 cancer Hi-C datasets, references to GEO and 4DNucleome. - Coordinates (hg38) of large SVs detected in each sample. - Genomic coordinates of the detected neoloops in each sample. - List of neoloop-involved genes identified in each sample. - List of the annotated enhancer-hijacking events in 11 cancer cell lines: A549, K562, LNCaP, MCF7, T47D, HepG2, SK-MEL-5, NCI-H460, PANC-1, HT-1080 and C4-2B.
3D Genome Browser - Classical datasets for TAD/loop identification, provided as raw and normalized matrices, genomic coordinates of TADs/loops, tools for various 3C data analysis.
Iyyanki, Tejaswi. “Subtype-Associated Epigenomic Landscape and 3D Genome Structure in Bladder Cancer,” Genome Biology, 15 April 2021 - 3D genomics of bladder cancer. 4 cancer cell lines (luminal: RT4 and SW780 basal: SCABER and HT1376), 5 patients. H3K27ac ChIP-seq, RNA-seq (DESeq2), ATAC-seq (TCGA), Hi-C data (Arima, hg19). Peakachu for loop prediction, CNVs with HiNT and Hi-Cbreakfinder.
- - downloadable data from key chromosome conformation capture papers. - alphabetical list of Hi-C software.
Changes in the 3D genome are associated with CNVs in multiple myeloma cells (RPMI-8226 tri- and tetraploid, U266 nearly diploid). The number of TADs increases; 20% switch compartment. ICE normalization better accounts for CNVs than HiCNorm. CNV breakpoints overlap with TAD boundaries. 40kb resolution, replicates. Code, Hi-C, WGS, RNA-seq data GSE87585
- Wu, Pengze, Tingting Li, Ruifeng Li, Lumeng Jia, Ping Zhu, Yifang Liu, Qing Chen, Daiwei Tang, Yuezhou Yu, and Cheng Li. “3D Genome of Multiple Myeloma Reveals Spatial Genome Disorganization Associated with Copy Number Variations.” Nature Communications 8, no. 1 (December 2017)
BRCA gene targets regulated by SNPs - Capture-C of chromatin interactions centered on causal variants and promoters of causal genes (Variant- and Promoter Capture Hi-C) in six human mammary epithelial (B80T5, MCF10A) and breast cancer (MCF7, T47D, MDAMB231, Hs578T) cell lines. HindIII fragments, CHiCAGO and Peaky for significant interaction calling. PCA on interactions separates cell types, significant interactions enriched in epigenomic elements. 651 target genes at 139 independent breast cancer risk signals. Table 1 - top priority target genes. HiCUP-processed capture Hi-C data (hg19), code, Supplementary tables, Tables S11 - 651 target genes,
- Beesley, Jonathan, Haran Sivakumaran, Mahdi Moradi Marjaneh, Luize G. Lima, Kristine M. Hillman, Susanne Kaufmann, Natasha Tuano, et al. “Chromatin Interactome Mapping at 139 Independent Breast Cancer Risk Signals.” Genome Biology 21, no. 1 (December 2020)
Curaxin drugs affect the 3D genome by DNA intercalation but without inducing DNA damage: they compromise enhancer-promoter interactions, suppress oncogene expression (including MYC family genes), downregulate survival genes, partially disrupt TAD borders, decrease short-range interactions and the level of spatial segregation of the A/B compartments, and deplete CTCF but not other factors. Hi-C in HT1080 fibrosarcoma cells. Data: Hi-C and CTCF ChIP-seq in duplicates GSE122463, gene expression in MM1.S and HeLa S3 cells GSE117611, H3K27ac GSE117409, nascent RNA transcription GSE107633
- Kantidze, Omar L., Artem V. Luzhin, Ekaterina V. Nizovtseva, Alfiya Safina, Maria E. Valieva, Arkadiy K. Golov, Artem K. Velichko, et al. “The Anti-Cancer Drugs Curaxins Target Spatial Genome Organization.” Nature Communications 10, no. 1 (December 2019).
3D genomics of glioblastoma. Replicate samples from three patients. Sub-5kb-resolution Hi-C data, integration with ChIP- and RNA-seq. Data: Six Hi-C replicates, EGAS00001003493, ChIP-seq GSE121601, RNA-seq data EGAS00001003700. Processed data
- Johnston, Michael J., Ana Nikolic, Nicoletta Ninkovic, Paul Guilhamon, Florence M.G. Cavalli, Steven Seaman, Franz J. Zemp, et al. “High-Resolution Structural Genomics Reveals New Therapeutic Vulnerabilities in Glioblastoma.” Genome Research 29, no. 8 (August 2019)
Ten non-replicated Hi-C datasets. Two human lymphoblastoid cell lines with known chromosomal translocations (FY1199 and DD1618), transformed mouse cell line (EKLF), six human brain tumours: five glioblastomas ( GB176, GB180, GB182, GB183 and GB238) and one anaplastic astrocytoma (AA86), a normal human cell line control (GM07017). GSE81879
Harewood, Louise, Kamal Kishore, Matthew D. Eldridge, Steven Wingett, Danita Pearson, Stefan Schoenfelder, V. Peter Collins, and Peter Fraser. “Hi-C as a Tool for Precise Detection and Characterisation of Chromosomal Rearrangements and Copy Number Variation in Human Tumours.” Genome Biology 18, no. 1 (December 2017).
Prostate cancer, normal. RWPE1 prostate epithelial cells transfected with GFP or ERG oncogene. Two biological and up to four technical replicates. GSE37752
- Rickman, David S., T. David Soong, Benjamin Moss, Juan Miguel Mosquera, Jan Dlabal, Stéphane Terry, Theresa Y. MacDonald, et al. “Oncogene-Mediated Alterations in Chromatin Conformation.” Proceedings of the National Academy of Sciences of the United States of America 109, no. 23 (June 5, 2012)
Taberlay, Phillippa C., Joanna Achinger-Kawecka, Aaron T. L. Lun, Fabian A. Buske, Kenneth Sabir, Cathryn M. Gould, Elena Zotenko, et al. “Three-Dimensional Disorganization of the Cancer Genome Occurs Coincident with Long-Range Genetic and Epigenetic Alterations.” Genome Research 26, no. 6 (June 2016)
Cancer, normal Hi-C. Prostate epithelial cells, PC3, LNCaP. Two-three replicates. GSE73785
Breast cancer. Epithelial (MCF-10A) and breast cancer (MCF-7) cells. Tumor vs. normal comparison, replicate comparison. Two replicates for each. GSE66733. The data was reanalyzed in Fritz, Andrew J., Prachi N. Ghule, Joseph R. Boyd, Coralee E. Tye, Natalie A. Page, Deli Hong, David J. Shirley, et al. “Intranuclear and Higher-Order Chromatin Organization of the Major Histone Gene Cluster in Breast Cancer.” Journal of Cellular Physiology 233, no. 2 (February 2018) GSE98552
Breast cancer. T47D-MTLV cell line. 3D response to progesterone, integrative analysis, effect of cutting enzymes. Hi-C at 0h and 1h time points, with different enzymes. RNA-seq and ChIP-Seq available. No replicates. GSE53463
Breast cancer. MCF-7 cell line. 3D response to estrogen, time course (0, 0.5h, 1h, 4h, 24h), replicate comparison. GSE51687
- Tordini, Fabio, Marco Aldinucci, Luciano Milanesi, Pietro Liò, and Ivan Merelli. “The Genome Conformation As an Integrator of Multi-Omic Data: The Example of Damage Spreading in Cancer.” Frontiers in Genetics 7 (November 15, 2016).
- ChIA-PET loops and gene expression in 24 human cell types. RAD21, H3K27ac, RNA-seq. 28% of loops are variable, distinguish cells by tissue of origin, shorter, depleted of housekeeping genes, coincide with different chromatin states. Genes that have more interactions are depleted in housekeeping functions and enriched for pathogenic variants. Supplementary material has hg19 coordinates of RAD21 peaks, Pan-cell type cohesin-mediated chromatin loops, H3K27ac peaks, and more
- Grubert, Fabian, Rohith Srivas, Damek V Spacek, Maya Kasowski, Mariana Ruiz-Velasco, Nasa Sinnott-Armstrong, Peyton Greenside, et al. “Landscape of Cohesin-Mediated Chromatin Loops in the Human Genome.” Nature 583, no. 7818 (July 2020)
Search query for any type of Hi-C data, e.g., human brain Hi-C
Won, Hyejung, Luis de la Torre-Ubieta, Jason L. Stein, Neelroop N. Parikshak, Jerry Huang, Carli K. Opland, Michael J. Gandal, et al. “Chromosome Conformation Elucidates Regulatory Relationships in Developing Human Brain.” Nature, (October 27, 2016) - Two brain regions: the cortical and subcortical plate (CP), consisting primarily of post-mitotic neurons and the germinal zone (GZ), containing primarily mitotically active neural progenitors. Three replicates per condition. GEO GSE77565. Controlled access.
Bonev, Boyan, Netta Mendelson Cohen, Quentin Szabo, Lauriane Fritsch, Giorgio L. Papadopoulos, Yaniv Lubling, Xiaole Xu, et al. “Multiscale 3D Genome Rewiring during Mouse Neural Development.” Cell, (October 2017)
- Data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96107. Four Hi-C replicates in each condition. Mouse embryonic stem cells (ESs), neural progenitors (NPCs), and cortical neurons (CNs), purified NPC and CN populations from neocortex (ncx_NPC, ncx_CN). Replicated RNA-seq and ChIP-seq (H3K4me3, H3K9me3, H3K27ac, H3K36me3).
- [Bonev-Cavalli_mmc1.xlsx] - Table S1. Summary Statistics for the Datasets, http://www.cell.com/cms/attachment/2111760282/2083800642/mmc1.xlsx
Fraser, J., C. Ferrai, A. M. Chiariello, M. Schueler, T. Rito, G. Laudanno, M. Barbieri, et al. “Hierarchical Folding and Reorganization of Chromosomes Are Linked to Transcriptional Changes in Cellular Differentiation.” Molecular Systems Biology, (December 23, 2015)
- - mouse embryonic stem cells (ESC), neuronal progenitor cells (NPC) and neurons. Two datasets per cell type, digested using HindIII and NcoI enzymes. Genomic coordinates for TADs identified from NcoI datasets
5C libraries generated in Beagan et al. in pluripotent mouse ES cells and multipotent neural progenitor cells were downloaded from GEO accession numbers GSM1974095, GSM1974096, GSM1974099, and GSM1974100 (Beagan et al. 2016). GEO GSE68582
Haarhuis, Judith H.I., Robin H. van der Weide, Vincent A. Blomen, J. Omar Yáñez-Cuna, Mario Amendola, Marjon S. van Ruiten, Peter H.L. Krijger, et al. “The Cohesin Release Factor WAPL Restricts Chromatin Loop Extension.” Cell, (May 2017) - WAPL, cohesin's antagonist, DNA release factor, restricts loop length and prevents looping between incorrectly oriented CTCF sites. Together with SCC2/SCC4 complex, WAPL promotes correct assembly of chromosomal structures. WAPL WT and KO Hi-C, RNA-seq, ChIP-seq for CTCF and SMC1. Also, SCC4 KO and combined SCC4-WAPL KO Hi-C. Potential role of WAPL in mitosis chromosome condensation. Tools: HiC-Pro processing, HICCUPS, HiCseq, DI, SomaticSniper for variant calling. Data (Hi-C in custom paired BED format) : GEO GSE95015
Grubert, Fabian, Judith B. Zaugg, Maya Kasowski, Oana Ursu, Damek V. Spacek, Alicia R. Martin, Peyton Greenside, et al. “Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions.” Cell, (August 2015) - seven Hi-C replicates on Gm12878 cell line, GEO GSE62742
Naumova, Natalia, Maxim Imakaev, Geoffrey Fudenberg, Ye Zhan, Bryan R. Lajoie, Leonid A. Mirny, and Job Dekker. “Organization of the Mitotic Chromosome.” Science (New York, N.Y.), (November 22, 2013) - E-MTAB-1948 - 5C and Hi-C chromosome conformation capture study on metaphase chromosomes from human HeLa, HFF1 and K562 cell lines across the cell cycle. Two biological and two technical replicates. ArrayExpress E-MTAB-1948
Jessica Zuin et al., “Cohesin and CTCF Differentially Affect Chromatin Architecture and Gene Expression in Human Cells,” Proceedings of the National Academy of Sciences of the United States of America, (January 21, 2014) - CTCF and cohesin (RAD21 protein) are enriched in TAD boundaries. Depletion experiments. Different effect on inter- and intradomain interactions. Loss of cohesin leads to loss of local interactions, but TADs remain. Loss of CTCF leads to both loss of local and increase in inter-domain interactions. Different gene expression changes. TAD structures remain largely intact. Data: Hi-C, RNA-seq, RAD21 ChIP-seq for control and depleted RAD21 and CTCF in HEK293 cells. Two replicates in each condition. GEO GSE44267
tagHi-C protocol for low-input tagmentation-based Hi-C. Applied to mouse hematopoiesis 10 major blood cell types. Changes in compartments and the Rabl configuration defining chromatin condensation. Gene-body-associating domains are a general property of highly-expressed genes. Spatial chromatin loops link GWAS SNPs to candidate blood-phenotype genes. HiC-Pro to Juicer. GEO GSE142216 - RNA-seq, replicates, GEO GSE152918 - tagHi-C data, replicates, combined .hic files
Single-nucleus Hi-C data (scHi-C) of 88 Drosophila BG3 cells. 2-5M paired-end reads per cell, 10kb resolution. ORBITA pipeline to eliminate the effect of Phi29 DNA polymerase template switching. Chromatin compartments approx. 1Mb in size, non-hierarchical conserved TADs can be detected. Lots of biology, integration with other omics data. Raw and processed data in .cool format at GEO GSE131811
- Ulianov, Sergey V., Vlada V. Zakharova, Aleksandra A. Galitsyna, Pavel I. Kos, Kirill E. Polovnikov, Ilya M. Flyamer, Elena A. Mikhaleva, et al. “Order and Stochasticity in the Folding of Individual Drosophila Genomes.” Nature Communications 12, no. 1 (December 2021)
TADs in Drosophila, Hi-C and RNA-seq in four cell lines of various origin. dCTCF, SMC3, and Su(Hw) are weakly enriched at TAD boundaries. Transcription and active chromatin (H3K27ac, H3K4me1, H3K4me3, H3K36me3, H4K16ac) are associated with TAD boundaries. Also, BEAF-32 and CP190. Hierarchical TADs. Housekeeping genes tend to be near TAD boundaries and in inter-TAD regions. TAD boundary prediction using regression, modeling to associate TADs with bands, investigation of the hierarchy. Heavy use of the Armatus TAD caller. RNA-seq and replicate Hi-C data, high correlation, merged into 20kb resolution. GEO GSE69013
- Ulianov, Sergey V., Ekaterina E. Khrameeva, Alexey A. Gavrilov, Ilya M. Flyamer, Pavel Kos, Elena A. Mikhaleva, Aleksey A. Penin, et al. “Active Chromatin and Transcription Play a Key Role in Chromosome Partitioning into Topologically Associating Domains.” Genome Research 26, no. 1 (January 2016)
3D chromatin reorganization during different types of cellular senescence, replicative (RS) and oncogene-induced (OIS over time course). Senescence-associated heterochromatin foci (SAHFs), formed with the help of DNMT1 via regulation of HMGA2 expression. WI38 primary fibroblasts. OIS - gain in long-range contacts. diffHiC analysis, differential regions enriched in H3K9me3. TADkit for 3D modeling, visualization. Data (Hi-C replicates, different conditions, timecourse, H3K4me3/H3K9me3/H3K27ac ChIP-seq, RNA-seq) GEO GSE130306
- Sati, Satish, Boyan Bonev, Quentin Szabo, Daniel Jost, Paul Bensadoun, Francois Serra, Vincent Loubiere, et al. “4D Genome Rewiring during Oncogene-Induced and Replicative Senescence.” Molecular Cell, March 2020
X chromosome sex differences in Drosophila. Male X chromosome has two-fold upregulation of gene expression, more mid/long-range interactions, weaker boundaries marked by BEAF-32, CP190, Chromator, and CLAMP, a dosage compensation complex cofactor. Less negative slope in distance-dependent decay of interactions, less clustered top scoring interactions (more randomness), more open structure overall. Local score differentiator (LSD-score) to call differential TAD boundaries in CNV-independent manner - more non-matching boundaries than autosomes, 35% disappearing boundaries. Enrichment in epigenomic marks identified stronger boundary association with MSL (male-specific lethal complex) and CLAMP binding. Many other experimental observations. hiclib, hicpipe processing. R implementation of LSD differential TAD analysis, Hi-C data in bedGraph format GEO GSE94115
- Pal, Koustav, Mattia Forcato, Daniel Jost, Thomas Sexton, Cédric Vaillant, Elisa Salviato, Emilia Maria Cristina Mazza, Enrico Lugli, Giacomo Cavalli, and Francesco Ferrari. “Global Chromatin Conformation Differences in the Drosophila Dosage Compensated Chromosome X.” Nature Communications, (December 2019)
Hi-C TAD comparison between normal prostate cells (RWPE1) and two prostate cancer cells (C42B, 22Rv1). TADs (TopDom-called) become smaller in cancer, switch epigenetic states. FOXA1 promoter has more loop anchors in cancer. Androgen receptor (AR) locus has chromatin structure changed around it (Figure 6). Loop investigation called with Fit-HiC, motifs (NOMe-seq) enriched in loop-associated enhancers different between normal and cancer. HiTC visualization. Figure 1a, Supplementary Figure 3, 5 - examples/coordinates of TAD boundary/length changes.
Data For RWPE1, C42B, 22Rv1 cell lines: GEO GSE118629. In situ Hi-C, 4-cutter MboI, replicated, text-based sparse matrices at 10kb and 40kb resolution, raw and ICE-normalized, hg19. H3K9me3, H3K27me3, H3K36me3, RNA-seq.
Supplementary data: Data 2 - TAD coordinates and annotations Data 3 - differentially expressed genes in smaller TADs Data 4 - gene expression changes in TADs switching epigenomic state Data 5 - enhancer-promoter loops Data 6 - coordinates of nucleosome-depleted regions Data 7 - all differentially expressed genes Data 8 - target genes of FOXA1-bound enhancers Data 9 - overexpressed genes with more enhancer-promoter loops
- Rhie, Suhn Kyong, Andrew A. Perez, Fides D. Lay, Shannon Schreiner, Jiani Shi, Jenevieve Polin, and Peggy J. Farnham. “A High-Resolution 3D Epigenomic Map Reveals Insights into the Creation of the Prostate Cancer Transcriptome.” Nature Communications, (December 2019)
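Text-based sparse matrices like the ones described above are often stored as one contact per line. The sketch below assumes a three-column layout (start of bin 1, start of bin 2, count) at a fixed resolution; that layout, the function name `read_sparse`, and the example values are assumptions for illustration and are not taken from the GEO record, so check the actual file headers before reusing it.

```python
import io

RESOLUTION = 10_000  # assumed bin size in bp

# Invented example rows: bin1_start bin2_start count
EXAMPLE = """\
0 0 25
0 10000 12
10000 10000 30
10000 20000 7
20000 20000 18
"""


def read_sparse(handle, resolution):
    """Read 'start1 start2 count' rows and return a dense symmetric matrix."""
    contacts = {}
    max_bin = -1
    for line in handle:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        i = int(fields[0]) // resolution
        j = int(fields[1]) // resolution
        contacts[(i, j)] = float(fields[2])
        max_bin = max(max_bin, i, j)
    size = max_bin + 1
    dense = [[0.0] * size for _ in range(size)]
    for (i, j), value in contacts.items():
        dense[i][j] = value
        dense[j][i] = value  # mirror the upper triangle
    return dense


dense = read_sparse(io.StringIO(EXAMPLE), RESOLUTION)
for row in dense:
    print(row)
```

A dense list of lists is only practical for small regions; whole-chromosome matrices at 10kb are better kept sparse.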
DNA methylation linked with 3D genomics. Methylation directs PRC-dependent 3D organization of mouse ESCs. Hypomethylation in mouse ESCs driven to naive pluripotency in two inhibitors (2i) is accompanied by redistribution of the polycomb H3K27me3 mark and decompaction of chromatin. Focus on HoxC, HoxD loci. Hi-C data processed with distiller and other cool-related tools. RNA-seq, H3K27me3 ChIP-seq of mouse ESCs grown in serum and 2i conditions. Hi-C data in replicates GEO GSE124342
- McLaughlin, Katy, Ilya M. Flyamer, John P. Thomson, Heidi K. Mjoseng, Ruchi Shukla, Iain Williamson, Graeme R. Grimes, et al. “DNA Methylation Directs Polycomb-Dependent 3D Genome Re-Organization in Naive Pluripotency.” Cell Reports 29, no. 7 (November 2019)
RNA transcription inhibition minimally affects TADs, weakens TAD boundaries. K562, RNAse inhibition before/after crosslinking (bXL/aXL), actinomycin D (complete transcriptional arrest) treatment. Processing using cword, 40kb resolution. Data with replicates of each condition, GEO GSE114337
Comparison of the 3D structure of human and chimpanzee induced pluripotent stem cells. Lower-order pairwise interactions are relatively conserved, but higher-order structures, such as TADs, differ. HiCUP and HOMER for Hi-C data processing to 10kb resolution. Cyclic loess normalization, limma for significant interaction definition, Arrowhead on combined replicates to detect TADs. Association of differential chromatin interactions with gene expression. PyGenomeTracks for visualization. Workflowr code, processed Hi-C data (4 human and 4 chimp iPSCs) GEO GSE122520
- Eres, Ittai E., Kaixuan Luo, Chiaowen Joyce Hsiao, Lauren E. Blake, and Yoav Gilad. “Reorganization of 3D Genome Structure May Contribute to Gene Regulatory Evolution in Primates.” PLOS Genetics 15, no. 7 (July 19, 2019)
In situ Hi-C libraries in biological replicates (n=2) for several hematopoietic cell types (200 million reads per replicate) with a focus on the B cell lineage in mice. The authors investigate the supervisory role of the transcription factor Pax5 in organizing the 3D genome architecture throughout B cell differentiation. The raw data are available via GEO GSE99151
- Timothy M. Johanson, Aaron T. L. Lun, Hannah D. Coughlan, Tania Tan, Gordon K. Smyth, Stephen L. Nutt & Rhys S. Allan. "Transcription-factor-mediated supervision of global genome architecture maintains B cell identity." Nature Immunology, (2018)
DNA loop changes during macrophage development (THP-1 monocyte to macrophage development under 72h PMA treatment). In situ Hi-C (pbn reads, 10kb resolution), RNA-seq, ATAC-seq, CTCF and H3K27ac ChIP-seq. Formation of multi-hubs at key macrophage genes. Differential (dynamic, DESeq2-detected) loops are enriched for AP-1, more enriched in H3K27ac, in contrast to static loops. Association between local H3K27ac and transcription level with distal DNA elements with elevated H3K27ac. Very few genes and lower H3K27ac signal in lost loops, more genes and H3K27ac signal in gained loops. Fold changes in H3K27ac signal positively correlate with DNA looping. Macrophage development-specific gene ontology enrichments. Network analysis for multi-loop multi-enhancer activation hubs identification. GEO GSE96800 ChIP-seq, ATAC-seq, RNA-seq, Two Hi-C samples, THP-1 PMA-treated and untreated, SRA PRJNA385337.
- Table S1. DNA Loops in Untreated THP-1 Cells, 16067. Text, hg19 genomic coordinates, columns: anchor1_chrom anchor1_start anchor1_end anchor2_chrom anchor2_start anchor2_end sample -log10(P) anchor1_strand anchor2_strand
- Table S2. DNA Loops in PMA-Treated THP-1 Cells, 16335.
- Table S3. Differential Loops
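As a rough illustration of working with a loop table that has the columns listed for Table S1, the sketch below reads rows, filters by the -log10(P) column, and computes the distance between anchor midpoints. It assumes a tab-delimited file with an optional header row; the function names `read_loops` and `loop_span` and the example rows are hypothetical and not taken from the supplementary files.

```python
import csv
import io

# Column names as listed for Table S1 (order assumed to match the file).
COLUMNS = [
    "anchor1_chrom", "anchor1_start", "anchor1_end",
    "anchor2_chrom", "anchor2_start", "anchor2_end",
    "sample", "-log10(P)", "anchor1_strand", "anchor2_strand",
]


def read_loops(handle, min_neg_log10_p=0.0):
    """Yield loops as dicts, keeping rows with -log10(P) >= min_neg_log10_p."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        if row["anchor1_chrom"] == "anchor1_chrom":
            continue  # skip a header row if the file has one
        for key in ("anchor1_start", "anchor1_end", "anchor2_start", "anchor2_end"):
            row[key] = int(row[key])
        row["-log10(P)"] = float(row["-log10(P)"])
        if row["-log10(P)"] >= min_neg_log10_p:
            yield row


def loop_span(loop):
    """Distance between anchor midpoints, or None for inter-chromosomal pairs."""
    if loop["anchor1_chrom"] != loop["anchor2_chrom"]:
        return None
    mid1 = (loop["anchor1_start"] + loop["anchor1_end"]) // 2
    mid2 = (loop["anchor2_start"] + loop["anchor2_end"]) // 2
    return abs(mid2 - mid1)


# Made-up rows for illustration only.
example = io.StringIO(
    "chr1\t100000\t110000\tchr1\t500000\t510000\tuntreated\t12.5\t+\t-\n"
    "chr2\t200000\t210000\tchr2\t260000\t270000\tuntreated\t3.1\t-\t+\n"
)
loops = list(read_loops(example, min_neg_log10_p=5.0))
```

With a real file, `open(path, newline="")` would replace the in-memory `io.StringIO` handle.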
Vara, Covadonga, Andreu Paytuví-Gallart, Yasmina Cuartero, François Le Dily, Francisca Garcia, Judit Salvà-Castro, Laura Gómez-H, et al. “Three-Dimensional Genomic Structure and Cohesin Occupancy Correlate with Transcriptional Activity during Spermatogenesis.” Cell Reports, (July 2019) - 3D structure changes during spermatogenesis in mouse. Hi-C, RNA-seq, CTCF/REC8/RAD21L ChIP-seq. Description of biology of each stage (Fibroblasts, spermatogonia, leptonema/zygonema, pachynema/diplonema, round spermatids, sperm), and A/B compartment and TAD analysis (TADbit, insulation score), data normalized with ICE. Integration with differential expression. Changes in distribution of CTCF and cohesins (REC8 and RAD21L). Key tools: BBDuk (BBMap), TADbit, HiCExplorer, HiCRep, DeepTools. Data (no replicates) GEO GSE132054
Paulsen, Jonas, Tharvesh M. Liyakat Ali, Maxim Nekrasov, Erwan Delbarre, Marie-Odile Baudement, Sebastian Kurscheid, David Tremethick, and Philippe Collas. “Long-Range Interactions between Topologically Associating Domains Shape the Four-Dimensional Genome during Differentiation.” Nature Genetics, April 22, 2019 - Long-range TAD-TAD interactions form cliques (>3 interacting TADs) that are enriched in B compartments and LADs and associated with downregulated gene expression. Graph representation of TAD interactions. Quantifying statistical significance of between-TAD interactions. TAD boundaries are conserved. TAD cliques are dynamic. Permutation test preserving distances. Armatus for TAD detection. hiclib for data processing, Juicebox for visualization. Data: Time course differentiation of human adipose stem cells (day 0, 1, and 3). Hi-C (two replicates), Lamin B1 ChIP-seq, H3K9me3. GEO GSE109924. Also used mouse ES differentiation (Bonev 2017), mouse B cell reprogramming (Stadhouders 2018), scHi-C (Nagano 2017)
Du, Zhenhai, Hui Zheng, Bo Huang, Rui Ma, Jingyi Wu, Xianglin Zhang, Jing He, et al. “Allelic Reprogramming of 3D Chromatin Architecture during Early Mammalian Development.” Nature, (12 2017) - Developmental time course Hi-C. Mouse early development. low-input Hi-C technology (sisHi-C). TADs are initially absent, then gradually appeared. HiCPro mapping, Pearson correlation on low-resolution matrices, allele resolving. Data: GEO GSE82185
Hug, Clemens B., Alexis G. Grimaldi, Kai Kruse, and Juan M. Vaquerizas. “Chromatin Architecture Emerges during Zygotic Genome Activation Independent of Transcription.” Cell, (06 2017) - TADs appearing during zygotic genome activation, independent of transcription. TAD boundaries are enriched in housekeeping genes, colocalize in 3D. Drosophila. Insulation score for boundary detection. Overlap analysis of TAD boundaries. Processed Hi-C matrices at 5kb resolution (replicates merged, .cool format) and TAD boundaries at nuclear cycle 12, 13, 14, and 3-4 hours post fertilization
Ke, Yuwen, Yanan Xu, Xuepeng Chen, Songjie Feng, Zhenbo Liu, Yaoyu Sun, Xuelong Yao, et al. “3D Chromatin Structures of Mature Gametes and Structural Reprogramming during Mammalian Embryogenesis.” Cell, (July 13, 2017) - 3D timecourse changes during embryo development, from zygotic (no TADs, many long-range interactions) to 2-, 4-, 8-cell, blastocyst and E7.5 mature embryos (TADs established after several rounds of DNA replication). A/B compartments associated with un/methylated CpGs, respectively. PC1, directionality index, insulation score to define compartments and TADs; these metrics increase in magnitude/strength during maturation. Enrichment in CTCF, SMC1, H3K4me3, H3K27ac, H3K9ac, H3K4me1, depletion in H3K9me3, H3K36me3, H3K27me3. The compartment strength is weaker in maternal vs. paternal genomes. Covariance for each gene vs. boundary score across the timecourse. Relative TAD intensity changes. Hi-C and RNA-seq data at different stages, some replicates
SIPs, super-interactive promoters in five hematopoietic cell types (erythrocytes, macrophages/monocytes, megakaryocytes, naive CD4 T-cells, neutrophils). Reanalysis of promoter-capture Hi-C data from the Javierre et al. study, “Lineage-Specific Genome Architecture Links Enhancers and Non-Coding Disease Variants to Target Gene Promoters.” CHiCAGO pipeline. Promoter-interacting regions (PIRs) interacting with SIPs are more enriched in cell type-specific ATAC-seq peaks and in GWAS variants for relevant cell types. SIP-associated genes are more highly expressed in relevant cells. Some SIPs are shared across cell lines. Super-SIPs.
- Cell type-specific SIPs and genes.
- Cell type-specific SIPs and GWAS variants.
- Lagler, Taylor M., Yuchen Yang, Yuriko Harigaya, Vijay G. Sankaran, Ming Hu, Alexander P. Reiner, Laura M. Raffield, Jia Wen, and Yun Li. “Super Interactive Promoters Provide Insight into Cell Type-Specific Regulatory Networks in Blood Lineage Cell Types.” Preprint. Genetics, March 16, 2021.
- Nasser, Joseph, Drew T Bergman, Charles P Fulco, Philine Guckelberger, Benjamin R Doughty, Tejal A Patwardhan, Thouis R Jones, et al. “Genome-Wide Maps of Enhancer Regulation Connect Risk Variants to Disease Genes,” bioRxiv, September 03, 2020.
- SGC Epigenetic Chemical Probes: A list of chemical probes that inhibit or antagonize proteins involved in epigenetic signaling. They’re made available to the research community with no restriction on use.
Genome-wide maps linking disease variants to genes. Activity-By-Contact (ABC) Model. 72 diseases and complex traits (non-specific, no psychiatric), linking 5046 fine-mapped GWAS signals to 2249 genes. 577 genes influence multiple phenotypes. Nearly half of enhancers regulate multiple genes. Table S7 - Summary of diseases and traits. Table S9 - ABC-Max predictions for 72 diseases and complex traits.
Promoter-enhancer predictions in 131 cell types and tissues using the Activity-By-Contact (ABC) Model, based on chromatin state (ATAC-seq) and 3D folding (consensus Hi-C). ABC model assumes an element’s quantitative effect on a gene should depend on its strength as an enhancer (Activity) weighted by how often it comes into 3D contact with the promoter of the gene (Contact), and that the relative contribution of an element on a gene’s expression (as assayed by the proportional decrease in expression following CRISPR-inhibition) should depend on that element’s effect divided by the total effect of all elements. Outperforms distance-based methods, 3D-based only, machine learning approaches. Enhancer-promoter predictions for GM12878, K562, liver, LNCAP, mESCs, NCCIT cells, more at Engreitz Lab page. GitHub repository broadinstitute/ABC-Enhancer-Gene-Prediction.
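The verbal definition above (Activity weighted by Contact, divided by the total over all elements near the gene) can be written as a small calculation. This is a minimal sketch of that arithmetic only, not the broadinstitute/ABC-Enhancer-Gene-Prediction implementation; the function name `abc_scores`, the element labels, and the numbers are made up for illustration.

```python
def abc_scores(activity, contact):
    """Return element -> ABC score for a single gene.

    activity: element -> enhancer activity (e.g. from accessibility signal)
    contact:  element -> contact frequency with the gene's promoter
    The score is activity * contact, normalized by the sum over all elements.
    """
    products = {element: activity[element] * contact[element] for element in activity}
    total = sum(products.values())
    if total == 0:
        return {element: 0.0 for element in products}
    return {element: value / total for element, value in products.items()}


# Hypothetical values: a strong but rarely contacting element (e1)
# versus weaker elements that contact the promoter more often.
activity = {"e1": 10.0, "e2": 5.0, "e3": 1.0}
contact = {"e1": 0.02, "e2": 0.10, "e3": 0.50}
scores = abc_scores(activity, contact)
```

Because the scores for one gene sum to 1, each value reads as the element's share of the total regulatory input on that gene.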
Construction and content
Principles used for quality assessment
LOGIQA is based on the principles applied by the NGS-QC Generator to compute quality descriptors; specifically, this involves the assessment of multiple random samplings over long-range interaction readouts to infer numerical local and global quality scores (Fig. 1). The working hypothesis is that, under ideal conditions, the chromatin interaction maps reconstructed from a subset of the mapped paired-end tags (PETs) should present the same patterns as those observed in the original map. Multiple factors can lead to a deviation from this optimal situation; one of them is the sequencing depth. Indeed, sequencing depths below a “saturation point”, as previously described for ChIP-sequencing assays, will lead to a decreased accuracy of chromatin interaction patterns. Importantly, applying this concept to long-range chromatin interaction assays provides a direct relationship between the sequencing depth and the confidence in predicting chromatin interactions. This confidence is herein referred to as the quality of the dataset under study.
Principles in use for Quality Assessment. Total mapped paired-end tags (PETs) are first classified into intra-chromosomal and inter-chromosomal events. For quality assessment, only intra-chromosomal PETs spanning genome distances longer than 10 kb, referred to here as filtered PETs, are considered. Random sub-sampling generates PET subsets corresponding to 90, 70 and 50 % of the original filtered PETs, and the number of PETs in 5 kb or 25 kb genomic windows is quantified. By comparing each of the PET counts/window in the various random subsets with that observed in the original dataset, the fraction of recovered PET counts (recPETs) after random sub-sampling and the dispersion from the theoretically expected values are calculated. Note that the expected values correspond to a decrease in the number of PET counts per window that is proportional to the random sub-sampling (e.g. recPETs/window = 50 % when 50 % of filtered PETs are randomly sub-sampled). By evaluating the fraction of genomic windows with recPET count dispersions lower than a defined confidence interval (default value 10 %), global quality descriptors like the density and similarity quality indicators (denQCi and simQCi, respectively), as well as the global QCscore, are computed. Overall, these quality descriptors reflect the fractions of the observed long-range chromatin interactions (>10 kb) that are considered reproducible. Top panel: a chromatin interaction map derived from a HiC assay is depicted in the context of the observed PET counts (heatmap scale). Bottom panel: after LOGIQA data treatment, the chromatin interaction map displays the inferred PET count dispersion (in percent, heatmap scale). Notably, the bottom panel recapitulates the genomic contacts observed in the top panel, but in addition it provides further information concerning their reproducibility over the multiple random sub-sampling assays performed during quality assessment
Technically, we first selected unique PETs (excluding potential PCR-generated “clonal” reads), which participate in intra-chromosomal interactions longer than 10 kb. We thereby excluded PETs resulting from short-range chromatin interactions, which dominate chromatin interactomes (forming the diagonal in interaction maps) and would bias the quality assessment due to their over-representation. Indeed, removal of PETs spanning <10 kb or <25 kb led to a direct correlation between the amounts of PETs per dataset and their associated QCscores (Additional file 1: Figure S1A). This also correlated with an improved visual quality and visibility of Topologically Associating Domains (TADs) in chromatin interaction maps (Additional file 1: Figure S1B). Next, we established randomly sampled interaction PET subsets for defined fractions of the original population (90 %, 70 %, 50 %, described hereafter as s90, s70 or s50). After random sampling, intra-chromosomal interaction maps were reconstructed by assessing the number of PET counts within 5 kb or 25 kb bins. These two analytical windows enable quality assessment at two different resolutions and facilitate the comparison of different types of datasets; this concerns particularly HiC assays that are generated with different restriction enzymes or ChIA-PET assays involving sonication-sheared chromatin.
Finally, global and local quality scores were computed by comparing the recovered PET counts per 5 kb or 25 kb bin after random sampling with those observed in the original dataset (Fig. 2a).
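The filtering, random sampling, and binning steps described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the LOGIQA code: a PET is represented as a hypothetical tuple `(chrom1, pos1, chrom2, pos2)`, duplicates are treated as "clonal" reads, and a window is taken to be a pair of bins; the function names are invented for this example.

```python
import random
from collections import Counter


def filter_pets(pets, min_span=10_000):
    """Keep unique intra-chromosomal PETs spanning more than min_span bp."""
    unique = sorted(set(pets))  # identical PETs are treated as clonal duplicates
    return [p for p in unique if p[0] == p[2] and abs(p[3] - p[1]) > min_span]


def subsample(pets, fraction, seed=0):
    """Randomly draw the given fraction of PETs without replacement."""
    rng = random.Random(seed)
    return rng.sample(pets, round(len(pets) * fraction))


def bin_counts(pets, bin_size=5_000):
    """Count PETs per (chromosome, bin of tag 1, bin of tag 2) window."""
    counts = Counter()
    for chrom, pos1, _, pos2 in pets:
        counts[(chrom, pos1 // bin_size, pos2 // bin_size)] += 1
    return counts


# Made-up PETs: one duplicate, one short-range, one inter-chromosomal.
pets = [
    ("chr1", 1_000, "chr1", 50_000),
    ("chr1", 1_000, "chr1", 50_000),
    ("chr1", 2_000, "chr1", 6_000),
    ("chr1", 3_000, "chr2", 90_000),
    ("chr1", 12_000, "chr1", 80_000),
    ("chr2", 5_000, "chr2", 40_000),
    ("chr2", 7_000, "chr2", 300_000),
]
filtered = filter_pets(pets)
s50 = subsample(filtered, 0.5)
original_counts = bin_counts(filtered)
s50_counts = bin_counts(s50)
```

Running `subsample` with fractions 0.9, 0.7 and 0.5 would give the s90, s70 and s50 subsets, and `bin_size=25_000` the coarser resolution.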
Assessing quality descriptors over long-range genome interaction assays. a Scatter-plot illustrating the fraction of PET counts recovered after random subsampling (Y-axis) relative to the original PET counts in 5 kb genome windows (X-axis). Note that genome windows with high PET counts contain PET levels close to the expected value; in contrast, the lower the PET counts, the higher the deviation from this theoretically expected level. b Recovery scatter-plots assessed from datasets with increasing PET count levels (from 100 to 500 million). Note that we generated these datasets by random sub-sampling of a large metafile (>600 million reads). c QCscores computed from datasets presenting increasing PET count levels (from 100 to 500 million). The illustrated QCscores, computed from five independent replicates, present variation coefficients below 3 % (see Additional file 1: Figure S2). d Local displays illustrating chromatin interactions (chromosome 6, mm9) evaluated in the context of PET count dispersion levels (percentage) per genomic window (5 kb) relative to the expected recovery levels. Note that short-range genomic interactions (diagonal) show the lowest dispersion levels
Computing local and global quality indicators
Technically quality assessment is performed by first computing the recovered PET counts after random sampling as follows:
where samPETcounts correspond to PET counts assessed after random sampling and oPETcounts correspond to those retrieved with the original dataset. Then it is used for computing the difference between the observed recovered PET counts after random sampling relative to that ideally expected (samd which is equivalent to the random sampling density (90 %, 70 % or 50 %)):
The recovered PET count dispersion (δPETcounts) per genomic window is referred to as the local QC indicator, such that each evaluated genomic region (5 kb or 25 kb window) can be expressed by this quantitative readout assessed for a given random sampling subset analysis. Importantly, representing genome interaction maps in the context of PET count dispersions (δPETcounts) transforms the display into a uniform scale for comparing datasets generated at variable PET sequencing levels (e.g. PET count dispersion: 5-50 %).
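The equations themselves are not reproduced in this text, so the following is a sketch under explicit assumptions rather than the published formulas: the recovered PET count is taken as the sampled count expressed as a percentage of the original count, and the dispersion as the absolute difference (in percentage points) between that value and the sampling density samd. The original definition may normalize the difference differently. Function names and counts are hypothetical.

```python
def recovered_pct(sam_counts, o_counts):
    """Assumed recPETcounts: 100 * samPETcounts / oPETcounts per window."""
    return {
        window: 100.0 * sam_counts.get(window, 0) / original
        for window, original in o_counts.items()
        if original > 0
    }


def dispersion(rec_pct, samd):
    """Assumed dPETcounts: |recPETcounts - samd| per window, in percentage points."""
    return {window: abs(rec - samd) for window, rec in rec_pct.items()}


# Made-up counts: a dense window recovers close to the expected 50 %,
# sparse windows deviate strongly.
o_counts = {"w1": 200, "w2": 10, "w3": 4}
s50_counts = {"w1": 102, "w2": 3, "w3": 4}
rec = recovered_pct(s50_counts, o_counts)
delta = dispersion(rec, samd=50.0)
```

The example mirrors the behavior described for the scatter-plots: windows with high PET counts sit near the expected value, while low-count windows scatter widely.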
Finally, while δPETcounts interaction maps provide a visual display of the quality associated to a given genomic region, they do not allow evaluation of the quality of the entire dataset. Therefore, we defined the following global quality descriptors:
Density quality indicators (denQCi)
The fraction of genomic regions (5 kb or 25 kb window) in the random sampled datasets presenting δPETcounts lower than a defined threshold which in the context of this study has been fixed at 10 %. Specifically, LOGIQA presents denQCi values computed for 90 %, 70 % and 50 % random samplings (denQC.90, denQC.70 and denQC.50 respectively).
Similarity quality indicators (simQCi)
The ratio between two denQCis is used to evaluate their degree of similarity. Specifically, LOGIQA presents simQCi values computed for denQC.90 and denQC.70 relative to denQC.50 (simQC.90/50 and simQC.70/50 respectively).
Note that denQCi aims at quantifying the proportion of genomic regions that fluctuate by less than 10 % for a given random sampling. An s90 random sampling generally presents less variation from the original dataset, while the s50 subset will have the highest deviation. The simQCi measures the relative difference between denQC indicators computed at different random sub-sampling conditions. For instance, simQC.90/50 compares the denQC at 90 % to that computed at 50 % sub-sampling. In an ideal situation (saturation of the interactome readout), the fraction of genome interactions affected by the random sampling is identical at 90 % and 50 % and would yield a simQC = 1. While none of the evaluated datasets are at saturation, the closer this indicator is to 1, the lower the difference between the denQC indicators of the two random sub-samplings and the higher the dataset quality.
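As a small worked example of the two global indicators, the sketch below computes a denQCi as the fraction of windows whose dispersion is below the 10 % threshold, and a simQCi as the ratio of two denQCi values. Reading "relative to denQC.50" as denQC.90 divided by denQC.50 is an assumption, and the dispersion values are invented; the QCscore is not sketched because its formula is not given here.

```python
def den_qci(dispersions, threshold=10.0):
    """Fraction of windows whose dispersion is below the threshold (in %)."""
    values = list(dispersions)
    return sum(value < threshold for value in values) / len(values)


def sim_qci(den_high, den_low):
    """Assumed simQCi: ratio of two denQCi values, e.g. denQC.90 / denQC.50."""
    return den_high / den_low


# Made-up per-window dispersions (in %) for two random samplings.
disp_s90 = [1.0, 2.0, 4.0, 8.0, 12.0]
disp_s50 = [3.0, 9.0, 15.0, 20.0, 40.0]

den_90 = den_qci(disp_s90)
den_50 = den_qci(disp_s50)
sim_90_50 = sim_qci(den_90, den_50)
```

Here four of five windows are stable at s90 but only two at s50, so the ratio is far from 1, which under the description above indicates a dataset well below saturation.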
Intuitively, high quality datasets generally contain a high number of genomic regions that are “robust” to the most severe 50 % random sub-sampling (i.e., they will display high denQC.50 levels); they will also show low differences between denQCis assessed at various random sub-sampling conditions (i.e., their simQC.90/50 and simQC.70/50 will be close to 1). To integrate these two aspects into a single readout, we defined a global QCscore, which summarizes the previous metrics (denQCi and simQCi) into a single quality descriptor according to the following formula:
The QCscore provides a quality readout in which the influence of both the denQC.50 and the simQCis computed for s90 relative to s50 (simQC.90/50) and s70 relative to s50 (simQC.70/50) is represented.
Quality scores computed for a variety of long-range chromatin interaction assays
Because of its universal principle, LOGIQA can compute quality scores for chromatin interaction datasets generated with a variety of techniques. LOGIQA currently hosts QC scores for >250 publicly available HiC datasets (including several variants of the original protocol, like in situ or capture HiC), as well as several ChIA-PET (>50) and 4C-seq (>900) datasets.
Materials and Methods
Generation of CircRNA Datasets
Raw RNA-Seq data was obtained from a previous study (Li et al., 2018b). Raw reads generated by the Illumina HiSeq control software were assessed using FastQC (Andrews, 2010). Samples with poor sequence quality were discarded. Sequence reads were mapped to the hg38 genome using STAR (Dobin et al., 2013). Fusion junction reads were parsed using CIRCexplorer2 and captured back-splicing junction reads were then annotated with UCSC annotation files (Zhang et al., 2016). For additional detections, sequence reads were mapped to the hg38 genome using the Burrows-Wheeler Alignment tool (Li and Durbin, 2009), circRNAs were identified using CIRI (Gao et al., 2015). Gene annotation was performed using org.Hs.eg.db (Carlson, 2017). A list of circRNAs common to both control and LPC-stimulated HAECs was generated using the VLOOKUP function in Microsoft Excel. To filter out significantly changed circRNAs, ratios of control to LPC-stimulated circRNA read numbers were calculated, and base 2 logarithms taken of these ratios. Mean and standard deviation of logarithm values were calculated using MATLAB and used to generate a 95% confidence interval (mean ± 2 ∗ SD). Using previously obtained mRNA expression data (Li et al., 2018b), an mRNA fold change confidence interval was similarly calculated using the ten housekeeping genes: C1orf43, CHMP2A, GAPDH, EMC7, GPI, PSMB2, PSMB4, RAB7A, SNRPD3, and VPS29 and it was used to separate circRNAs by corresponding mRNA expression change.
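The filtering step described above (log2 ratios of control to LPC-stimulated read numbers, then a mean ± 2 SD interval) can be illustrated as follows. The original analysis used Excel and MATLAB; this Python sketch only mirrors the arithmetic, assumes the sample standard deviation (the text does not say which estimator was used), and uses made-up read counts and hypothetical function names.

```python
import math
import statistics


def log2_ratio_interval(control, treated):
    """Return log2(control/treated) ratios and the mean +/- 2*SD interval."""
    ratios = [math.log2(c / t) for c, t in zip(control, treated)]
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios)  # sample SD; an assumption
    return ratios, (mean - 2 * sd, mean + 2 * sd)


def outside_interval(ratios, interval):
    """Indices of entries whose log2 ratio falls outside the interval."""
    low, high = interval
    return [i for i, r in enumerate(ratios) if r < low or r > high]


# Made-up read numbers: only the last circRNA changes (8-fold).
control = [8, 8, 8, 8, 8, 8, 8, 8, 8, 64]
treated = [8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
ratios, interval = log2_ratio_interval(control, treated)
changed = outside_interval(ratios, interval)
```

Only entries outside the interval would be carried forward as significantly changed.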
Comparison of Flanking Intron Sequences
The generated circRNA data included flanking intron coordinates; a Python script was used to parse these coordinates and feed them into an NIH NCBI BLAST+ command line query (blastn -max_hsps 1 -outfmt “pident e-Value bitscore”) comparing the 3′ and 5′ flanking intron sequences using local copies of hg38 human genome primary assembly chromosome sequences 1 (Camacho et al., 2009). This returned percent identity, the expected value (number of matches of equal strength expected by chance), and bitscore values (normalized score for match strength) for each pair of flanking introns. These were then exported to a text file and imported into a Microsoft Excel spreadsheet for tabulation. Bitscore values were further used for statistical analysis (described below under “Statistical Analysis”).
Matching CircRNAs to Database Entries
The genomic length of each significantly changed circRNA was calculated by subtracting its genomic start coordinate from its genomic end coordinate. A MATLAB script was then used to find circRNAs with matching length and gene locus from a local copy of the circRNAdb database and to export all results to a Microsoft Excel spreadsheet (Schliebs et al., 1996; Chen et al., 2016). Notably, genomic coordinates in circRNAdb were given with respect to the older hg37 human genomic annotation. A local copy of the hg37 genomic annotation was therefore obtained for comparison 2. Then, NIH NCBI BLAST+ queries between original hg38 coordinates and circRNAdb hg37 coordinates (blastn -max_hsps 1 -outfmt “pident”) were performed using a MATLAB script (Schliebs et al., 1996; Camacho et al., 2009), taking only the single best match for each pair of sequences, the percent identity values of which confirmed sequence alignment for all length matches found. CircInteractome also uses the hg37 human genome annotation but a different set of IDs; matches in CircInteractome were determined by hand due to incomplete downloadable data sets.
Analysis of Long–Range Interactions
A complete list of long–range chromatin interaction sites in the human genome was obtained from the 4DGenome database 3 as a tabulated text file (Teng et al., 2015). The grep command line utility was used to filter for entries detected using Hi-C methodology and involving at least one circRNA-related gene. The resulting filtered data was imported into Microsoft Excel and raw interaction distances calculated as the differences between gene start coordinates. An AWK script was used to determine whether the circRNA-related gene was downstream or upstream of its partner in each interaction pair and to add this information to the data file. The signs of distance values were then updated, with downstream entries designated as positive and upstream values designated as negative, using a Python script. These updated distance values were separated into the same six groups as their corresponding circRNA-related genes and used in pairwise two-sample Kolmogorov–Smirnov tests (described further below under “Statistical Analysis”). Distance distributions for all upregulated and all downregulated circRNAs were compared by groups overall as well as by only upregulated and downregulated circRNAs with increased, unchanged, or decreased corresponding mRNA expression, respectively.
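To make the distance handling concrete, the sketch below computes a signed distance from two gene start coordinates and a two-sample Kolmogorov–Smirnov statistic in plain Python. It is an illustration, not the authors' AWK/Python/SAS workflow: the sign convention (positive when the circRNA-related gene lies at the higher coordinate, strand ignored) is an assumption, and only the D statistic is computed, without a p-value.

```python
import bisect


def signed_distance(circ_gene_start, partner_gene_start):
    """Difference between gene start coordinates.

    Assumed convention: positive when the circRNA-related gene is at the
    higher coordinate (treated here as 'downstream'), negative otherwise.
    """
    return circ_gene_start - partner_gene_start


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest


# Made-up coordinates and distance groups.
d_down = signed_distance(1_500_000, 1_200_000)
d_up = signed_distance(1_200_000, 1_500_000)
stat = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

In practice a statistics package would be used to obtain p-values for each pair of groups.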
Characterizing Open Reading Frame Sequences
Open reading frame (ORF) peptide sequences and internal ribosomal entry site (IRES) data were obtained to identify significantly changed circRNAs from circRNAdb (Chen et al., 2016). A MATLAB script was used to call remote NIH NCBI BLAST+ peptide alignments for all ORF sequences against the NCBI non-redundant protein sequences database (nr 4) (Schliebs et al., 1996; Camacho et al., 2009), restricted to entries for Homo sapiens (blastp -db nr -remote -entrez_query “Homo sapiens [Organism]” -max_target_seqs 1 -outfmt “sacc pident e-Value”). This returned the subject accession, percent identity, and expected value of the single best database match, which was then exported with the gene name and circRNAdb ID to a Microsoft Excel spreadsheet. The obtained accessions were individually and manually searched on the NCBI protein database 5 to determine if they corresponded to canonical mRNA transcripts from the same gene locus as the corresponding circRNA. A modified version of the hg38/hg37 comparison MATLAB script was also used to run NIH NCBI BLAST+ searches for the Kozak consensus sequence gccRccAUGG in genomic sequences of all significantly changed circRNAs (Kozak, 1987; Schliebs et al., 1996; Camacho et al., 2009).
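For readers who want a quick check for the Kozak consensus without BLAST, a regular expression is a simple substitute; it is a different technique from the BLAST+ search described above. The sketch assumes an exact match at every position of gccRccAUGG (with R as A or G), ignores the case distinction in the consensus, and converts U to T so RNA and DNA input both work. The function name and the example sequence are made up.

```python
import re

# gccRccAUGG written for DNA input; the lookahead allows overlapping hits.
KOZAK_PATTERN = re.compile(r"(?=GCC[AG]CCATGG)")


def find_kozak(sequence):
    """Return 0-based start positions of exact Kozak consensus matches."""
    normalized = sequence.upper().replace("U", "T")
    return [match.start() for match in KOZAK_PATTERN.finditer(normalized)]


# Made-up sequence with one DNA-style and one RNA-style match.
hits = find_kozak("TTGCCACCATGGCGGCCGCCAUGGAA")
```

An exact match is stricter than an alignment-based search, so weaker Kozak-like contexts would be missed by this pattern.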
QIAGEN Ingenuity Pathway Analysis (IPA) software, which constructs predicted upstream and downstream causal networks for input datasets from a curated research literature base, was used to elucidate potential downstream pathways for sponged miRNAs (Krämer et al., 2014). Lists of miRNAs were input into IPA and run through the miRNA target filter to generate a list of potential mRNA targets. The list of mRNA target genes was then run through an IPA core analysis. All canonical downstream pathways returned in the resulting output were exported into a Microsoft Excel spreadsheet. The top ten pathways were extracted for further qualitative consideration.
Graphical Figure Generation
For three-group Venn diagrams, lists of genes were input into an online Venn diagram generator ( 6 Evolutionary Genomics, Ghent University, Gent, Belgium) (Draw Venn Diagram). This tool was used to produce both diagrams and lists of overlapped genes between groups. For six-group Venn diagrams, gene lists were input into the InteractiVenn online tool 7 to generate diagrams (Heberle et al., 2015), while the above Ghent University tool was again used to produce lists of overlapped genes between groups (Draw Venn Diagram). Explanatory and conceptual graphics were produced using Microsoft Paint.
Descriptive summary statistics were reported by group. Data were checked for the normality assumption and, if found not to be normally distributed, subsequently transformed using various functions such as the log10 and cubic-root to find the optimal transformation for the underlying chromatin long-range interaction distance distribution data. Chromatin long-range interaction distance density functions were then estimated and plotted by group under the optimal transformation using the non-parametric kernel density approach with a normal weight function (Jones et al., 1996; Hollander and Wolfe, 1999; Silverman, 2018). Pairwise comparisons of median chromatin long-range interaction distance among the six groups were performed with multiple comparison adjustments using the Dwass, Steel, and Critchlow-Fligner method based on the Wilcoxon test for downstream and upstream data separately (Douglas and Michael, 1991; Conover, 1999; Hollander and Wolfe, 1999; Shalabh, 2011). Pairwise comparisons between groups for chromatin long-range interaction distance distribution, median distance location shift, and distance distribution scale were implemented using the Kolmogorov–Smirnov two-sample test, Hodges–Lehmann estimation method and Fligner-Policello test, and Ansari-Bradley test, respectively, again for downstream and upstream data separately (Lehmann, 1963; Douglas and Michael, 1991; Conover, 1999; Hollander and Wolfe, 1999; Shalabh, 2011). SAS version 9.4 was used to perform these analyses and generate density function plots for the chromatin long-range interaction distance data.
For the rest of the data, single-factor ANOVA and non-parametric Kruskal–Wallis tests were conducted for all multi-group analyses using the Real Statistics Resource Pack add-in for Microsoft Excel 8 (Zaiontz, 2013). For confidence intervals, MATLAB was used to calculate means and standard deviations for data sets (Schliebs et al., 1996). Probabilities of ratios of upregulation to downregulation were calculated by summing binomial coefficients and dividing by the appropriate power of 2 in MATLAB (Schliebs et al., 1996), as given by the formula $p = \frac{1}{2^{u}} \sum_{i=n}^{u} \binom{u}{i}$, where u is the total number of genes considered and n is the larger of the numbers of upregulated and downregulated genes.
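The binomial sum described above is easy to evaluate directly. The original calculation was done in MATLAB; this Python sketch only illustrates the same arithmetic, with a hypothetical function name and made-up numbers. It gives the probability of seeing at least n of u genes change in one direction if each direction were equally likely.

```python
from math import comb


def tail_probability(u, n):
    """Sum of binomial coefficients C(u, i) for i = n..u, divided by 2**u."""
    return sum(comb(u, i) for i in range(n, u + 1)) / 2 ** u


# Made-up example: 8 of 10 genes upregulated.
p = tail_probability(10, 8)
```

For u = 10 and n = 8 the coefficients are 45, 10 and 1, so p = 56/1024.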
Transcription factors and their three-dimensional interactions are crucial to gene regulation [1, 2]. Many distal transcription factor binding sites have been identified by genome-wide chromatin experiments, such as chromatin immunoprecipitation (ChIP)-chip, ChIP-paired-end tag (PET), and ChIP-Seq, but it is not clear which of these distal transcription factor binding sites are real and functional in gene regulation, and which are non-functional 'parking spots'. Three-dimensional chromatin interactions have been shown to bring distal transcription factor binding sites into close spatial proximity to gene promoters, but global analysis of three-dimensional chromatin interactions has been limited by the lack of techniques for high-resolution and whole-genome analysis.
Recently, we developed a global, de novo, high-throughput method, Chromatin interaction analysis with paired-end tag sequencing (ChIA-PET), for characterizing the three-dimensional structures of long-range chromatin interactions in the nucleus [7–9], which makes it possible to identify transcriptional binding sites involved in long-range interactions at a genome-wide level. The key features in ChIA-PET analysis (Figure 1a) are that the cross-linked chromatin interaction nodes bound by protein factors are enriched by ChIP, and remote DNA elements tethered together in close spatial distance in these chromatin interaction nodes are connected through proximity ligation with oligonucleotide DNA linkers. We designed linker sequences that not only contain MmeI restriction sites for PET extraction, but also include specific nucleotide barcodes to assess the noise level in ChIA-PET data from random ligation. Upon MmeI digestion, the resulting PET construct contains a 20 bp head tag, a 38 bp linker sequence, and a 20 bp tail tag, which is the template for next generation paired-end sequencing, for example, Illumina paired-end sequencing from the two ends in opposite directions (Figure 1b). Each of the paired sequencing reads uncovers the 20 nucleotide tag sequence and the 16 nucleotide sequence from the attached linker sequence including the nucleotide barcodes. When PETs are mapped to the corresponding reference genome sequences, the genomic distance between the two mapped tags will reveal whether a PET is derived from a self-ligation product of a single DNA fragment (short genomic distance) or an inter-ligation product of two DNA fragments (long genomic distance, or inter-chromosomal) (Figure 1c). The overlapping ChIP fragments inferred by PET sequences will reveal true binding sites and long-range chromatin interactions bound by such protein factors, whereas the singletons mostly reflect the random background noise (Figure 1d).
Schematic of ChIA-PET analysis. (a) The ChIA-PET experimental protocol, which includes chromatin preparation, ChIP, linker ligation, proximity ligation, MmeI restriction digestion, and DNA sequencing. (b) The ChIA-PET constructs prepared for sequencing analysis. Each PET construct involves a pair of tags (20 bp each) and a linker (38 bp) between the tag pairs. This full-length linker is derived from ligation of two half-linkers, A or B, each with a unique barcode nucleotide (CG for half-linker A and AT for half-linker B). The barcode nucleotides are highlighted as red letters. Linkers with AB barcodes are considered to be non-specific chimeric proximity ligation products. (c) Mapping tags of PET sequences to reference genome. The categories of 'self-ligation PETs' and 'inter-ligation PETs' were assigned. (d) Clustering of overlapping PET sequences in the same genomic regions to identify enriched protein binding sites by overlapping 'self-ligation PETs' and long-range chromatin interactions by overlapping 'inter-ligation PETs'.
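The barcode and distance logic described above can be sketched as two small classifiers. This is an illustration and not the ChIA-PET Tool code: the barcodes (CG for half-linker A, AT for half-linker B) come from the text, while the function names and the self-ligation distance cutoff are hypothetical placeholders, since no cutoff value is stated here.

```python
# Half-linker barcodes as described in the figure legend.
HALF_LINKER_BARCODES = {"CG": "A", "AT": "B"}


def linker_type(barcode1, barcode2):
    """Return 'AA', 'BB' or 'AB' for a full linker, or None if unrecognized."""
    try:
        halves = sorted(HALF_LINKER_BARCODES[b] for b in (barcode1, barcode2))
    except KeyError:
        return None
    return "".join(halves)


def is_chimeric(barcode1, barcode2):
    """AB linkers are treated as non-specific chimeric ligation products."""
    return linker_type(barcode1, barcode2) == "AB"


def pet_category(chrom1, pos1, chrom2, pos2, self_ligation_cutoff=3_000):
    """Classify a mapped PET by genomic distance (cutoff is a placeholder)."""
    if chrom1 != chrom2:
        return "inter-ligation"
    if abs(pos2 - pos1) <= self_ligation_cutoff:
        return "self-ligation"
    return "inter-ligation"


# Made-up examples.
example_linker = linker_type("CG", "AT")
example_pet = pet_category("chr1", 10_000, "chr1", 10_800)
```

The fraction of AB linkers in a library gives a direct estimate of the random-ligation noise level, which is the purpose of the barcodes described above.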
The ChIA-PET approach is very efficient in generating large volumes of PET sequence data for long-range chromatin interactions with different protein factors in complex genomes. Since the detection of long-range chromatin interactions involves high levels of background noise due to the complexity of chromatin structures in nuclear space and the nature of proximity ligation [7, 8], a meaningful analysis requires a comprehensive, efficient pipeline. The immense challenges in the setup of an efficient pipeline to process the huge body of ChIA-PET sequence data include: how to accurately filter the linker sequences from the raw reads; how to accurately and efficiently map the tag sequences to reference genomes; how to evaluate the noise level in the data; how to identify bona fide binding sites and chromatin interactions; how to organize the datasets; and how to effectively visualize the long-range chromatin interactions identified by ChIA-PET analysis. Many of the bioinformatics challenges faced in the ChIA-PET analysis are unprecedented.
In developing the ChIA-PET data analysis algorithms, we assembled a package of sophisticated bioinformatics solutions called 'ChIA-PET Tool' for processing, analyzing, visualizing, and managing ChIA-PET data quickly, accurately, and automatically. In this report, we describe the design and implementation of ChIA-PET Tool, and demonstrate its efficiency and effectiveness through processing and analyzing an estrogen receptor α (ERα) ChIA-PET library dataset from the MCF-7 cell-line.
- Encyclopedia of DNA Elements. Also available on ENSEMBL and the UCSC genome browser.
- The NIH Roadmap Epigenomics Mapping Consortium offers maps of histone modifications, chromatin accessibility, DNA methylation, and mRNA expression across hundreds of human cell types and tissues.
- The International Human Epigenome Consortium (IHEC) brings forth reference epigenomes relevant to health and disease. View, search, and download all the data.
- Store and work with genomic and epigenomic data from a number of international consortiums.
- For the UCSC genome browser fans.
- A web browser that offers tracks from ENCODE and Roadmap Epigenomics projects.

- A database of chromatin interactions across five species. Includes data from 3C, 4C, 5C, ChIA-PET, Hi-C, Capture-C, and IM-PET.

- A knowledgebase of epigenome-wide association studies.
- Whole-genome bisulfite sequencing (WGBS) database for many different tissues, pathological conditions, and species.
- Hundreds of methylomes from well studied organisms.

- Published miRNA sequences.
- A database of all kinds of noncoding RNAs (except tRNAs and rRNAs) for 16 species.
- Linking sequence to trait, check out what polymorphisms in your microRNA can do.
- Public circRNA data and custom python scripts for circRNA discovery in your own (ribominus) RNA-seq data.
Hi-C analysis was based on 91.9 million read pairs that passed processing and quality filtering in HOMER. A genome-wide survey of structural aberrations is presented in Figure 2. This heatmap depicts the ratio of observed interaction frequencies to the frequencies expected under a background model. Translocations are indicated by higher than expected frequencies of interchromosomal interactions (red color). Correspondingly, intrachromosomal interaction frequencies of the chromosomes involved in the translocation are decreased (blue color). The color gradient indicates the orientation of the breakpoint, i.e., interaction intensities decrease with distance from chromosomal breakpoints. In total, we identified 22 translocations, from which we were able to fine-map 32 breakpoints to a single HindIII fragment (Table 1; note that only 32 of the 34 breakpoints listed in Table 1 were considered for the following analysis, as in two cases breakpoints mapped to the same HindIII fragment). A comparison of Hi-C data with whole-genome sequencing data generated by a different laboratory using a different batch of Se-Ax cells (33, 34) revealed an overlap of 25 breakpoints. These have been highlighted in Table 1. A comparison of translocation breakpoints with array CGH data generated in a previous study by our laboratory with a resolution of kb (30) revealed that 11 of those breakpoints not identified by whole-genome sequencing were flanked by either deletions (n = 7) or duplications (n = 4). Other translocation breakpoints identified solely by Hi-C analysis were in close vicinity to other translocations, suggesting the presence of a complex rearrangement (t1/t10, t7/t8, t8/t15, and t13/t14). Yet, it has to be emphasized that non-overlapping breakpoints may also be due to private mutations emerging during cultivation of Se-Ax cells in different laboratories over a longer time, or to other technical reasons, in particular differences in resolution.
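The observed/expected ratio behind such a heatmap can be illustrated with a small Python sketch. It does not reproduce HOMER's background model: as a deliberate simplification, the expected count for a pair of bins is assumed to be proportional to the product of their total contact counts. The function name, the pseudocount and the toy matrix are hypothetical.

```python
import math


def log2_observed_over_expected(counts, pseudocount=0.5):
    """Return log2(observed / expected) for a symmetric matrix of contact counts.

    Assumption of this sketch: expected[i][j] = rowsum[i] * rowsum[j] / total,
    which only corrects for per-bin coverage.
    """
    n = len(counts)
    row_sums = [sum(row) for row in counts]
    total = sum(row_sums)
    if total == 0:
        raise ValueError("contact matrix is empty")
    ratios = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            expected = row_sums[i] * row_sums[j] / total
            ratios[i][j] = math.log2(
                (counts[i][j] + pseudocount) / (expected + pseudocount)
            )
    return ratios


# Toy matrix: bins 0-1 lie on one chromosome, bins 2-3 on another.
# The elevated count between bins 1 and 2 mimics a translocation junction.
example_counts = [
    [0, 10, 1, 1],
    [10, 0, 8, 1],
    [1, 8, 0, 10],
    [1, 1, 10, 0],
]
example_ratios = log2_observed_over_expected(example_counts)
```

Under this simplified model the bin pair at the junction receives a positive log ratio (shown in red in a scheme like Figure 2), while distant interchromosomal pairs receive negative values (blue).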
Figure 2. Genome-wide interaction frequencies in Se-Ax. Higher and lower than expected normalized interaction frequencies are shown with 2.5 Mb resolution in red and blue, respectively. The chromosome numbers are given at the top and to the right together with information on DNA copy number losses (red) and gains (green) as detected by array comparative genomic hybridization. Translocations are characterized by interchromosomal interactions higher than expected, while their corresponding intrachromosomal interactions are decreased. A more detailed view of selected chromosomes is provided in Figure 3.
Table 1. Translocation breakpoints (hg19).
As an example of the complexity of chromosomal aberrations, a zoom-in depicting interchromosomal interactions for chromosomes 2, 6, and 11 is given in Figure 3. Additionally, chromosomal deletions and duplications identified by array CGH analysis of Se-Ax are indicated in both heatmaps.
Figure 3. Heatmap of normalized interchromosomal interaction frequencies between chromosomes 2 and 6 and chromosomes 2 and 11. Two heatmaps are shown, which demonstrate the presence of translocations t(2;6) (left) and t(2;11) (right), respectively. Both derivative chromosomes lead to higher than expected interchromosomal interaction frequencies, which are indicated by the red color gradient. Alterations of DNA copy number state as detected by array comparative genomic hybridization are indicated by coloring of the chromosome ideograms (red = deletion, green = gain). While the breakpoint of the reciprocal translocation t(2;11) is easily identifiable, the identification of t(2;6) is complicated by additional deletions of chromosome 2 and chromosome 6 and an inversion of chromosome 2. Orientation of chromosomal rearrangements can be inferred from the color gradient [interaction intensities (i.e., red color) decrease with distance from chromosomal breakpoints].
Deletions adjacent to translocation breakpoints were encountered 12 times (out of 32 breakpoints; Figure 4). The Circos plot depicted in Figure 5 demonstrates chained translocations with shared breakpoints between several chromosomes, using chromosomes 5, 8, and 10 as an example.
Figure 4. Deletions adjacent to the translocation breakpoints identified in Se-Ax. Smoothed log2 ratios of DNA copy number within a 2 Mb interval surrounding the translocation breakpoints are shown for Se-Ax (red line). DNA copy numbers of additional cell lines for the very same intervals are displayed for comparison (see insert box for color legend).
Figure 5. Circos plot visualizing chained translocations between chromosomes 5, 8 and 10. In this Circos plot chromosomes are radially aligned. Arcs within this circle indicate significant interchromosomal and long distance intrachromosomal interactions. Following the numbering given in the small inset to the left: (1) significant interchromosomal interactions (blue lines; FDR <0.001), significant long distance intrachromosomal interactions (gray lines; >25 Mb, FDR <0.001) and translocations as given in Table 1 (black lines); (2) radially aligned chromosome ideograms; (3) count of significant interactions per 50 kb bin (all interaction distances; max = 10); (4) DNA copy number status in red (deletion) and green (gain) as detected by array comparative genomic hybridization.
To evaluate the impact of spatial proximity of chromosomes on the emergence of translocations, we screened public Hi-C datasets for interactions between those chromosomal intervals affected by translocations in Se-Ax. We found no evidence of higher interaction probabilities between the regions encompassing the translocations, either in the data of the lymphoblastoid cell line GM12878, which we processed the same way as the Se-Ax data, or in the datasets from the 4DGenome database. Permutation analysis revealed a significant overrepresentation of translocation breakpoint-associated HindIII fragments within genes (p = 0.00208, 100,000 permutations). For one of the possible fusion genes (AIG1/GOSR1), transcripts were identified in the corresponding published RNA-Seq data (33).
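The logic of such a permutation analysis can be sketched in Python. The text does not describe how the permutations were drawn, so this sketch makes its own assumptions: each HindIII fragment is reduced to a flag stating whether it lies within a gene, random fragment sets of the same size as the breakpoint set are drawn uniformly without replacement, and the p-value is the share of random sets with at least as many genic fragments as observed. All names and the toy data are hypothetical.

```python
import random


def permutation_p_value(fragment_in_gene, breakpoint_indices,
                        n_permutations=2000, seed=0):
    """One-sided permutation test for enrichment of genic fragments.

    fragment_in_gene: list of bools, one per restriction fragment.
    breakpoint_indices: indices of fragments that carry a breakpoint.
    Returns (observed genic count, estimated p-value).
    """
    rng = random.Random(seed)
    set_size = len(breakpoint_indices)
    observed = sum(fragment_in_gene[i] for i in breakpoint_indices)
    population = range(len(fragment_in_gene))
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        sample = rng.sample(population, set_size)
        if sum(fragment_in_gene[i] for i in sample) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Toy data: 1000 fragments, the first 300 lie within genes.
example_flags = [i < 300 for i in range(1000)]
# 32 breakpoint fragments, 28 of them genic (hypothetical numbers).
example_breakpoints = list(range(28)) + list(range(300, 304))
example_observed, example_p = permutation_p_value(example_flags, example_breakpoints)
```

A more faithful null model might also match fragment length or chromosome, which this uniform sampling ignores.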
In eukaryotes, DNA is densely packed into a higher order structure called chromatin. This has a profound impact on processes that work on DNA, such as replication or gene expression (Campos and Reinberg, 2009; Narlikar et al., 2002). Cells therefore contain various protein complexes that regulate chromatin structure. The basic unit of chromatin is a nucleosome, formed by 147 base pairs of DNA wrapped around an octamer of two copies of the histones H2A, H2B, H3 and H4 (Richmond and Davey, 2003). Chromatin regulators include nucleosome remodelers, histone chaperones and histone modifying complexes (Narlikar et al., 2002). Nucleosome remodeling complexes slide or evict nucleosomes and are also involved in deposition of histones and their variants. Histone modifier complexes covalently modify histone tails with different marks. Besides influencing nucleosome turn-over and altering physical properties such as chromatin condensation, specific modifications also serve as recognition sites for other proteins. These effectors further regulate chromatin structure or facilitate the process of gene expression itself (Campos and Reinberg, 2009).
An elegant model has been put forward to explain the consequences of chromatin modifications (Strahl and Allis, 2000). In the histone code hypothesis, different combinations of modifications form a code that is read by other proteins to influence downstream events. Although the location of many histone marks correlates with particular expression states, stringent evidence for causal relationships is often missing (Rando and Chang, 2009). Furthermore, the discovery that the same histone modification may be bound by different effectors, each mediating different downstream events, also calls into question the existence of a strictly rigid code (Berger, 2007). The consequences of histone modifications are currently being explained by context-dependent binding of effector complexes (Campos and Reinberg, 2009; Lee et al., 2010). The nature of this context is only starting to be investigated. As with the histone code hypothesis itself, proposals about context-dependent binding are based mainly on studies of individual genes. One purpose of this study is therefore to determine to what extent either context-dependency or a code applies to different chromatin interactions when assayed across an entire genome.
A related question is how different chromatin interactions work together. The general architecture of chromatin interaction pathways, and how this may vary for different genes, is as yet unexplored. To understand the effects of different chromatin states, many studies focus on the binding of effector proteins. While this is crucial for understanding mechanism, it can result in ignoring the question of whether a binding event has further downstream consequences, for example on gene expression. A second aim of this study is therefore to investigate interactions as manifested by their downstream consequences on gene expression.
Genome-wide expression analysis has previously been applied to study the role of many individual regulators. The use of different microarray platforms, different genetic backgrounds and different growth conditions in these previous studies confounds proper comparative analyses. Here we analyze the interplay between gene expression and chromatin by determining expression profiles for perturbations of the majority of chromatin regulatory machineries in Saccharomyces cerevisiae under identical conditions. This was achieved by DNA microarray expression-profiling of 165 yeast strains, each bearing a mutation in a different chromatin factor. The results show a remarkable degree of specificity, also for mutants resulting in loss of widespread histone marks. The data is analyzed at three levels of complexity: analysis of individual profiles to determine cellular roles, analysis of protein complexes to examine subunit relationships and analysis of relationships between complexes to investigate the architecture of interaction pathways. The result is a first function-based network of chromatin interactions. The network reveals that individual chromatin regulators are almost all functionally connected to others and form pathways that branch and interconnect at different levels. The study shows how elements of the histone code and context-dependent binding of chromatin are superimposed to form chromatin interaction pathways. Removal of individual chromatin factors has much more specific and restricted effects on gene expression than is predicted by location. This suggests the presence of additional gene-dependent mechanisms that go beyond context-dependent binding to achieve specificity. The network and underlying data therefore provide a framework for investigating how globally acting chromatin regulators facilitate specific responses.
Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, 16802, USA
Yanli Wang, Fan Song, Bo Zhang & Feng Yue
Department of Biochemistry and Molecular Biology, College of Medicine, The Pennsylvania State Hershey, Hershey, PA, 17033, USA
Lijun Zhang, Jie Xu & Feng Yue
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
Department of Genetics, The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, 63108, USA
Daofeng Li, Mayank N. K. Choudhary & Ting Wang
Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
Department of Biostatistics, University of North Carolina, Chapel Hill, NC, 27599, USA
Department of Computer Science, University of North Carolina, Chapel Hill, NC, 27599, USA
Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, 44195, USA
Center for Computational Biology and Bioinformatics, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, State College, PA, 16802, USA