Single-cell epigenomic assays, such as scATAC-seq, provide genome-wide, high-resolution maps of cellular activity and identity. Annotating single cells by comparison to ensemble populations is limited to broad cell types and is susceptible to the inherent noise and sparsity of single-cell data. We hypothesized that quantitative epigenomic annotation of single cells, using over 500 ENCODE assays, would identify cellular subpopulations with distinct epigenomic profiles.
To this end, we obtained public scATAC-seq data on sorted populations of naïve CD4+ T, memory CD4+ T, CD4+ CD26- Sezary, and CD4+ Th17 cells. We also obtained public epigenomic annotations, including histone modification ChIP-seq, Hi-C chromatin looping, and open chromatin from DNase-seq and ATAC-seq. We then annotated each scATAC-seq peak with its absolute distance to each epigenomic feature. Next, we applied an unsupervised graph-based clustering algorithm based on canonical correlation analysis (Zhang et al. 2018, bioRxiv) to leverage the correlated information between the scATAC-seq data and the high-dimensional epigenomic features.
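The peak annotation step can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function name `nearest_feature_distance`, the `(chrom, start, end)` tuple layout, and the choice to measure distance between interval midpoints are all assumptions made for illustration, since the exact distance definition is not specified here.

```python
from bisect import bisect_left


def nearest_feature_distance(peaks, features):
    """Absolute distance (bp) from each peak to its nearest feature.

    Both inputs are lists of (chrom, start, end) tuples. Distance is
    measured midpoint to midpoint, which is an assumption of this sketch.
    Returns None for peaks on a chromosome with no features.
    """
    # Group feature midpoints by chromosome and sort them for binary search.
    midpoints_by_chrom = {}
    for chrom, start, end in features:
        midpoints_by_chrom.setdefault(chrom, []).append((start + end) // 2)
    for midpoints in midpoints_by_chrom.values():
        midpoints.sort()

    distances = []
    for chrom, start, end in peaks:
        midpoints = midpoints_by_chrom.get(chrom)
        if not midpoints:
            distances.append(None)
            continue
        peak_mid = (start + end) // 2
        # The nearest feature is either just left or just right of the
        # insertion point, so only those two candidates are compared.
        i = bisect_left(midpoints, peak_mid)
        candidates = midpoints[max(i - 1, 0): i + 1]
        distances.append(min(abs(peak_mid - m) for m in candidates))
    return distances


# Toy coordinates, invented for illustration only.
peaks = [("chr1", 100, 200), ("chr1", 1000, 1100), ("chr2", 50, 150)]
features = [("chr1", 300, 400), ("chr1", 900, 1000)]
print(nearest_feature_distance(peaks, features))  # [200, 100, None]
```

Running this once per epigenomic track would give one distance column per feature, so each peak ends up with a high-dimensional annotation vector of the kind the clustering step takes as input.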
As a result, we identified heterogeneity within sorted healthy and Sezary syndrome T cell populations. First, we found Th1, Th2, and Th17 precursors among the naïve CD4+ T cells. Second, we found naïve T effectors, effector memory, and central memory cells among the memory CD4+ T cells. Lastly, we separated naïve from mature Th17 cells and naïve from mature Sezary cells.
In conclusion, this strategy may enable better quality control of sorted populations and help identify pathogenic populations with large effect sizes in polygenic traits, which could lead to better diagnostics and subtype-targeted treatments.