By performing a gene expression meta-analysis of 36,000 patient samples from 103 diseases, we find that only 20% of published gene associations exhibit significant differential expression. We contextualize the gene expression analysis with phenotypic disease similarity learned from the electronic health records of two million patients, and we compare disease similarity profiles based on molecular and clinical manifestations. We observe that many autoimmune diseases cluster with the infectious diseases that are hypothesized to trigger their onset. For example, the only noninfectious disease in a cluster of viral infections is systemic lupus erythematosus, a complex autoimmune disease whose pathogenesis is associated with viral infection. Similarly, Wegener’s granulomatosis and sarcoidosis cluster with bacterial infections, which are hypothesized to trigger the onset of those diseases.
To uncover new biological relationships between diseases, we identify surprising disease pairs with significant similarity. One surprising pair that is positively correlated in both the clinical and molecular data is the autoimmune disease rheumatoid arthritis (RA) and the muscular disorder inclusion body myositis (IBM). We validate the clinical utility of our analysis by showing that positively correlated diseases tend to share drug indications. From this therapeutic perspective, the RA and IBM pair is particularly promising for improving treatment of IBM, because RA has many approved therapies and IBM has none.
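The idea of a disease pair being positively correlated in both data types can be illustrated with a minimal sketch. It is not the analysis pipeline used in this work: the function name `concordant_pairs`, the toy matrices, the placeholder disease names, and the fixed correlation threshold are all assumptions made for illustration, and a simple cutoff on Pearson correlation stands in for a proper significance test.

```python
import numpy as np

def concordant_pairs(molecular, clinical, names, threshold=0.5):
    """Return disease pairs whose profiles correlate above `threshold`
    in BOTH the molecular and the clinical matrix.

    molecular: diseases x genes array (hypothetical expression signatures)
    clinical:  diseases x features array (hypothetical EHR-derived profiles)
    """
    # np.corrcoef treats each row as one variable, so entry [i, j]
    # is the Pearson correlation between disease i and disease j.
    mol_corr = np.corrcoef(molecular)
    clin_corr = np.corrcoef(clinical)

    pairs = []
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            if mol_corr[i, j] > threshold and clin_corr[i, j] > threshold:
                pairs.append((names[i], names[j]))
    return pairs

# Toy data, invented for illustration only.
names = ["disease_A", "disease_B", "disease_C"]
molecular = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 5.9, 8.2],
    [4.0, 3.0, 2.0, 1.0],
])
clinical = np.array([
    [0.1, 0.9, 0.2, 0.8],
    [0.2, 0.8, 0.1, 0.9],
    [0.9, 0.1, 0.8, 0.2],
])

result = concordant_pairs(molecular, clinical, names)
print(result)
```

In this toy input, only the first two diseases are positively correlated in both matrices, so they form the single reported pair. A real analysis would replace the fixed threshold with a statistical test and a correction for multiple comparisons.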
By integrating molecular and clinical data, our analysis identifies diseases with under-appreciated relationships and enables drug repositioning by connecting disparate research communities.