Data Analysis and Informatics
From instrument to decision: improved decision-making on complex data at scale
The increasing popularity of high-content screening (HCS) and phenotypic profiling in preclinical drug discovery is generating enormous amounts of complex imaging data. The flexibility of these imaging-based assays allows researchers to quantify many different biological processes using a single technology, including nuclear translocation of proteins, internalization of receptors, and morphological changes in response to tens of thousands to hundreds of thousands of treatments. Despite the experimental throughput of HCS, analyzing and interpreting the resulting imaging data remains a key bottleneck to using these systems. Scientists often need to collaborate closely with computer vision experts and data scientists to extract informative measurements (i.e., features) from imaging data and to design customized analysis pipelines for each new assay. Machine learning provides a unique opportunity to automate and accelerate many of the steps involved in analyzing HCS screens.
Recent results have shown that deep convolutional neural networks (CNNs) trained directly on raw pixel data outperform existing approaches at classifying and clustering cellular phenotypes. The tradeoff often associated with these methods is that their predictions are difficult to interpret. We've designed several novel machine learning approaches for HCS that prioritize interpretability by highlighting the regions in an image that are responsible for the model's predictions. These models combine fully convolutional neural networks, typically used for image segmentation, with convolutional multiple instance learning (convMIL) to aggregate predictions spatially across fields of view (a minimal sketch of this architecture appears below).

Additionally, we've correlated the predictions made by convMIL models with features extracted from individual cells using traditional feature-extraction-based analyses. Combining the two methods provides an additional layer of interpretability by automatically indicating which features change most significantly between the classes predicted by the CNN (see the second sketch below).

Finally, we've developed a novel approach for exploring single-cell phenotypes in HCS screens that pairs weakly supervised learning models with an interactive tool for exploring phenotypes. Weakly supervised models are CNNs trained to predict every unique condition in an HCS screen from image crops of single cells. Once a model is trained, a feature vector is extracted for every cell in the screen from the outputs of intermediate layers of the CNN, and these feature vectors are reduced to 2D using dimensionality reduction techniques such as t-SNE and UMAP (sketched in the final example below). The interactive scatterplot we've built lets scientists explore this 2D space while seeing what individual cell phenotypes look like and which treatment conditions are common in the clusters that appear. We've used this tool to discover antibodies and compounds that are active across multiple assays, including assays spanning multiple cell types and 3D culture systems. Taken together, these approaches significantly accelerate and improve phenotypic discovery programs.
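To make the convMIL idea concrete, below is a minimal sketch (PyTorch assumed) of a fully convolutional network whose per-location class map is aggregated into an image-level prediction with a noisy-AND-style MIL pooling layer, one aggregation commonly used in this setting. The layer sizes, channel counts, and class names are illustrative assumptions, not the production architecture; the spatial class map is what provides the interpretability by highlighting which regions drive the prediction.

```python
import torch
import torch.nn as nn

class NoisyAndPool(nn.Module):
    """Aggregate a per-location class-probability map into one image-level
    probability per class (noisy-AND style MIL pooling)."""
    def __init__(self, n_classes, a=10.0):
        super().__init__()
        self.a = a
        self.b = nn.Parameter(torch.full((1, n_classes), 0.5))  # learnable per-class threshold

    def forward(self, p_map):               # p_map: (batch, n_classes, H, W), values in [0, 1]
        p_mean = p_map.mean(dim=(2, 3))     # mean instance probability per class
        num = torch.sigmoid(self.a * (p_mean - self.b)) - torch.sigmoid(-self.a * self.b)
        den = torch.sigmoid(self.a * (1 - self.b)) - torch.sigmoid(-self.a * self.b)
        return num / den                    # (batch, n_classes) image-level probabilities

class ConvMIL(nn.Module):
    """Fully convolutional feature extractor -> per-location class map -> MIL pooling.
    The class map doubles as an interpretability output: it highlights which regions
    of the field of view are responsible for the image-level prediction."""
    def __init__(self, in_channels=2, n_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(128, n_classes, 1)   # 1x1 conv: per-location class logits
        self.pool = NoisyAndPool(n_classes)

    def forward(self, x):
        class_map = torch.sigmoid(self.classifier(self.features(x)))
        return self.pool(class_map), class_map           # (image-level probs, spatial map)

# Example: a batch of 2-channel fluorescence fields of view.
model = ConvMIL(in_channels=2, n_classes=8)
probs, heatmap = model(torch.randn(4, 2, 256, 256))
```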
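The second sketch illustrates the feature-correlation step: given per-image class probabilities from a convMIL model and per-image aggregates of traditional single-cell features (CellProfiler-style measurements, for example), rank the features that track each predicted class most strongly. The helper name, column names, and the use of Spearman correlation are illustrative assumptions rather than the exact in-house procedure.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def rank_features_by_class(pred_probs: pd.DataFrame, features: pd.DataFrame, top_k=10):
    """pred_probs: per-image class probabilities (rows = images, cols = predicted classes).
    features: per-image aggregated single-cell features, row-aligned with pred_probs.
    Returns, for each class, the features whose values track that class most strongly,
    i.e. which measurements change most between predicted phenotypes."""
    ranking = {}
    for cls in pred_probs.columns:
        rho = {
            feat: spearmanr(pred_probs[cls], features[feat])[0]  # [0] = correlation coefficient
            for feat in features.columns
        }
        ranking[cls] = pd.Series(rho).abs().sort_values(ascending=False).head(top_k)
    return ranking

# Toy usage with random data, assuming rows are aligned by image identifier.
rng = np.random.default_rng(0)
probs = pd.DataFrame(rng.random((100, 3)), columns=["wt", "phenotype_a", "phenotype_b"])
feats = pd.DataFrame(rng.random((100, 5)),
                     columns=["nucleus_area", "cell_area", "intensity_ch1",
                              "texture_ch2", "eccentricity"])
print(rank_features_by_class(probs, feats, top_k=3)["phenotype_a"])
```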
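The final sketch covers the weakly supervised embedding step: a CNN trained to predict the treatment condition of each single-cell crop, whose intermediate-layer activations are reused as a per-cell feature vector, projected to 2D with UMAP, and browsed in an interactive scatterplot. PyTorch, the umap-learn package, and plotly are assumed; the architecture, crop size, and column names are illustrative, and the plotly scatter stands in for the interactive tool described above.

```python
import torch
import torch.nn as nn
import umap                      # pip install umap-learn
import pandas as pd
import plotly.express as px

class WeaklySupervisedCNN(nn.Module):
    """Trained to predict which of the screen's treatment conditions a single-cell
    crop came from; the penultimate activations serve as the per-cell feature vector."""
    def __init__(self, in_channels=2, n_conditions=384, embed_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim), nn.ReLU(),
        )
        self.head = nn.Linear(embed_dim, n_conditions)   # per-condition logits

    def forward(self, x):
        z = self.backbone(x)          # per-cell feature vector (intermediate layer output)
        return self.head(z), z

model = WeaklySupervisedCNN()
crops = torch.randn(512, 2, 64, 64)          # stand-in for single-cell image crops
with torch.no_grad():
    _, embeddings = model(crops)

# Reduce the per-cell feature vectors to 2D and browse them interactively.
coords = umap.UMAP(n_components=2).fit_transform(embeddings.numpy())
df = pd.DataFrame({"umap_1": coords[:, 0], "umap_2": coords[:, 1],
                   "treatment": ["DMSO"] * 512})         # placeholder treatment labels
fig = px.scatter(df, x="umap_1", y="umap_2", color="treatment", hover_data=["treatment"])
fig.show()
```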