Data Analysis and Informatics
From instrument to decision: improved decision-making on complex data at scale
Image-based profiling of cellular phenotypes has emerged as a powerful source of information for comparing chemical and genetic treatments. This has opened the door to interrogating the biological impact of a chemical collection at a scale that was not previously possible. However, optimizing models to interpret these images generally requires ground-truth labels for mechanism of action, which are difficult to generate at scale using conventional techniques. For compounds in the discovery stage, what is most often available is the nominal target, but that does not capture primary and off-target effects. Moreover, plate, batch, and instrument variation can complicate the transfer of methods and analyses across datasets or instruments, limiting the utility of public data for this purpose. Thus, research groups are compelled to develop their data analysis methods on data acquired with their own instruments and protocols, even in the absence of corresponding ground truth. In this talk, we describe a reliable system for discerning and labeling distinct image-based cellular phenotypes in such a scenario. We cover a number of methods for feature extraction (hand-engineered features, deep learning) and analysis (clustering, similarity metrics, hit calling, batch-effect correction). Most of our methods are based on the central principle that biological replicates exhibit identical phenotypes. We run follow-up assays to validate our results.
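As an illustration of how the replicate principle can drive hit calling without ground-truth labels, here is a minimal sketch in Python. It is not the system described in the talk. It assumes each well has already been reduced to a numeric feature profile, scores each treatment by the median pairwise Pearson correlation among its replicate wells, and calls a hit when that score exceeds a chosen percentile of a null distribution built from wells of different treatments. The function names, the group size, and the 95th-percentile cutoff are all hypothetical choices made for the example.

```python
import numpy as np


def median_replicate_correlation(profiles, labels):
    """Median pairwise Pearson correlation among wells sharing a treatment label."""
    labels = np.asarray(labels)
    corr = np.corrcoef(profiles)  # well-by-well correlation matrix
    scores = {}
    for treatment in np.unique(labels):
        idx = np.flatnonzero(labels == treatment)
        if idx.size < 2:
            continue  # a single well has no replicate to compare against
        sub = corr[np.ix_(idx, idx)]
        upper = sub[np.triu_indices(idx.size, k=1)]  # each replicate pair once
        scores[str(treatment)] = float(np.median(upper))
    return scores


def null_threshold(profiles, labels, group_size=3, n_draws=1000,
                   percentile=95, seed=0):
    """Percentile of median correlations among wells from different treatments.

    Assumed null model: groups of wells drawn from distinct treatments
    should not share a phenotype. Requires at least group_size treatments.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    corr = np.corrcoef(profiles)
    treatments = np.unique(labels)
    null = []
    for _ in range(n_draws):
        chosen = rng.choice(treatments, size=group_size, replace=False)
        # One randomly chosen well per selected treatment.
        idx = np.array([rng.choice(np.flatnonzero(labels == t)) for t in chosen])
        sub = corr[np.ix_(idx, idx)]
        null.append(np.median(sub[np.triu_indices(group_size, k=1)]))
    return float(np.percentile(null, percentile))


def call_hits(profiles, labels, **null_kwargs):
    """Flag treatments whose replicate correlation exceeds the null threshold."""
    scores = median_replicate_correlation(profiles, labels)
    threshold = null_threshold(profiles, labels, **null_kwargs)
    hits = {treatment: score > threshold for treatment, score in scores.items()}
    return hits, scores, threshold
```

In practice, profiles would typically be normalized per plate and corrected for batch effects before this step, since uncorrected plate effects can inflate the correlation between wells that merely share a plate.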