Is brightfield all you need for mechanism of action prediction? Image data, CellProfiler features and Grit scores
Dataset description:
The image data provided here are for U2OS cells treated with compounds belonging to ten MoA classes. We selected MoAs that we believed would be reasonably separable and that had a sufficient number of compounds (n) associated with them in our assay. The ten MoAs were: ATPase inhibitors (ATPase-i, n = 18); Aurora kinase inhibitors (AuroraK-i, n = 20); HDAC inhibitors (HDAC-i, n = 33); HSP inhibitors (HSP-i, n = 24); JAK inhibitors (JAK-i, n = 21); PARP inhibitors (PARP-i, n = 21); protein synthesis inhibitors (Prot.Synth.-i, n = 23); retinoid receptor agonists (Ret.Rec.Ag, n = 19); topoisomerase inhibitors (Topo.-i, n = 32); and tubulin polymerization inhibitors (Tub.Pol.-i, n = 20). The compounds were administered at a concentration of 10 micromolar and the cells were exposed for 48 h in 384-well plates. Each compound-level experiment was replicated six times, and the compounds were distributed across 18 microplates. Images (16-bit, 2160 x 2160 pixels) were captured with a 20X objective at five sites (fields of view) in each well, with five fluorescence channels for the Cell Painting fluorescence (FL) data and six evenly spaced z-planes for the brightfield (BF) data.
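As a quick sanity check on the numbers above, the following sketch tallies the compounds and the images per well. It assumes that each replicate occupies exactly one well and it ignores any control wells, neither of which is stated in this description, so the well total is an estimate for treated wells only.

```python
# Compound counts per MoA class, copied from the dataset description.
MOA_COUNTS = {
    "ATPase-i": 18,
    "AuroraK-i": 20,
    "HDAC-i": 33,
    "HSP-i": 24,
    "JAK-i": 21,
    "PARP-i": 21,
    "Prot.Synth.-i": 23,
    "Ret.Rec.Ag": 19,
    "Topo.-i": 32,
    "Tub.Pol.-i": 20,
}

REPLICATES = 6       # replicates per compound
SITES_PER_WELL = 5   # fields of view imaged in each well
FL_CHANNELS = 5      # Cell Painting fluorescence channels
BF_Z_PLANES = 6      # brightfield z-planes

n_compounds = sum(MOA_COUNTS.values())

# Assumption: one well per replicate, control wells not counted.
treated_wells = n_compounds * REPLICATES

fl_images_per_well = SITES_PER_WELL * FL_CHANNELS
bf_images_per_well = SITES_PER_WELL * BF_Z_PLANES

print(f"compounds: {n_compounds}")
print(f"treated wells (estimate): {treated_wells}")
print(f"FL images per well: {fl_images_per_well}")
print(f"BF images per well: {bf_images_per_well}")
```

The compound total agrees with the 231 compounds referred to in the grit score description.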
Organization of files:
1) Raw image data: The image data for the 18 microplates ['P015076', 'P015077', 'P015080', 'P015081', 'P015082', 'P015083', 'P015084', 'P015085', 'P015090', 'P015091', 'P015092', 'P015093', 'P015094', 'P015095', 'P015096', 'P015097', 'P015098', 'P015099'] is located in the corresponding zipped folders (tar.gz), one per plate.
2) Data tables: data_tables.tar.gz. This zipped folder contains the metadata for the FL (fl_data.csv) and the BF (bf_data.csv) images. Each table gives the plate, well, site (field of view), compound and MoA for each image. For FL, the columns C1 to C5 give the names of the tiff files for each of the five fluorescence channels; for BF, the columns C1 to C6 give the names of the tiff files for each of the six z-planes. Note that the site identifiers differ between the BF and FL data: sites [1,2,3,4,5] for BF correspond to sites [2,4,5,6,8] for FL.
3) CellProfiler pipelines:
- HMPSC_2_ICF_Polynom.cppipe - illumination correction pipeline (to calculate illumination correction function)
- HMPSC_3_FEAT_ICFImg_Cellpose_v1_n50_c150_ft0.8.cppipe - feature extraction pipeline (that applies the illumination correction function and extracts features)
4) CellProfiler features: CP_features.tar.gz. This zipped folder contains the cell-level CellProfiler features used for benchmarking purposes in our analysis (CP_features_cells.csv).
5) Grit scores: grit_scores.tar.gz. This zipped folder contains the grit scores and nuclear counts for the imaging sites (grit_scores.csv). This information is provided for all compounds for which it could be computed (227 of the 231 compounds).
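Because the BF and FL tables number their sites differently, rows have to be matched through the site correspondence given for the data tables. The sketch below shows one way this might be done with plain Python dictionaries. The column names "plate", "well" and "site" and the helper pair_bf_with_fl are hypothetical; check the headers of bf_data.csv and fl_data.csv before using it. The toy rows and file names are invented for illustration.

```python
# BF site identifier -> FL site identifier, as stated for the data tables.
BF_TO_FL_SITE = {1: 2, 2: 4, 3: 5, 4: 6, 5: 8}


def pair_bf_with_fl(bf_rows, fl_rows):
    """Pair each BF row with the FL row for the same plate, well and field of view.

    Rows are dictionaries, for example as produced by csv.DictReader.
    The keys "plate", "well" and "site" are assumed column names.
    """
    # Index the FL rows by (plate, well, FL site) for direct lookup.
    fl_index = {(r["plate"], r["well"], int(r["site"])): r for r in fl_rows}
    pairs = []
    for bf in bf_rows:
        fl_site = BF_TO_FL_SITE[int(bf["site"])]
        fl = fl_index.get((bf["plate"], bf["well"], fl_site))
        if fl is not None:  # skip BF rows without an FL counterpart
            pairs.append((bf, fl))
    return pairs


# Toy rows (invented) standing in for the two metadata tables.
bf_rows = [
    {"plate": "P015076", "well": "B02", "site": "1", "C1": "bf_s1_z1.tiff"},
    {"plate": "P015076", "well": "B02", "site": "5", "C1": "bf_s5_z1.tiff"},
]
fl_rows = [
    {"plate": "P015076", "well": "B02", "site": "2", "C1": "fl_s2_c1.tiff"},
    {"plate": "P015076", "well": "B02", "site": "8", "C1": "fl_s8_c1.tiff"},
]

pairs = pair_bf_with_fl(bf_rows, fl_rows)
for bf, fl in pairs:
    print(bf["site"], "->", fl["site"], bf["C1"], fl["C1"])
```

With the real tables, the rows could be read with csv.DictReader after extracting data_tables.tar.gz, with the key names adjusted to the actual column headers.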
Publications:
The data in this repository supports the following two publications:
1. "Is brightfield all you need for mechanism of action prediction?" by Harrison et al.
2. "Combining molecular and cell painting image data for mechanism of action prediction" by Tian et al.
Abstract for publication 1:
Fluorescence staining techniques, such as Cell Painting, together with fluorescence microscopy have proven invaluable for visualizing and quantifying the effects that drugs and other perturbations have on cultured cells. However, fluorescence microscopy is expensive, time-consuming, and labor-intensive, and the stains applied can be cytotoxic, interfering with the activity under study. The simplest form of microscopy, brightfield microscopy, lacks these downsides, but the images produced have low contrast and the cellular compartments are difficult to discern. Nevertheless, harnessing deep learning, these brightfield images may still be sufficient for various predictive purposes. In this study, we compared the predictive performance of models trained on fluorescence images to those trained on brightfield images for predicting the mechanism of action (MoA) of different drugs. We also extracted CellProfiler features from the fluorescence images and used them to benchmark the performance. Overall, we found comparable and correlated predictive performance for the two imaging modalities. This is promising for future studies of MoAs in time-lapse experiments.
Abstract for publication 2:
The mechanism of action (MoA) of a compound describes the biological interaction through which it produces a pharmacological effect. Multiple data sources can be used for the purpose of predicting MoA, including compound structural information, and various assays, such as those based on cell morphology, transcriptomics and metabolomics. In the present study we explored the benefits and potential additive/synergistic effects of combining structural information, in the form of Morgan fingerprints, and morphological information, in the form of five-channel Cell Painting image data. For a set of 10 well represented MoA classes, we compared the performance of deep learning models trained on the two datasets separately versus a model trained on both datasets simultaneously. On a held-out test set we obtained a macro-averaged F1 score of 0.58 when training on only the structural data, 0.81 when training on only the image data, and 0.92 when training on both together. Thus indicating clear additive/synergistic effects and highlighting the benefit of integrating multiple data sources for MoA prediction.