Category: Data Analysis and Informatics

1220-A - High-Throughput Virtual Screening within KNIME Analytics Platform

Monday, February 5, 2018
2:00 PM - 3:00 PM

Transient receptor potential cation channel subfamily V member 1 (TRPV1) is a nonselective cation channel modulated by endogenous and exogenous ligands, low pH, temperature and voltage. Industry and academia have made many efforts to find novel drugs that modulate the channel's activity without undesirable side effects such as hyperthermia. Because this target is widely distributed in sensory nerves, most research has focused on inhibiting channel function to obtain an antinociceptive effect. In the last few years, TRPV1 has also been proposed as a promising target to treat some forms of epilepsy.
KNIME is a well-known open source analytics platform, mainly used for data mining and data analysis. It allows users to build workflows in an intuitive manner by connecting different nodes, each of which performs an operation on the data.
In the present work we developed two computational QSAR models to identify TRPV1 antagonists with potential antiepileptic activity. The dataset, consisting of 583 active and 208 inactive compounds, was rationally partitioned by means of two consecutive clustering steps using ECFP4 fingerprints. This allowed us to exploit the molecular diversity available in the dataset, deriving a balanced Training Set (156 antagonists and 156 inactives), a Test Set of 104 molecules (52 antagonists and 52 inactives) and a simulated database of 30,325 compounds, with a low proportion of actives (1.1 %) spread among DUD-E generated decoys. Both fingerprints and molecular descriptors were used as features to construct two different ensembles of decision trees (Random Forest). The algorithm hyperparameters were tuned by k-fold cross-validation, using the area under the Receiver Operating Characteristic curve (AUC-ROC) as the objective function to maximize. Finally, the models were subjected to several validation steps, including internal validations (k-fold, out-of-bag and Y-randomization) and external validations (Test Set and simulated database). Additionally, the applicability domain of the models was estimated using a similarity approach based on Euclidean distances.
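The abstract does not say how the Euclidean-distance applicability domain was defined, so the following is only a minimal sketch of one common variant, not the authors' method. It assumes a compound is inside the domain when its mean distance to its k nearest training compounds does not exceed a threshold derived from the training set itself. The function names, the value of `k` and the `z` multiplier are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_knn_distance(query, training, k=3, skip_self=False):
    """Mean distance from `query` to its k nearest training vectors.

    With skip_self=True the single smallest distance is dropped, which is
    meant for the case where `query` is itself a member of `training`.
    """
    dists = sorted(euclidean(query, t) for t in training)
    if skip_self:
        dists = dists[1:]
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def fit_threshold(training, k=3, z=0.5):
    """Assumed cutoff: mean + z * std of the training set's own k-NN distances."""
    scores = [mean_knn_distance(t, training, k, skip_self=True) for t in training]
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return mean + z * std


def in_domain(query, training, threshold, k=3):
    """True if the query lies no farther from the training set than the cutoff."""
    return mean_knn_distance(query, training, k) <= threshold


# Toy 2-D "descriptor" vectors; real descriptors would normally be scaled first.
toy_training = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
toy_threshold = fit_threshold(toy_training, k=2)
```

In a screening run, predictions for compounds where `in_domain` returns `False` would be flagged as less reliable rather than silently discarded.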
Both models perform very well in all validations, with an AUC-ROC on the Test Set of 0.983 and 0.992 for the fingerprint and descriptor models, respectively. Enrichment factors of EF-1% = 72.27 and EF-10% = 8.85, together with a BEDROC of 0.843, on the simulated database also support the models' usefulness for virtual screening. Therefore, both models could be jointly applied in a virtual screening campaign to identify novel TRPV1 antagonists.
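For readers unfamiliar with the enrichment factor, the sketch below shows how EF at a given fraction is usually computed: the hit rate among the top-ranked fraction of the database divided by the hit rate in the whole database. It is an illustration under stated assumptions, not the workflow's actual node configuration. The function name and the rounding rule for the size of the top fraction are assumptions.

```python
def enrichment_factor(scores, labels, fraction):
    """Enrichment factor at the top `fraction` of a ranked database.

    scores   -- model scores, higher meaning "more likely active"
    labels   -- 1 for active, 0 for inactive/decoy, aligned with scores
    fraction -- e.g. 0.01 for EF-1%, 0.10 for EF-10%
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Assumed convention: round to the nearest whole compound, at least one.
    n_top = max(1, int(round(n_total * fraction)))

    # Rank compounds from highest to lowest score and count actives on top.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    return (hits_top / n_top) / (n_actives / n_total)


# Toy ranking: 100 compounds, 10 actives, 5 of them among the 10 best scores.
toy_scores = list(range(100, 0, -1))
toy_labels = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 85
toy_ef10 = enrichment_factor(toy_scores, toy_labels, 0.10)
```

The EF cannot exceed the reciprocal of the active fraction. Taking the stated 1.1 % of actives at face value, the ceiling is roughly 91 for EF-1% and 10 for EF-10%, which gives a scale for reading the reported values.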
The whole pipeline described above is implemented as a KNIME workflow, so it is simple to use and to extend with new data or new functionality. It is also fast, which makes it suitable for high-throughput virtual screening of large chemical databases.

Manuel Augusto Llanos

Ph.D. Student
National University of La Plata
La Plata, Buenos Aires, Argentina

Manuel A. Llanos. Pharmacist. Ph.D. Student at National University of La Plata, Argentina.
I obtained a degree in Pharmacy at the National University of La Plata in 2016. Since 2017 I have been a doctoral fellow at the same university, with a fellowship from the National Council of Scientific and Technical Research (CONICET). I work in the field of Medicinal Chemistry, using computational methods to find and design new drugs; in particular, I focus on finding novel antiepileptic agents acting on ion channels. I also hold a position at the university as a graduate teaching assistant.