The year 2024 is an exciting year for computational science, with the Physics Nobel Prize being awarded for research on “artificial neural networks” and the Chemistry Nobel Prize for “protein structure prediction and design”. Given the rapid advances in computer-aided drug design (CADD) and artificial intelligence-driven drug discovery (AIDD), a paper summarizing their current status and future directions would be timely and informative for the readers of the Journal of Medicinal Chemistry. The aim of this paper is to highlight recent advances, key challenges, and potential synergies between these areas to facilitate relevant discussions in the current literature and science blogs.
Computer-aided drug design (CADD) and artificial intelligence-driven drug discovery (AIDD) have made significant progress in recent years. These fields use physics-based computational methods and machine learning to improve the efficiency and speed of drug design, aiming to revolutionize the way new therapeutics are discovered and optimized. CADD has evolved considerably since the release of the first molecular docking software, DOCK, with improved methods for force-field development and for conformational sampling of both small and large molecules. New molecular docking and scoring algorithms have improved the prediction of ligand-receptor interactions and boosted hit rates in virtual screening. Integration of pharmacophore modeling with quantitative structure-activity relationship (QSAR) methods has made predictive models more robust. Platforms like Schrödinger, Molecular Operating Environment (MOE), and OpenEye Scientific have improved their user interfaces and computational speed, and cloud computing and interactive modes have made them accessible to researchers who are not computational chemists. Structure-based virtual screening has become a “state-of-the-art” tool for identifying starting chemotypes in early drug discovery. With the continued expansion of “make-to-order” chemical libraries, it will be possible within the next few years to virtually screen up to a trillion compounds in a matter of weeks, providing an unprecedented opportunity to explore novel hit scaffolds.
In addition, accurate protein structure prediction enabled by deep learning will increasingly fuel the successful application of virtual screening to novel biological targets, whereas ligand discovery through machine learning alone often struggles because chemical information on these targets is scarce. Virtual screening of compound libraries covering a wide range of chemical space has been widely used in the last five years. Lyu et al. docked 138 million compounds against the dopamine D4 receptor and 99 million compounds against AmpC β-lactamase, finding diverse novel hits for both targets. Gorgulla et al. docked 1.3 billion compounds against the redox sensor KEAP1 and confirmed 12% binding by surface plasmon resonance (SPR). Sadybekov et al. proposed the “Virtual Synthon Hierarchical Enumeration Screening” (V-SYNTHES) methodology, in which docking calculations are performed on fragment-like compounds representing all scaffolds available for library synthesis. Recursion virtually screened Enamine REAL Space against approximately 15,000 human proteins containing over 80,000 potential binding pockets.
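To put the scale of these campaigns in perspective, the following is a back-of-the-envelope sketch of the docking throughput a trillion-compound screen would require. The function names and the figure of one second per ligand per core are illustrative assumptions, not values from the paper; real docking costs vary widely with the method and the target.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def required_throughput(n_compounds: float, weeks: float) -> float:
    """Compounds that must be docked per second to finish within `weeks`."""
    return n_compounds / (weeks * SECONDS_PER_WEEK)


def cores_needed(n_compounds: float, weeks: float,
                 seconds_per_ligand_per_core: float) -> float:
    """CPU cores required if each core docks one ligand in the given time.

    `seconds_per_ligand_per_core` is a hypothetical cost; replace it with a
    measured value for the docking program actually in use.
    """
    return required_throughput(n_compounds, weeks) * seconds_per_ligand_per_core


if __name__ == "__main__":
    n = 1e12       # one trillion compounds
    weeks = 4.0    # "a matter of weeks" (assumed to be four)
    print(f"throughput: {required_throughput(n, weeks):,.0f} compounds/s")
    print(f"cores at 1 s/ligand: {cores_needed(n, weeks, 1.0):,.0f}")
```

Under these assumptions the screen needs roughly 400,000 docked compounds per second, which is why hierarchical approaches that dock fragments first and enumerate only promising regions of the library are attractive.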
Meanwhile, traditional modeling that relies on docking has driven the development of more accurate physics-based prediction methods such as free energy perturbation (FEP) and thermodynamic integration (TI). In recent years, relative binding free energy (RBFE) calculations have been extensively validated in the optimization of lead compounds for a wide range of targets such as proteases, kinases, and GPCRs. Enhanced sampling molecular dynamics methods have been developed to detect cryptic pockets and to predict ligand binding kinetics, an important parameter associated with in vivo efficacy. Thanks to advances in computational methods, high-performance computing, and the availability of GPU-accelerated simulations, the successful application of CADD has led to the discovery of a number of clinical candidates. Nimbus, in collaboration with Schrödinger, used structure-based drug design strategies to drive the discovery of a potentially best-in-class TYK2 inhibitor. Morphic Therapeutic used the “Digital Chemistry” FEP approach to design a novel small-molecule α4β7 integrin inhibitor. Relay Therapeutics performed long MD simulations to design an inhibitor that selectively and covalently binds a Cys residue in the P-loop of FGFR2.
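For readers less familiar with these methods, the standard textbook expressions behind RBFE, FEP, and TI are sketched below. They are general formulas, not the specific protocol of any company named above; the symbols ($U$ for potential energy, $\lambda$ for the alchemical coupling parameter, $k_B T$ for thermal energy) are notation introduced here for illustration.

```latex
% Relative binding free energy of ligand B versus ligand A from a
% thermodynamic cycle: A is alchemically transformed into B once in the
% protein complex and once in solvent.
\Delta\Delta G_{\mathrm{bind}}(A \to B)
  = \Delta G_{\mathrm{bind}}(B) - \Delta G_{\mathrm{bind}}(A)
  = \Delta G_{\mathrm{complex}}(A \to B) - \Delta G_{\mathrm{solvent}}(A \to B)

% Free energy perturbation (Zwanzig relation), averaged over
% configurations sampled in state A:
\Delta G(A \to B)
  = -k_B T \,\ln \left\langle
      \exp\!\left( -\frac{U_B - U_A}{k_B T} \right)
    \right\rangle_{A}

% Thermodynamic integration along the coupling parameter \lambda:
\Delta G(A \to B)
  = \int_{0}^{1}
    \left\langle \frac{\partial U(\lambda)}{\partial \lambda} \right\rangle_{\lambda}
    \, d\lambda
```

Because only the difference between two alchemical legs is needed, RBFE avoids simulating the full binding and unbinding process, which is what makes it practical for ranking close analogs during lead optimization.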
Artificial intelligence-driven drug discovery (AIDD) has grown rapidly over the past decade with the rise of machine learning and deep learning applications in drug discovery. Notably, the term “AIDD” appears frequently in the literature, public presentations, and the media. However, to identify the best applications of both fields in drug discovery programs, it is critical to understand the fundamental difference between them: whereas CADD relies on physics-based methods, AIDD utilizes large-scale datasets from public or proprietary data repositories to enable new predictions through pattern recognition and pre-trained machine learning models. The 3D structure prediction of the whole proteome by Google DeepMind's AlphaFold platform is a revolutionary contribution to AIDD, enabling structure-based virtual ligand discovery at scales far beyond what experimental structures allow. In addition, the development and application of generative approaches such as variational autoencoders (VAEs), generative adversarial networks (GANs), chemical language models, reinforcement learning, transformer models, and diffusion models have made it possible to learn from a training set and generate molecular structures with desired biological and physicochemical properties, which are then proposed as hypotheses for experimental validation.
Benchmark datasets like MOSES and GuacaMol have been released for the comparison and validation of generative models. In 2019, researchers from Insilico Medicine published a groundbreaking paper describing the discovery of a DDR1 inhibitor in only 21 days using a deep generative model, generative tensorial reinforcement learning (GENTRL). Insilico Medicine's AI target discovery platform “PandaOmics” proposed TNIK as a new target for idiopathic pulmonary fibrosis (IPF). Using the company's generative chemistry tool “Chemistry42”, novel structure generation and medicinal chemistry work led to the discovery and development of INS018_055, which is currently in Phase II clinical trials.
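Benchmarks of this kind typically report simple set-based metrics for generated molecules, such as uniqueness and novelty. The sketch below illustrates these two metrics only; it is not the MOSES or GuacaMol implementation. It assumes every SMILES string is already valid and canonicalized (a real pipeline would do this with a cheminformatics toolkit first), and the function names and example molecules are hypothetical.

```python
from typing import Iterable, List


def uniqueness(generated: List[str]) -> float:
    """Fraction of generated molecules that are distinct.

    Assumes each entry is a valid, canonical SMILES string, so that string
    equality implies the same molecule.
    """
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated: Iterable[str], training: Iterable[str]) -> float:
    """Fraction of distinct generated molecules absent from the training set."""
    unique_generated = set(generated)
    if not unique_generated:
        return 0.0
    return len(unique_generated - set(training)) / len(unique_generated)


if __name__ == "__main__":
    # Toy example: ethanol appears twice and is also in the training set.
    generated = ["CCO", "CCO", "c1ccccc1", "CCN"]
    training = ["CCO"]
    print(f"uniqueness = {uniqueness(generated):.2f}")
    print(f"novelty    = {novelty(generated, training):.2f}")
```

High novelty on such a metric only shows that the structures were not copied from the training set; it does not by itself show that a model can design active molecules outside its training space.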
While more than a dozen AI startups and biotechs have joined this AI race over the past decade, we have also seen an “ebb and flow” of drug candidates from AI companies. For example, Exscientia's EXS-21546, a highly potent and selective A2AR antagonist, failed to advance beyond Phase I/II studies, and BenevolentAI terminated the development of its pan-Trk inhibitor BEN-2293 after Phase II trials. Since 2012, collaborations between AI and biotech companies have been on the rise, most visibly in major partnerships between big pharma and AI-focused biotechs. There is no doubt that AIDD has become mainstream in the pharmaceutical industry. However, a few questions still need to be addressed before “AI-made” drugs can reach the market. First, will high-quality data be available for AI models to learn meaningful patterns and make predictions based on them? Second, can AI really understand biology, given its complexity? This is directly related to the controversial question of whether AI can discover truly new targets. Finally, is generative chemistry capable of designing active molecules outside of the training space? Molecular generation based on protein binding pockets has the potential to address the limitations of generative models that use ligands as training sets.
Despite the impressive progress and success stories of CADD and AIDD, both still face challenges. Academia and start-up biotech companies need cost-effective methods to prioritize virtual hits from very large-scale screening campaigns with higher accuracy. Recently, Wu et al. proposed treating solvation terms with the GBMV method to eliminate “cheaters” in virtual screening. We anticipate that other approaches addressing gaps in this area will follow. CADD and AIDD have proven capable of delivering new molecules (e.g., first-in-class, FIC, or best-in-class, BIC) for targets that are amenable to computational approaches (e.g., targets with high-quality structures, good druggability, well-studied biology, and available training sets). However, the applicability and success of machine learning for hit discovery, hit-to-lead, and lead optimization when little information is available about the target, i.e., for highly novel targets, remains an open question. This is precisely the situation in drug discovery programs that go beyond traditional small-molecule inhibitors.
For example, 3D structure prediction of RNA remains a major challenge despite the progress being made in the field. While AlphaFold3 has shown breakthrough performance in protein prediction, accurate structure prediction of RNA remains difficult because of the fundamental differences between proteins and RNAs and the scarcity of structural data available for training. In fact, targeting RNA with small molecules may be difficult to achieve given that RNA is highly polar and dynamic. Scoring functions suitable for docking small molecules to RNA still need improvement. Over the past five years, molecular glues and degraders, including bifunctional molecules (PROTACs) and monovalent degraders (intrinsic degraders), have attracted increasing attention in small-molecule drug discovery. Cherkasov's group has developed an integrated 3D modeling and deep-learning computational workflow for the automated design of PROTACs. Monte Rosa has developed an AI algorithm called fAIceit, an ultra-fast engine that scans thousands of proteins and identifies those with the structural characteristics of “degrons” that can be recognized by E3 ligases and targeted for degradation. It remains to be seen whether generative AI will be able to deliver promising small molecules targeting RNA and promising molecular glues in the future.
Looking ahead, CADD and AIDD are promising, and future work may focus on complementing physics-based approaches with AIDD technologies to capitalize on the strengths of both while mitigating their respective limitations. Notably, the 2013 Nobel Prize in Chemistry was awarded to Martin Karplus, Michael Levitt, and Arieh Warshel for their contributions to the development of multiscale models of complex chemical systems. This year, Geoffrey Hinton, often called the “godfather of AI”, was awarded the 2024 Nobel Prize in Physics, along with John Hopfield, for their work on artificial neural networks. Meanwhile, the 2024 Nobel Prize in Chemistry was awarded to David Baker, Demis Hassabis, and John Jumper for their contributions to protein design and the prediction of complex protein structures. It has certainly been a banner year for AI in science! In the coming years, we will witness the “matching” of CADD and AIDD, providing unprecedented opportunities to accelerate drug discovery and development. As technology continues to advance, the integration of these computational tools with traditional research paradigms promises to transform the pharmaceutical field. Continued research investment, collaboration, and attention to existing challenges will be key to unlocking the full potential of computational approaches to small-molecule drug discovery.