Molecular optimisation is a critical challenge in fields ranging from drug discovery to materials science, yet current methods often require substantial computational resources for each new molecular structure. Muhammad bin Javaid, Hasham Hussain, and Ashima Khanna, working with Dominik G. Grimm and colleagues from RWTH Aachen University (Department of Computer Science and Process Systems Engineering) and Technical University of Munich (TUM Campus Straubing for Biotechnology and Sustainability), present a novel approach to address this limitation. Their research introduces GRXForm, a system that combines a pre-trained Graph Transformer model with Group Relative Policy Optimisation (GRPO) to learn a transferable policy for efficient molecular optimisation. The work is significant because it demonstrates a pathway towards amortised optimisation: the model generalises to previously unseen molecular scaffolds without requiring time-consuming oracle calls or refinement steps, and it achieves multi-objective optimisation performance competitive with established instance optimisers.
Scientists have developed a new approach to molecular design, GRXForm, that significantly accelerates molecular optimisation while tackling the inconsistent difficulty of starting structures, which introduces high variance into the learning process. Empirical results demonstrate that GRXForm successfully generalises to previously unseen molecular scaffolds, achieving performance competitive with leading instance optimisers in multi-objective optimisation scenarios. The ability to bypass time-consuming oracle calls (external scoring functions or simulations used to evaluate molecular properties) represents a substantial leap forward, as current state-of-the-art methods often require thousands of evaluations per design, creating a bottleneck for high-throughput applications. By front-loading the computational effort into an initial training phase, GRXForm enables scalable, high-throughput molecular design without repeated expensive calculations. This promises to accelerate discovery in fields ranging from pharmaceutical development to materials science, particularly when high-fidelity oracles such as protein-ligand docking or free energy perturbation calculations are used. The model receives an initial molecular scaffold as input and progressively builds upon it, guided by a learned policy to enhance desired properties. It diverges from conventional “instance optimisers” by aiming for amortised efficiency, learning a generalisable policy applicable to diverse starting structures. To address the high variance encountered when applying model-based optimisation to molecules, the research team implemented GRPO, which normalises reward signals relative to the difficulty of the initial structure, effectively calibrating the optimisation process to account for inherent structural complexity.
This technique mitigates the tendency for the model to struggle with particularly challenging starting points and promotes more consistent and reliable performance across a range of molecular scaffolds, because improvements are assessed in the context of each molecule’s initial state.
Molecular optimisation using GRXForm achieved an objective score of 0.409 ±0.003 alongside a success rate of 0.000 ±0.000 across three independent test folds, calculated as the average of the top completion found for each of the 500 test scaffolds. LibINVENT scored 0.371 ±0.004 with a 0.000 ±0.000 success rate, while DrugEx v3 reached 0.358 ±0.003, also failing to generate any successful molecules. Notably, GRXForm outperformed instance optimisers, attaining a 17.8% ±0.093 success rate, while Mol GA and GenMol both registered 0.000 ±0.000 success, indicating that GRXForm was able to satisfy strict multi-parameter success criteria where other methods failed. LibINVENT, designed for optimising single series, struggled with diverse scaffolds, and DrugEx v3, despite its transformer architecture, could not transfer optimisation logic to unseen topologies. Instance optimisers also faltered: Mol GA’s genetic operators frequently disrupted core scaffold topology, and GenMol’s fragment remasking process was restricted to side chains due to topological constraints. GRXForm’s ability to learn a conditional policy that models valid, scaffold-preserving modifications appears to be key to its success, a conclusion supported by an ablation study. GRXForm-DeNovo, operating without explicit structural conditioning, achieved a near-zero success rate of 0.1% ±0.001. GRXForm-REINFORCE, which uses a global mean as its baseline, attained a 9.1% ±0.157 success rate, but with substantial variance. GRXForm-GRPO achieved 17.8% ±0.093 success, confirming that instance-specific reward normalisation, as implemented in GRPO, is crucial for stabilising learning across heterogeneous chemical space.
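The difference between a global-mean baseline and instance-specific normalisation can be illustrated with a minimal sketch. It assumes the commonly used GRPO form, in which each reward is centred and scaled by the mean and standard deviation of the completions sampled for the same scaffold; the function names and reward values are hypothetical and are not taken from the GRXForm implementation.

```python
import numpy as np

def global_baseline_advantages(rewards):
    """REINFORCE-style baseline: subtract one mean computed over the whole batch."""
    rewards = np.asarray(rewards, dtype=float)
    return rewards - rewards.mean()

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages (assumed form): normalise each row on its own.

    Each row holds the rewards of the completions sampled for one scaffold.
    """
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Hypothetical rewards: rows are scaffolds, columns are sampled completions.
rewards = np.array([
    [0.80, 0.85, 0.90, 0.95],  # "easy" scaffold: every completion scores high
    [0.05, 0.10, 0.15, 0.20],  # "hard" scaffold: every completion scores low
])

global_adv = global_baseline_advantages(rewards)
group_adv = group_relative_advantages(rewards)

print("global baseline:\n", global_adv)
print("group relative:\n", group_adv)
```

With the global baseline, every completion of the easy scaffold receives a positive advantage and every completion of the hard scaffold a negative one, so the signal mostly reflects scaffold difficulty. With per-scaffold normalisation, both rows yield the same advantages, and the best completion for the hard scaffold is rewarded as strongly as the best completion for the easy one.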
Analysis of the advantage signal during training revealed that GRPO’s normalisation yielded a stable learning signal, in contrast to the volatile signal observed with the global baseline.
Designing molecules with specific properties has long resembled sculpting in the dark: researchers have treated molecular optimisation as a fresh start for each new structure, a computationally expensive process akin to trial and error. GRXForm offers a potential shortcut, using a pre-trained model to modify existing molecular frameworks intelligently. What makes this work notable is the demonstration that a single AI model can learn to adapt and improve a diverse range of molecular starting points, fulfilling the promise of ‘amortised’ efficiency in molecular design. Previous attempts have stumbled because of the inherent variability in the difficulty of optimising different molecules, a problem GRXForm tackles by normalising performance relative to the initial structure, effectively calibrating the AI’s expectations and reducing wasted effort. However, the reliance on automated attachment point selection, a workaround for limitations in the underlying GenMol implementation, introduces a degree of randomness that could influence results. While the method demonstrably competes with established ‘instance optimisers’, the extent to which these gains translate to genuinely novel and valuable molecules remains to be seen. Future work will likely focus on refining this process, perhaps by incorporating more sophisticated methods for guiding molecular exploration and by validating the resulting designs with experimental data. The true measure of success will be whether GRXForm can accelerate the discovery of new materials and medicines, bridging the gap between computational prediction and real-world impact.
👉 More information
🗞 Amortized Molecular Optimization via Group Relative Policy Optimization
🧠 ArXiv: https://arxiv.org/abs/2602.12162
