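Bridging three-dimensional molecular structures and artificial intelligence with a conformation description language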

AI News June 11, 2026 06:00 PM

Large language models have revolutionized many fields by learning from sequential data, yet their application to three-dimensional molecular modelling has been hindered by the lack of effective token-based representations of molecular conformations. Here we present ConfSeq, a conformation description language that addresses this gap by encoding three-dimensional molecular structures into discrete token sequences. ConfSeq integrates molecular SMILES with internal coordinates, including dihedral angles, bond angles and a pseudo-chirality descriptor, thereby ensuring SE(3) invariance and preserving the conciseness and human readability inherent to SMILES. By reformulating core three-dimensional molecular modelling tasks (including conformation prediction, de novo generation and representation learning) as sequence modelling problems, ConfSeq enables standard transformer architectures to achieve state-of-the-art performance across diverse benchmarks. Furthermore, ConfSeq enabled the discovery of multiple novel stimulator of interferon genes (STING) inhibitors and ALDH1B1 inhibitors with half-maximal inhibitory concentrations of 0.338–3.51 μM. Collectively, these findings establish ConfSeq as a robust framework for extending large language models in three-dimensional molecular modelling.
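To make the SE(3)-invariance argument concrete, the following is a minimal sketch, not the authors' implementation, of why internal coordinates suit tokenization: bond angles and dihedral angles computed from Cartesian positions are unchanged by any rotation plus translation, so they can be binned into discrete tokens without first aligning the molecule. The token format (`<ang:k>`, `<dih:k>`), the 10-degree bin width and all function names are assumptions made for illustration; the actual ConfSeq grammar is defined in the paper and its repository.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def bond_angle(p0, p1, p2):
    """Angle at p1 between bonds p1-p0 and p1-p2, in degrees."""
    u, v = sub(p0, p1), sub(p2, p1)
    c = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle about the p1-p2 bond, in degrees."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)
    x = dot(n1, n2)
    y = dot(b1, cross(n1, n2)) / norm(b1)
    return math.degrees(math.atan2(y, x))

def angle_token(kind, degrees, step=10):
    """Hypothetical token: index of the nearest bin centre, wrapped to 360."""
    index = round(degrees / step) % (360 // step)
    return f"<{kind}:{index}>"

def chain_tokens(points, step=10):
    """Tokens for a four-atom chain: two bond angles, then one dihedral."""
    p0, p1, p2, p3 = points
    return [
        angle_token("ang", bond_angle(p0, p1, p2), step),
        angle_token("ang", bond_angle(p1, p2, p3), step),
        angle_token("dih", dihedral(p0, p1, p2, p3), step),
    ]

def rigid_transform(p, yaw, pitch, shift):
    """Rotate about z by yaw, then about x by pitch (radians), then translate."""
    x, y, z = p
    x, y = (x * math.cos(yaw) - y * math.sin(yaw),
            x * math.sin(yaw) + y * math.cos(yaw))
    y, z = (y * math.cos(pitch) - z * math.sin(pitch),
            y * math.sin(pitch) + z * math.cos(pitch))
    return (x + shift[0], y + shift[1], z + shift[2])

# A toy four-atom chain with right-angle geometry (coordinates are made up).
chain = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
moved = [rigid_transform(p, 0.7, -1.2, (3.0, -2.0, 5.0)) for p in chain]

print(chain_tokens(chain))
print(chain_tokens(moved))  # same tokens: the encoding ignores the rigid motion
```

Binning to the nearest bin centre rather than flooring keeps the sketch robust to floating-point noise for angles that sit exactly on a multiple of the step. Note that a mirror reflection is not a rigid motion: it leaves bond angles unchanged but flips the sign of every dihedral, which is one reason a representation of this kind also needs explicit handling of chirality.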

GEOM-Drugs, MOSES, QMugs, DUD-E and PCBA are publicly available datasets. The GEOM-Drugs subsets used for the conformer prediction and unconditional molecular generation tasks were obtained from the following GitHub repositories: SDEGen (https://github.com/OdinZhang/SDEGen) and e3_diffusion_for_molecules (https://github.com/ehoogeboom/e3_diffusion_for_molecules). The QMugs dataset is available at https://doi.org/10.3929/ethz-b-000482129. For the shape-conditioned 3D molecular generation task, the 3D conformations of molecules in MOSES were obtained from the DiffSMol repository available via GitHub (https://github.com/ninglab/DiffSMol). The DUD-E and PCBA benchmarks were obtained from https://dude.docking.org/ and https://drugdesign.unistra.fr/LIT-PCBA/, respectively. All processed datasets used in this work are available via GitHub at https://github.com/jiachengxiong/ConfSeq and via Zenodo at https://doi.org/10.5281/zenodo.19706011 (ref. 81). Source data are provided with this paper.

All code for ConfSeq, as well as the code for model training and inference, is available via GitHub at https://github.com/jiachengxiong/ConfSeq and via Zenodo at https://doi.org/10.5281/zenodo.19706011 (ref. 81).

Xu, Y. et al. Artificial intelligence: a powerful paradigm for scientific research. Innovation 2, 100179 (2021).

Wang, H., Li, J., Wu, H., Hovy, E. & Sun, Y. Pre-trained language models and their applications. Engineering 25, 51–65 (2023).

Chang, Y. et al. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 15, 39 (2024).

Burton, J. W. et al. How large language models can reshape collective intelligence. Nat. Hum. Behav. 8, 1643–1655 (2024).

Demszky, D. et al. Using large language models in psychology. Nat. Rev. Psychol. 2, 688–701 (2023).

OpenAI. GPT-4 technical report (2023); https://cdn.openai.com/papers/gpt-4.pdf

Google Team et al. Gemini: a family of highly capable multimodal models. Preprint at https://arxiv.org/abs/2312.11805 (2023).

Hayes, T. et al. Simulating 500 million years of evolution with a language model. Science 387, 850–858 (2025).

Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).

Dalla-Torre, H. et al. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat. Methods 22, 287–297 (2025).

Chen, B. et al. xTrimoPGLM: unified 100-billion-parameter pretrained transformer for deciphering the language of proteins. Nat. Methods 22, 1028–1039 (2025).

Nguyen, E. et al. Sequence modeling and design from molecular to genome scale with Evo. Science 386, eado9336 (2024).

Lu, J. & Zhang, Y. Unified deep learning model for multitask reaction predictions with explanation. J. Chem. Inf. Model. 62, 1376–1387 (2022).

Born, J. & Manica, M. Regression Transformer enables concurrent sequence regression and generation for molecular language modelling. Nat. Mach. Intell. 5, 432–444 (2023).

Chen, S. & Zhong, F. GPCRSPACE: a new GPCR real expanded library based on large language models architecture and positive sample machine learning strategies. J. Med. Chem. 67, 16912–16922 (2024).

Zhong, Z. et al. Root-aligned SMILES: a tight representation for chemical reaction prediction. Chem. Sci. 13, 9023–9034 (2022).

Xiong, J. et al. αExtractor: a system for automatic extraction of chemical information from biomedical literature. Sci. China Life Sci. 67, 618–621 (2024).

Bagal, V., Aggarwal, R., Vinod, P. K. & Priyakumar, U. D. MolGPT: molecular generation using a transformer-decoder model. J. Chem. Inf. Model. 62, 2064–2076 (2022).

Ross, J. et al. Large-scale chemical language representations capture molecular structure and properties. Nat. Mach. Intell. 4, 1256–1264 (2022).

Xiong, J. et al. Bridging chemistry and artificial intelligence by a reaction description language. Nat. Mach. Intell. 7, 782–793 (2025).

Xu, F. et al. Toward a unified benchmark and framework for deep learning-based prediction of nuclear magnetic resonance chemical shifts. Nat. Comput. Sci. 5, 292–300 (2025).

Nippa, D. F. et al. Enabling late-stage drug diversification by high-throughput experimentation with geometric deep learning. Nat. Chem. 16, 239–248 (2024).

Luo, Y., Liu, Y. & Peng, J. Calibrated geometric deep learning improves kinase–drug binding predictions. Nat. Mach. Intell. 5, 1390–1401 (2023).

Zhou, G. et al. Uni-Mol: a universal 3D molecular representation learning framework. In Proc. Eleventh International Conference on Learning Representations (ICLR, 2023).

Li, S. et al. Towards 3D molecule-text interpretation in language models. In Proc. Twelfth International Conference on Learning Representations (ICLR, 2024).

Jing, B., Corso, G., Chang, J., Barzilay, R. & Jaakkola, T. S. Torsional diffusion for molecular conformer generation. Adv. Neural Inf. Process. Syst. 35, 24240–24253 (2022).

Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. S. DiffDock: diffusion steps, twists, and turns for molecular docking. In Proc. Eleventh International Conference on Learning Representations (ICLR, 2023).

Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).

Alakhdar, A., Poczos, B. & Washburn, N. Diffusion models in de novo drug design. J. Chem. Inf. Model. 64, 7238–7256 (2024).

Feng, W. et al. Generation of 3D molecules in pockets via a language model. Nat. Mach. Intell. 6, 62–73 (2024).

Wang, J. et al. 3DSMILES-GPT: 3D molecular pocket-based generation with token-only large language model. Chem. Sci. 16, 637–648 (2025).

Zholus, A. et al. BindGPT: a scalable framework for 3D molecular design via language modeling and reinforcement learning. In Proc. AAAI Conference on Artificial Intelligence Vol 39, 26083–26091 (AAAI Press, 2025).

Qian, J., Wang, H., Li, Z., Li, S. & Yan, X. Limitations of language models in arithmetic and symbolic induction. In Proc. 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (eds Rogers, A. et al.) 9285–9298 (Association for Computational Linguistics, 2023).

Zhang, W. et al. Interpreting and improving large language models in arithmetic calculation. In Proc. 41st International Conference on Machine Learning (eds Salakhutdinov, R. et al.) Vol 235, 59932–59950 (PMLR, 2024).

Ganea, O. et al. GeoMol: torsional geometric generation of molecular 3D conformer ensembles. Adv. Neural Inf. Process. Syst. 34, 13757–13769 (2021).

Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. In Proc. Tenth International Conference on Learning Representations (ICLR, 2022).

Zhang, Z. et al. Tora3D: an autoregressive torsion angle prediction model for molecular 3D conformation generation. J. Cheminform. 15, 57 (2023).

Axelrod, S. & Gómez-Bombarelli, R. GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Sci. Data 9, 185 (2022).

Isert, C., Atz, K., Jiménez-Luna, J. & Schneider, G. QMugs, quantum mechanical properties of drug-like molecules. Sci. Data 9, 273 (2022).

Zhu, J. et al. Direct molecular conformation generation. Trans. Mach. Learn. Res. https://openreview.net/forum?id=lCPOHiztuw (2022).

Wang, D., Dong, X., Zhang, X. & Hu, L. GADIFF: a transferable graph attention diffusion model for generating molecular conformations. Brief. Bioinform. 26, bbae676 (2024).

Preuer, K., Renz, P., Unterthiner, T., Hochreiter, S. & Klambauer, G. Fréchet ChemNet distance: a metric for generative models for molecules in drug discovery. J. Chem. Inf. Model. 58, 1736–1741 (2018).

Buttenschoen, M., Morris, G. M. & Deane, C. M. PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Chem. Sci. 15, 3130–3139 (2024).

Axen, S. D. et al. A simple representation of three-dimensional molecular structure. J. Med. Chem. 60, 7393–7409 (2017).

Morehead, A. & Cheng, J. Geometry-complete diffusion for 3D molecule generation and optimization. Commun. Chem. 7, 150 (2024).

Zdrazil, B. et al. The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 52, D1180–D1192 (2024).

Zhang, Z., Yang, L. & Xiang, Z. RISurConv: rotation invariant surface attention-augmented convolutions for 3D point cloud classification and segmentation. In Computer Vision—ECCV 2024: 18th European Conference Part XXVIII (eds Leonardis, A. et al.) Vol 15086, 93–109 (Springer, 2024).

Chen, Z., Peng, B., Zhai, T., Adu-Ampratwum, D. & Ning, X. Generating 3D small binding molecules using shape-conditioned diffusion models with guidance. Nat. Mach. Intell. 7, 758–770 (2025).

Adams, K. & Coley, C. W. Equivariant shape-conditioned generation of 3D molecules for ligand-based drug design. In Proc. Eleventh International Conference on Learning Representations (ICLR, 2023).

Liu, T. et al. BindingDB in 2024: a FAIR knowledgebase of protein-small molecule binding data. Nucleic Acids Res. 53, D1633–D1644 (2025).

Goodsell, D. S. et al. RCSB Protein Data Bank: enabling biomedical research and drug discovery. Protein Sci. 29, 52–65 (2020).

Székely, G. J., Rizzo, M. L. & Bakirov, N. K. Measuring and testing dependence by correlation of distances. Ann. Stat. 35, 2769–2794 (2007).

Mysinger, M. M., Carchia, M., Irwin, J. J. & Shoichet, B. K. Directory of Useful Decoys, Enhanced (DUD-E): better ligands and decoys for better benchmarking. J. Med. Chem. 55, 6582–6594 (2012).

Tran-Nguyen, V.-K., Jacquemard, C. & Rognan, D. LIT-PCBA: an unbiased data set for machine learning and virtual screening. J. Chem. Inf. Model. 60, 4263–4273 (2020).

Devinyak, O., Havrylyuk, D. & Lesyk, R. 3D-MoRSE descriptors explained. J. Mol. Graph. Model. 54, 194–203 (2014).

Hu, J., Liu, Z., Yu, D.-J. & Zhang, Y. LS-align: an atom-level, flexible ligand structural alignment algorithm for high-throughput virtual screening. Bioinformatics 34, 2209–2218 (2018).

Liu, X., Jiang, H. & Li, H. SHAFTS: a hybrid approach for 3D molecular similarity calculation. 1. Method and assessment of virtual screening. J. Chem. Inf. Model. 51, 2372–2385 (2011).

Irwin, J. J. et al. ZINC20—a free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 60, 6065–6073 (2020).

Kim, S. et al. PubChem 2025 update. Nucleic Acids Res. 53, D1516–D1525 (2025).

Decout, A., Katz, J. D., Venkatraman, S. & Ablasser, A. The cGAS–STING pathway as a therapeutic target in inflammatory diseases. Nat. Rev. Immunol. 21, 548–569 (2021).

Feng, Z. et al. Targeting colorectal cancer with small-molecule inhibitors of ALDH1B1. Nat. Chem. Biol. 18, 1065–1075 (2022).

Zhang, H. et al. SDEGen: learning to evolve molecular conformations from thermodynamic noise for conformation generation. Chem. Sci. 14, 1557–1568 (2023).

Hoogeboom, E., Satorras, V. G., Vignac, C. & Welling, M. Equivariant diffusion for molecule generation in 3D. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) Vol 162, 8867–8887 (PMLR, 2022).

Wang, Y. et al. A workflow to create a high-quality protein–ligand binding dataset for training, validation, and prediction tasks. Digit. Discov. 4, 1209–1220 (2025).

The UniProt Consortium et al. UniProt: the Universal Protein Knowledgebase in 2025. Nucleic Acids Res. 53, D609–D617 (2025).

Jiang, Z., Xu, J., Yan, A. & Wang, L. A comprehensive comparative assessment of 3D molecular similarity tools in ligand-based virtual screening. Brief. Bioinform. 22, bbab231 (2021).

Wójcikowski, M., Zielenkiewicz, P. & Siedlecki, P. Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field. J. Cheminform. 7, 26 (2015).

Fan, Z., Yang, Y., Xu, M. & Chen, H. EC-Conf: a ultra-fast diffusion model for molecular conformation generation with equivariant consistency. J. Cheminform. 16, 107 (2024).

Xu, M., Luo, S., Bengio, Y., Peng, J. & Tang, J. Learning neural generative dynamics for molecular conformation generation. In Proc. Ninth International Conference on Learning Representations (ICLR, 2021).

Xu, M. et al. An end-to-end framework for molecular conformation generation via bilevel programming. In Proc. 38th International Conference on Machine Learning (eds Meila, M. et al.) Vol 139, 11537–11547 (PMLR, 2021).

Shi, C., Luo, S., Xu, M. & Tang, J. Learning gradient fields for molecular conformation generation. In Proc. 38th International Conference on Machine Learning (eds Meila, M. et al.) Vol 139, 9558–9568 (PMLR, 2021).

Wang, L. et al. Regularized molecular conformation fields. Adv. Neural Inf. Process. Syst. 35, 18929–18941 (2022).

Luo, S., Shi, C., Xu, M. & Tang, J. Predicting molecular conformation via dynamic graph score matching. Adv. Neural Inf. Process. Syst. 34, 19784–19795 (2021).

Vieira Wyzykowski, A. B., Niazi, F. F. & Dickson, A. AGDIFF: attention-enhanced diffusion for molecular geometry prediction. J. Chem. Inf. Model. 65, 1798–1811 (2025).

Lee, D., Lee, D., Bang, D. & Kim, S. DiSCO: diffusion Schrödinger bridge for molecular conformer optimization. In Proc. AAAI Conference on Artificial Intelligence Vol 38, 13365–13373 (AAAI Press, 2024).

Xu, M., Powers, A. S., Dror, R. O., Ermon, S. & Leskovec, J. Geometric latent diffusion models for 3D molecule generation. In Proc. 40th International Conference on Machine Learning (eds Krause, A. et al.) Vol 202, 38592–38610 (PMLR, 2023).

Feng, S. et al. UniGEM: a unified approach to generation and property prediction for molecules. In Proc. Thirteenth International Conference on Learning Representations (ICLR, 2025).

Vainio, M. J., Puranen, J. S. & Johnson, M. S. ShaEP: molecular overlay based on shape and electrostatic potential. J. Chem. Inf. Model. 49, 492–502 (2009).

Sastry, G. M., Dixon, S. L. & Sherman, W. Rapid shape-based ligand alignment and virtual screening method based on atom/feature-pair similarities and volume overlap scoring. J. Chem. Inf. Model. 51, 2455–2466 (2011).

Xiong, J. & Oopstom. jiachengxiong/ConfSeq: 1.1. Zenodo https://doi.org/10.5281/zenodo.19706011 (2026).

We gratefully acknowledge financial support from the National Natural Science Foundation of China (grant numbers T2225002 and 82273855 to M.Z. and 82474143 to S.Z.), National Key Research and Development Program of China (grant numbers 2022YFC3400504 and 2023YFC2305904 to M.Z.), Strategic Priority Research Program of the Chinese Academy of Sciences (grant numbers XDB0830200 to M.Z. and XDB1260301 to S.Z.), open fund of State Key Laboratory of Pharmaceutical Biotechnology, Nanjing University, China (grant number KF-202301 to M.Z.), Youth Innovation Promotion Association CAS (grant number 2023296 to S.Z.), Shanghai Post-doctoral Excellence Program (grant number 2024707 to J.X.) and Postdoctoral Fellowship Program of CPSF (grant number GZB20250838 to J.X.).

These authors contributed equally: Jiacheng Xiong, Yuqi Shi, Min Wu.

Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China

Jiacheng Xiong, Yuqi Shi, Min Wu, Panpan Shao, Zhaokun Wang, Wei Zhang, Zhiyi Chen, Chuanlong Zeng, Xun Jiang, Duanhua Cao, Sulin Zhang & Mingyue Zheng

University of Chinese Academy of Sciences, Beijing, China

Yuqi Shi, Zhaokun Wang, Wei Zhang, Zhiyi Chen, Chuanlong Zeng, Xun Jiang, Sulin Zhang & Mingyue Zheng

School of Pharmacy, Nanchang University, Nanchang, China

Nanjing University of Chinese Medicine, Nanjing, China

Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden

ProtonUnfold Technology Co. Ltd, Suzhou, China

iHuman Institute, ShanghaiTech University, Shanghai, China

J.X. proposed the idea and, together with Y.S., conducted the computational experiments and drafted the initial paper. S.Z., M.W., P.S. and Z.W. conducted the biological experiments. W.Z. and R.Z. participated in the analysis of results. Z.C., C.Z. and X.J. contributed to the case analysis. W.Z., D.C., Z.X., Z.F. and M.Z. helped check and improve the paper. M.Z. led the project and designed the study. All authors read and approved the final paper.

Correspondence to Sulin Zhang or Mingyue Zheng.

The authors declare no competing interests.

Nature Machine Intelligence thanks Kenneth Atz and Jannis Born for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Sections A–E, Figs. 1–44, Tables 1–8, Discussion and Methods.

Source data for Supplementary Fig. 27.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Xiong, J., Shi, Y., Wu, M. et al. Bridging three-dimensional molecular structures and artificial intelligence with a conformation description language. Nat Mach Intell (2026). https://doi.org/10.1038/s42256-026-01250-8

Version of record: 11 June 2026

DOI: https://doi.org/10.1038/s42256-026-01250-8