-
Ligand-Controlled Phonon Dynamics in CsPbBr3 Nanocrystals Revealed by Machine-Learned Interatomic Potentials
Authors:
Seungjun Cha,
Chen Wang,
Victor Fung,
Guoxiang Hu
Abstract:
Halide perovskite nanocrystals are leading candidates for next-generation optoelectronics, yet the role of surface ligands in controlling their phonon dynamics remains poorly understood. These dynamics critically govern nonradiative relaxation, energy up-conversion, and phonon-assisted anti-Stokes emission. Conventional ab initio methods, while accurate, are computationally infeasible for experimentally relevant nanocrystal sizes that require thousands of atoms to capture realistic ligand shells and dynamic disorder at finite temperatures. Here, we introduce a machine-learned interatomic potential fine-tuned on small CsPbBr3 nanocrystals with diverse ligands, enabling accurate prediction of ligand-induced phonon properties far beyond the spatial and temporal scales of ab initio methods. We find that both cationic and anionic ligands systematically redshift Pb-Br-Pb stretching modes while blueshifting the [PbBr6]4- octahedral rotation mode, with stronger overall effects for anionic passivation. Notably, anionic ligands stiffen the rotation mode non-monotonically with respect to the ligand binding energy. Our findings reveal important roles of cationic and anionic ligands in modulating key dynamic modes of halide perovskite nanocrystals associated with detrimental nonradiative losses, offering mechanistic insights and design principles for high-performance perovskite nanocrystal optoelectronics.
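Ligand-induced phonon shifts of this kind are commonly read off from MD trajectories via the Fourier transform of the velocity autocorrelation function (the power spectrum of the velocities, by the Wiener-Khinchin theorem). The following numpy sketch illustrates the idea on a synthetic two-mode trajectory; the frequencies and setup are illustrative, not taken from the paper.

```python
import numpy as np

def phonon_dos(velocities, dt):
    """Phonon density of states from an MD trajectory: the power
    spectrum of the velocities equals the Fourier transform of the
    velocity autocorrelation function (Wiener-Khinchin).

    velocities: (n_steps, n_atoms) array; dt: MD timestep.
    Returns (frequencies, normalized spectral density)."""
    n_steps = velocities.shape[0]
    v = velocities - velocities.mean(axis=0)      # remove drift
    spec = np.abs(np.fft.rfft(v, axis=0)) ** 2    # per-atom power spectrum
    dos = spec.mean(axis=1)                       # average over atoms
    freqs = np.fft.rfftfreq(n_steps, d=dt)
    return freqs, dos / dos.sum()

# Synthetic trajectory: two harmonic modes at 2 Hz and 5 Hz
dt, n = 0.01, 4096
t = np.arange(n) * dt
v = np.stack([np.sin(2*np.pi*2*t), np.sin(2*np.pi*5*t)], axis=1)
freqs, dos = phonon_dos(v, dt)
peaks = freqs[np.argsort(dos)[-2:]]
top = sorted(float(f) for f in peaks)
print([round(p, 1) for p in top])  # [2.0, 5.0]
```

A redshift or blueshift of a mode then shows up directly as a moved peak in this spectrum.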
Submitted 17 March, 2026;
originally announced March 2026.
-
Reasoning-Driven Design of Single Atom Catalysts via a Multi-Agent Large Language Model Framework
Authors:
Dong Hyeon Mok,
Seoin Back,
Victor Fung,
Guoxiang Hu
Abstract:
Large language models (LLMs) are increasingly being applied beyond natural language processing, demonstrating strong capabilities in complex scientific tasks that traditionally require human expertise. This progress has extended into materials discovery, where LLMs introduce a new paradigm by leveraging reasoning and in-context learning, capabilities absent from conventional machine learning approaches. Here, we present a Multi-Agent-based Electrocatalyst Search Through Reasoning and Optimization (MAESTRO) framework in which multiple LLMs with specialized roles collaboratively discover high-performance single atom catalysts for the oxygen reduction reaction. Within an autonomous design loop, agents iteratively reason, propose modifications, reflect on results, and accumulate design history. Through in-context learning enabled by this iterative process, MAESTRO identified design principles not explicitly encoded in the LLMs' background knowledge and successfully discovered catalysts that break conventional scaling relations between reaction intermediates. These results highlight the potential of multi-agent LLM frameworks as a powerful strategy to generate chemical insight and discover promising catalysts.
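The MAESTRO code itself is not shown here, but the propose-evaluate-accumulate loop it describes can be sketched generically. In this toy version a random mutation of the best candidate stands in for the LLM proposer, and a simple analytic score stands in for the catalyst evaluator; all function names are hypothetical.

```python
import random

def propose(history, rng):
    """Proposer agent (stand-in for an LLM): mutate the best
    candidate found so far, or sample randomly at the start."""
    if not history:
        return rng.uniform(-5, 5)
    best, _ = max(history, key=lambda h: h[1])
    return best + rng.gauss(0, 1.0)

def evaluate(x):
    """Evaluator (stand-in for a DFT/descriptor calculation);
    the optimum of this toy score sits at x = 2."""
    return -(x - 2.0) ** 2

def design_loop(n_iter=200, seed=0):
    rng = random.Random(seed)
    history = []  # accumulated design history shared by the agents
    for _ in range(n_iter):
        cand = propose(history, rng)
        history.append((cand, evaluate(cand)))
    return max(history, key=lambda h: h[1])

best_x, best_score = design_loop()
print(best_x, best_score)
```

The actual framework replaces both stand-ins with LLM agents that reason over the accumulated history in context, rather than blindly mutating.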
Submitted 24 February, 2026;
originally announced February 2026.
-
Determining Atomic Structure from Spectroscopy via an Active Learning Framework
Authors:
Ian Slagle,
Faisal Alamgir,
Victor Fung
Abstract:
Determining atomic structure from spectroscopic data is central to materials science but remains restricted to a limited set of techniques and material classes, largely due to the computational cost and complexity of structural refinement. Here we introduce ActiveStructOpt, a general framework that integrates graph neural network surrogate models with active learning to efficiently determine candidate structures that reproduce target spectra with minimal computational expenditure. Benchmarking with X-ray pair distribution function data, and with the more computationally demanding simulations of X-ray absorption near-edge spectra (XANES) and extended X-ray absorption fine structure (EXAFS), demonstrates that ActiveStructOpt reliably determines structures whose spectra closely match the targets across diverse materials classes. Under equivalent computational budgets, ActiveStructOpt outperforms existing structure determination methods. By enabling data-efficient, multi-objective structural refinement across a broad range of computable spectroscopic techniques, ActiveStructOpt provides a flexible and extensible approach to atomic structure determination in complex materials.
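The core active-learning pattern — fit a cheap surrogate, query the expensive oracle only where the surrogate looks best, refit, repeat — can be sketched in one dimension. Here a quadratic misfit stands in for the spectrum-matching objective and a polynomial fit stands in for the graph neural network surrogate; this is a generic sketch, not the ActiveStructOpt code.

```python
import numpy as np

def oracle(x):
    """Expensive ground-truth evaluation (stand-in for a spectrum
    simulation): misfit between a candidate and the target structure."""
    return (x - 1.7) ** 2

def active_search(n_init=5, n_rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    X = list(rng.uniform(-3.0, 3.0, n_init))   # small initial design
    y = [oracle(x) for x in X]
    grid = np.linspace(-3.0, 3.0, 601)
    for _ in range(n_rounds):
        coeffs = np.polyfit(X, y, deg=2)       # refit cheap surrogate
        x_next = grid[np.argmin(np.polyval(coeffs, grid))]
        X.append(float(x_next))                # query the oracle only
        y.append(oracle(x_next))               # where it looks best
    best = int(np.argmin(y))
    return X[best], y[best]

x_best, f_best = active_search()
print(round(float(x_best), 2), round(float(f_best), 4))  # 1.7 0.0
```

The real framework does the same in a high-dimensional structure space, with the surrogate uncertainty guiding exploration as well as exploitation.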
Submitted 24 February, 2026;
originally announced February 2026.
-
Improving Reliability of Machine Learned Interatomic Potentials With Physics-Informed Pretraining
Authors:
Qianyu Zheng,
Victor Fung
Abstract:
Machine learned interatomic potentials (MLIPs) have emerged as powerful tools for molecular dynamics (MD) simulations owing to their competitive accuracy and computational efficiency. However, MLIPs are often observed to exhibit unphysical behavior when encountering configurations which deviate significantly from their training data distribution, leading to simulation instabilities and unreliable dynamics, thus limiting the reliability of MLIPs for materials simulations. We present a physics-informed pretraining strategy that leverages simple physical potentials which can improve the robustness and stability of graph-based MLIPs for MD simulations. We demonstrate this approach by deploying a pretraining-finetuning pipeline where MLIPs are initially pretrained on data labelled with embedded atom model potentials and subsequently finetuned on the quantum mechanical ground truth data. By evaluating across three diverse material systems (phosphorus, silica, and a subset of Materials Project) and three representative MLIP architectures (CGCNN, M3GNet, and TorchMD-NET), we find that this physics-informed pretraining consistently improves both prediction accuracy as well as stability in MD compared to the baselines.
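The pretrain-then-finetune pipeline can be illustrated with a deliberately tiny model: a linear potential in a fixed radial basis, pretrained on abundant cheap labels from a Lennard-Jones potential (standing in for the embedded atom model) and then finetuned on scarce "ground truth" labels. Everything here — the basis, the toy potentials, the hyperparameters — is an assumption for the sketch, not the paper's setup.

```python
import numpy as np

def features(r):
    """Simple radial basis shared by both 'potentials'."""
    return np.stack([r**-12, r**-6, 1.0/r, np.ones_like(r)], axis=-1)

lj = lambda r: 4.0 * (r**-12 - r**-6)   # cheap physical potential
dft = lambda r: lj(r) + 0.1 / r         # hypothetical ground truth

rng = np.random.default_rng(0)
r_pre = rng.uniform(0.9, 2.5, 256)      # abundant cheap labels
w_pre, *_ = np.linalg.lstsq(features(r_pre), lj(r_pre), rcond=None)

def finetune(w0, r, y, lr=0.05, steps=100):
    """Gradient-descent finetuning of the linear weights on scarce data."""
    w = w0.copy()
    Phi = features(r)
    for _ in range(steps):
        w -= lr * 2.0 * Phi.T @ (Phi @ w - y) / len(r)
    return w

r_ft = rng.uniform(0.9, 2.5, 8)         # scarce 'DFT' labels
w_ft = finetune(w_pre, r_ft, dft(r_ft))
w_scratch = finetune(np.zeros(4), r_ft, dft(r_ft))

r_test = np.linspace(0.95, 2.4, 200)
err = lambda w: float(np.mean((features(r_test) @ w - dft(r_test))**2))
print(err(w_ft), err(w_scratch))
```

Starting from weights that already encode the cheap physics leaves only the small correction to learn from the scarce data, which is the intuition behind the paper's pipeline.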
Submitted 23 February, 2026;
originally announced February 2026.
-
Scalable Foundation Interatomic Potentials via Message-Passing Pruning and Graph Partitioning
Authors:
Lingyu Kong,
Jaeheon Shim,
Guoxiang Hu,
Victor Fung
Abstract:
Atomistic foundation models (AFMs) hold great promise as accurate interatomic potentials, and have enabled data-efficient molecular dynamics simulations with near quantum mechanical accuracy. However, AFMs remain markedly slower at inference and are far more memory-intensive than conventional interatomic potentials, due to the need to capture a wide range of chemical and structural motifs in pre-training datasets, which requires deep, parameter-rich model architectures. These deficiencies currently limit the practical use of AFMs in molecular dynamics (MD) simulations at extended temporal and spatial scales. To address this problem, we propose a general workflow for accelerating and scaling AFMs containing message-passing architectures. We find that removing low-contribution message-passing layers from AFM backbones serves as an effective pruning method, significantly reducing the parameter count while preserving the accuracy and data-efficiency of AFMs. Once pruned, these models become more accessible for large scale simulations via a graph-partitioned, GPU-distributed strategy, which we implement and demonstrate within the AFM fine-tuning platform MatterTune. We show that this approach supports million-atom simulations on both single and multiple GPUs, and enables task-specific large-scale simulations at nanosecond timescales with AFM-level accuracy.
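The graph-partitioning side of such a strategy rests on a standard trick: split atoms into spatial domains, but let each domain also keep "halo" (ghost) atoms within the interaction cutoff of its boundaries, so message passing near the edges sees the same neighborhoods as a full-graph run. A minimal numpy sketch, slabbing along one axis and ignoring periodic boundaries (which a real implementation must handle):

```python
import numpy as np

def partition_with_halo(positions, box, n_parts, cutoff):
    """Split atoms along x into n_parts slabs; each slab also keeps
    halo atoms within `cutoff` of its boundaries. Periodic boundaries
    are ignored in this sketch."""
    edges = np.linspace(0.0, box, n_parts + 1)
    x = positions[:, 0]
    parts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        owned = (x >= lo) & (x < hi)
        halo = (~owned) & (((x >= lo - cutoff) & (x < lo)) |
                           ((x >= hi) & (x < hi + cutoff)))
        parts.append((np.where(owned)[0], np.where(halo)[0]))
    return parts

rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 10.0, size=(1000, 3))
parts = partition_with_halo(pos, box=10.0, n_parts=4, cutoff=0.5)
owned_total = sum(len(o) for o, _ in parts)
print(owned_total)  # 1000: every atom owned by exactly one partition
```

Each partition (owned plus halo atoms) can then be evaluated on its own GPU, with only the halo regions needing communication between steps.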
Submitted 25 September, 2025;
originally announced September 2025.
-
Facet: highly efficient E(3)-equivariant networks for interatomic potentials
Authors:
Nicholas Miklaucic,
Lai Wei,
Rongzhi Dong,
Nihang Fu,
Sadman Sadeed Omee,
Qingyang Li,
Sourin Dey,
Victor Fung,
Jianjun Hu
Abstract:
Computational materials discovery is limited by the high cost of first-principles calculations. Machine learning (ML) potentials that predict energies from crystal structures are promising, but existing methods face computational bottlenecks. Steerable graph neural networks (GNNs) encode geometry with spherical harmonics, respecting atomic symmetries -- permutation, rotation, and translation -- for physically realistic predictions. Yet maintaining equivariance is difficult: activation functions must be modified, and each layer must handle multiple data types for different harmonic orders. We present Facet, a GNN architecture for efficient ML potentials, developed through systematic analysis of steerable GNNs. Our innovations include replacing expensive multi-layer perceptrons (MLPs) for interatomic distances with splines, which match performance while cutting computational and memory demands. We also introduce a general-purpose equivariant layer that mixes node information via spherical grid projection followed by standard MLPs -- faster than tensor products and more expressive than linear or gate layers. On the MPTrj dataset, Facet matches leading models with far fewer parameters and under 10% of their training compute. On a crystal relaxation task, it runs twice as fast as MACE models. We further show SevenNet-0's parameters can be reduced by over 25% with no accuracy loss. These techniques enable more than 10x faster training of large-scale foundation models for ML potentials, potentially reshaping computational materials discovery.
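Replacing a per-edge distance MLP with a spline is essentially a tabulate-and-interpolate move: the radial filter is smooth and one-dimensional, so a precomputed table evaluated by interpolation matches it cheaply. The sketch below uses piecewise-linear interpolation on a toy radial filter as a stand-in for the cubic splines the paper describes.

```python
import numpy as np

def make_table(fn, r_max, n=512):
    """Tabulate fn on [0, r_max] once, ahead of time."""
    xs = np.linspace(0.0, r_max, n)
    return xs, fn(xs)

def interp_eval(xs, ys, r):
    """Piecewise-linear evaluation at query distances -- a cheap
    stand-in for the spline basis that replaces a per-edge MLP."""
    return np.interp(r, xs, ys)

envelope = lambda r: np.exp(-r) * np.cos(2.0 * r)  # toy radial filter
xs, ys = make_table(envelope, r_max=6.0)
r_query = np.array([0.5, 1.7, 3.2])
approx = interp_eval(xs, ys, r_query)
exact = envelope(r_query)
max_err = float(np.max(np.abs(approx - exact)))
print(max_err < 1e-3)  # True: 512 knots already suffice here
```

The payoff is that every edge evaluation becomes a table lookup plus an interpolation, instead of an MLP forward pass, with memory and compute savings that grow with edge count.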
Submitted 10 September, 2025;
originally announced September 2025.
-
A Comprehensive Assessment and Benchmark Study of Large Atomistic Foundation Models for Phonons
Authors:
Md Zaibul Anam,
Ogheneyoma Aghoghovbia,
Mohammed Al-Fahdi,
Lingyu Kong,
Victor Fung,
Ming Hu
Abstract:
The rapid development of universal machine learning potentials (uMLPs) has enabled efficient, accurate predictions of diverse material properties across broad chemical spaces. While their capability for modeling phonon properties is emerging, systematic benchmarking across chemically diverse systems remains limited. We evaluate six recent uMLPs (EquiformerV2, MatterSim, MACE, and CHGNet) on 2,429 crystalline materials from the Open Quantum Materials Database. Models were used to compute atomic forces in displaced supercells, derive interatomic force constants (IFCs), and predict phonon properties including lattice thermal conductivity (LTC), compared with density functional theory (DFT) and experimental data. The EquiformerV2 pretrained model trained on the OMat24 dataset exhibits strong performance in predicting atomic forces and third-order IFC, while its fine-tuned counterpart consistently outperforms other models in predicting second-order IFC, LTC, and other phonon properties. Although MACE and CHGNet demonstrated comparable force prediction accuracy to EquiformerV2, notable discrepancies in IFC fitting led to poor LTC predictions. Conversely, MatterSim, despite lower force accuracy, achieved intermediate IFC predictions, suggesting error cancellation and complex relationships between force accuracy and phonon predictions. This benchmark guides the evaluation and selection of uMLPs for high-throughput screening of materials with targeted thermal transport properties.
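The "displaced supercell" step of such a phonon workflow — obtaining second-order interatomic force constants from energies (or forces) of slightly displaced configurations — can be shown on a toy system. Here central finite differences of the energy of a three-atom harmonic chain stand in for the uMLP evaluations; the spring constant and geometry are illustrative.

```python
import numpy as np

def second_order_ifc(energy_fn, x0, h=1e-3):
    """Second-order interatomic force constants via central finite
    differences of the energy: Phi_ij = d2E / dx_i dx_j."""
    n = x0.size
    ifc = np.zeros((n, n))
    eye = np.eye(n)
    for i in range(n):
        for j in range(n):
            ifc[i, j] = (energy_fn(x0 + h*(eye[i] + eye[j]))
                         - energy_fn(x0 + h*(eye[i] - eye[j]))
                         - energy_fn(x0 - h*(eye[i] - eye[j]))
                         + energy_fn(x0 - h*(eye[i] + eye[j]))) / (4*h*h)
    return ifc

k = 2.0  # spring constant of a 3-atom harmonic chain with free ends
chain = lambda x: 0.5*k*((x[1]-x[0])**2 + (x[2]-x[1])**2)
x0 = np.array([0.0, 1.0, 2.0])
ifc = second_order_ifc(chain, x0)
print(np.round(ifc, 3))  # analytic: [[k,-k,0],[-k,2k,-k],[0,-k,k]]
```

Errors in the model's forces propagate through exactly this fitting step, which is why the abstract finds that force accuracy and IFC (and hence thermal conductivity) accuracy are related but not interchangeable.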
Submitted 3 September, 2025;
originally announced September 2025.
-
MatterTune: An Integrated, User-Friendly Platform for Fine-Tuning Atomistic Foundation Models to Accelerate Materials Simulation and Discovery
Authors:
Lingyu Kong,
Nima Shoghi,
Guoxiang Hu,
Pan Li,
Victor Fung
Abstract:
Geometric machine learning models such as graph neural networks have achieved remarkable success in recent years in chemical and materials science research for applications such as high-throughput virtual screening and atomistic simulations. The success of these models can be attributed to their ability to effectively learn latent representations of atomic structures directly from the training data. However, this also results in high data requirements for these models, hindering their application to data-sparse problems, which are common in this domain. To address this limitation, there is a growing development in the area of pre-trained machine learning models which have learned general, fundamental, geometric relationships in atomistic data, and which can then be fine-tuned to much smaller application-specific datasets. In particular, models which are pre-trained on diverse, large-scale atomistic datasets have shown impressive generalizability and flexibility to downstream applications, and are increasingly referred to as atomistic foundation models. To leverage the untapped potential of these foundation models, we introduce MatterTune, a modular and extensible framework that provides advanced fine-tuning capabilities and seamless integration of atomistic foundation models into downstream materials informatics and simulation workflows, thereby lowering the barriers to adoption and facilitating diverse applications in materials science. In its current state, MatterTune supports a number of state-of-the-art foundation models such as ORB, MatterSim, JMP, and EquiformerV2, and hosts a wide range of features including a modular and flexible design, distributed and customizable fine-tuning, broad support for downstream informatics tasks, and more.
Submitted 14 April, 2025;
originally announced April 2025.
-
Electronic Structure Guided Inverse Design Using Generative Models
Authors:
Shuyi Jia,
Panchapakesan Ganesh,
Victor Fung
Abstract:
The electronic structure of a material fundamentally determines its underlying physical, and by extension, its functional properties. Consequently, the ability to identify or generate materials with desired electronic properties would enable the design of tailored functional materials. Traditional approaches relying on human intuition or exhaustive computational screening of known materials remain inefficient and resource-prohibitive for this task. Here, we introduce DOSMatGen, the first instance of a machine learning method which generates crystal structures that match a given desired electronic density of states. DOSMatGen is an E(3)-equivariant joint diffusion framework, and utilizes classifier-free guidance to accurately condition the generated materials on the density of states. Our experiments find this approach can successfully yield materials which are both stable and match closely with the desired density of states. Furthermore, this method is highly flexible and allows for finely controlled generation which can target specific templates or even individual sites within a material. This method enables a more physics-driven approach to designing new materials for applications including catalysts, photovoltaics, and superconductors.
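Classifier-free guidance, the conditioning mechanism named above, amounts to a simple combination rule at sampling time: extrapolate from the model's unconditional noise prediction toward its conditional one with a guidance weight. A minimal numpy sketch with made-up noise vectors:

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, w):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one with guidance weight w.
    w = 0 -> unconditional, w = 1 -> conditional, w > 1 -> amplified
    conditioning."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_u = np.array([0.2, -0.1])   # unconditional model output (toy values)
eps_c = np.array([0.6,  0.3])   # output conditioned on a target DOS
print(cfg_noise(eps_u, eps_c, 0.0))   # recovers eps_u
print(cfg_noise(eps_u, eps_c, 1.0))   # recovers eps_c
print(cfg_noise(eps_u, eps_c, 2.0))   # pushes past eps_c
```

In the diffusion framework this combined prediction replaces the raw model output at every denoising step, steering generation toward structures whose density of states matches the target.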
Submitted 8 April, 2025;
originally announced April 2025.
-
Pre-training Graph Neural Networks with Structural Fingerprints for Materials Discovery
Authors:
Shuyi Jia,
Shitij Govil,
Manav Ramprasad,
Victor Fung
Abstract:
In recent years, pre-trained graph neural networks (GNNs) have been developed as general models which can be effectively fine-tuned for various potential downstream tasks in materials science, and have shown significant improvements in accuracy and data efficiency. The most widely used pre-training methods currently involve either supervised training to fit a general force field or self-supervised training by denoising perturbed equilibrium atomic structures. Both methods require datasets generated from quantum mechanical calculations, which quickly become intractable when scaling to larger datasets. Here we propose a novel pre-training objective which instead uses cheaply-computed structural fingerprints as targets while maintaining comparable performance across a range of different structural descriptors. Our experiments show this approach can act as a general strategy for pre-training GNNs with application towards large scale foundational models for atomistic data.
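The appeal of fingerprint targets is that they come essentially for free from the geometry alone. The sketch below computes one of the simplest possible structural fingerprints, a normalized histogram of pairwise distances; the paper's descriptors are more sophisticated, but any such cheap, geometry-only quantity can serve as a pretraining regression target.

```python
import numpy as np

def radial_fingerprint(positions, n_bins=16, r_max=6.0):
    """Cheap structural fingerprint: a normalized histogram of
    pairwise distances. Computable without any quantum mechanical
    calculation, so pretraining labels cost essentially nothing."""
    d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    iu = np.triu_indices(len(positions), k=1)   # each pair counted once
    hist, _ = np.histogram(d[iu], bins=n_bins, range=(0.0, r_max))
    return hist / hist.sum()

rng = np.random.default_rng(0)
cluster = rng.uniform(0.0, 3.0, size=(20, 3))  # toy 20-atom cluster
fp = radial_fingerprint(cluster)
print(fp.shape, round(float(fp.sum()), 6))  # (16,) 1.0
```

A GNN pretrained to predict such fingerprints from the raw structure must learn the same kind of geometric representations that force-field or denoising pretraining induces, but from labels that are trivial to generate at scale.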
Submitted 3 March, 2025;
originally announced March 2025.
-
Representation-space diffusion models for generating periodic materials
Authors:
Anshuman Sinha,
Shuyi Jia,
Victor Fung
Abstract:
Generative models hold the promise of significantly expediting the materials design process when compared to traditional human-guided or rule-based methodologies. However, effectively generating high-quality periodic structures of materials on limited but diverse datasets remains an ongoing challenge. Here we propose a novel approach for periodic structure generation which fully respects the intrinsic symmetries, periodicity, and invariances of the structure space. Namely, we utilize differentiable, physics-based, structural descriptors which can describe periodic systems and satisfy the necessary invariances, in conjunction with a denoising diffusion model which generates new materials within this descriptor or representation space. Reconstruction is then performed on these representations using gradient-based optimization to recover the corresponding Cartesian positions of the crystal structure. This approach differs significantly from current methods by generating materials in the representation space, rather than in the Cartesian space, which is made possible using an efficient reconstruction algorithm. Consequently, known issues with respecting periodic boundaries and translational and rotational invariances during generation can be avoided, and the model training process can be greatly simplified. We show this approach is able to provide competitive performance on established benchmarks compared to current state-of-the-art methods.
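The reconstruction step — recovering Cartesian positions whose descriptor matches a generated representation, via gradient descent on the misfit — can be shown with the pairwise distance matrix as a stand-in descriptor. For the sketch we start near the answer and use plain gradient descent; the paper's pipeline handles arbitrary starting points with stronger optimizers.

```python
import numpy as np

def pair_dists(X):
    return np.linalg.norm(X[:, None] - X[None, :], axis=-1)

def reconstruct(target_d, X_init, steps=3000, lr=0.01):
    """Gradient descent on L = sum_ij (d_ij(X) - t_ij)^2, recovering
    coordinates whose pairwise distances match a target
    representation."""
    X = X_init.copy()
    for _ in range(steps):
        diff = X[:, None] - X[None, :]
        d = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(d, 1.0)                   # avoid 0/0 on diagonal
        err = d - target_d
        np.fill_diagonal(err, 0.0)
        grad = (4.0 * err / d)[:, :, None] * diff  # analytic dL/dX
        X -= lr * grad.sum(axis=1)
    return X

rng = np.random.default_rng(0)
X_true = rng.uniform(-1.0, 1.0, size=(5, 3))
target = pair_dists(X_true)
X0 = X_true + rng.normal(scale=0.1, size=X_true.shape)  # warm start
X_rec = reconstruct(target, X0)
misfit = float(np.abs(pair_dists(X_rec) - target).max())
print(misfit)
```

Because the descriptor is invariant to translations and rotations, the recovered coordinates match the original only up to a rigid motion, which is exactly the invariance the generation step is meant to exploit.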
Submitted 13 August, 2024;
originally announced August 2024.
-
LLMatDesign: Autonomous Materials Discovery with Large Language Models
Authors:
Shuyi Jia,
Chao Zhang,
Victor Fung
Abstract:
Discovering new materials can have significant scientific and technological implications but remains a challenging problem today due to the enormity of the chemical space. Recent advances in machine learning have enabled data-driven methods to rapidly screen or generate promising materials, but these methods still depend heavily on very large quantities of training data and often lack the flexibility and chemical understanding desired in materials discovery. We introduce LLMatDesign, a novel language-based framework for interpretable materials design powered by large language models (LLMs). LLMatDesign utilizes LLM agents to translate human instructions, apply modifications to materials, and evaluate outcomes using provided tools. By incorporating self-reflection on its previous decisions, LLMatDesign adapts rapidly to new tasks and conditions in a zero-shot manner. A systematic evaluation of LLMatDesign on several materials design tasks, in silico, validates LLMatDesign's effectiveness in developing new materials with user-defined target properties in the small data regime. Our framework demonstrates the remarkable potential of autonomous LLM-guided materials discovery in the computational setting and towards self-driving laboratories in the future.
Submitted 18 June, 2024;
originally announced June 2024.
-
Atomic structure generation from reconstructing structural fingerprints
Authors:
Victor Fung,
Shuyi Jia,
Jiaxin Zhang,
Sirui Bi,
Junqi Yin,
P. Ganesh
Abstract:
Data-driven machine learning methods have the potential to dramatically accelerate the rate of materials design over conventional human-guided approaches. These methods would help identify or, in the case of generative models, even create novel crystal structures of materials with a set of specified functional properties to then be synthesized or isolated in the laboratory. For crystal structure generation, a key bottleneck lies in developing suitable atomic structure fingerprints or representations for the machine learning model, analogous to the graph-based or SMILES representations used in molecular generation. However, finding data-efficient representations that are invariant to translations, rotations, and permutations, while remaining invertible to the Cartesian atomic coordinates remains an ongoing challenge. Here, we propose an alternative approach to this problem by taking existing non-invertible representations with the desired invariances and developing an algorithm to reconstruct the atomic coordinates through gradient-based optimization using automatic differentiation. This can then be coupled to a generative machine learning model which generates new materials within the representation space, rather than in the data-inefficient Cartesian space. In this work, we implement this end-to-end structure generation approach using atom-centered symmetry functions as the representation and conditional variational autoencoders as the generative model. We are able to successfully generate novel and valid atomic structures of sub-nanometer Pt nanoparticles as a proof of concept. Furthermore, this method can be readily extended to any suitable structural representation, thereby providing a powerful, generalizable framework towards structure-based generation.
Submitted 26 July, 2022;
originally announced July 2022.
-
Inverse design of two-dimensional materials with invertible neural networks
Authors:
Victor Fung,
Jiaxin Zhang,
Guoxiang Hu,
P. Ganesh,
Bobby G. Sumpter
Abstract:
The ability to readily design novel materials with chosen functional properties on-demand represents a next frontier in materials discovery. However, thoroughly and efficiently sampling the entire design space in a computationally tractable manner remains a highly challenging task. To tackle this problem, we propose an inverse design framework (MatDesINNe) utilizing invertible neural networks which can map both forward and reverse processes between the design space and target property. This approach can be used to generate materials candidates for a designated property, thereby satisfying the highly sought-after goal of inverse design. We then apply this framework to the task of band gap engineering in two-dimensional materials, starting with MoS2. Within the design space encompassing six degrees of freedom in applied tensile, compressive and shear strain plus an external electric field, we show the framework can generate novel, high fidelity, and diverse candidates with near-chemical accuracy. We extend this generative capability further to provide insights regarding metal-insulator transition, important for memristive neuromorphic applications among others, in MoS2 which is not otherwise possible with brute force screening. This approach is general and can be directly extended to other materials and their corresponding design spaces and target properties.
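The building block that makes such a network exactly invertible is the coupling layer: transform one half of the input with a scale and shift computed from the other half, which can then be undone exactly. A minimal numpy sketch of a single affine coupling layer, with a one-layer tanh map standing in for the learned subnetwork:

```python
import numpy as np

def coupling_forward(x, w, b):
    """Affine coupling layer: transform the second half of x using a
    scale/shift computed from the first half. Exactly invertible
    because the first half passes through unchanged."""
    x1, x2 = x[:2], x[2:]
    s = np.tanh(w @ x1 + b)          # stand-in for a small neural net
    y2 = x2 * np.exp(s) + s
    return np.concatenate([x1, y2])

def coupling_inverse(y, w, b):
    y1, y2 = y[:2], y[2:]
    s = np.tanh(w @ y1 + b)          # same net; y1 equals x1 by design
    x2 = (y2 - s) * np.exp(-s)
    return np.concatenate([y1, x2])

rng = np.random.default_rng(0)
w, b = rng.normal(size=(2, 2)), rng.normal(size=2)
x = rng.normal(size=4)               # toy (design, property) vector
y = coupling_forward(x, w, b)
x_back = coupling_inverse(y, w, b)
print(bool(np.allclose(x, x_back)))  # True
```

Stacking such layers (alternating which half is transformed) gives a network that maps forward from design parameters to properties and, run in reverse, proposes candidate designs for a target property — the core mechanism behind the MatDesINNe framework described above.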
Submitted 5 June, 2021;
originally announced June 2021.