Scientists are tackling the persistent challenge of accurately predicting high-order crystal tensor properties directly from material structures. Dian Jin, Yancheng Yuan, and Xiaoming Tao, all from The Hong Kong Polytechnic University, present a novel approach called the Cartesian Environment Interaction Tensor Network (CEITNet) that addresses limitations in existing spherical-harmonic equivariant models. CEITNet constructs a multi-channel Cartesian local environment tensor and utilises learnable channel-space interactions to efficiently build high-order tensors, significantly reducing computational cost and memory requirements. Their method demonstrably outperforms previous techniques on benchmark datasets for dielectric, piezoelectric, and elastic tensor prediction, representing a substantial advance in materials discovery and design.
By performing learning in channel space and using Cartesian tensor bases to assemble equivariant outputs, CEITNet enables efficient construction of high-order tensors. Across benchmark datasets for order-2 dielectric, order-3 piezoelectric, and order-4 elastic tensor prediction, CEITNet surpasses prior high-order prediction methods on key accuracy criteria while offering high computational efficiency.
Computational challenges in first-principles tensor property prediction
Crystal tensor properties are fundamental to many technological fields, underpinning significant innovations in device development. As a result, accurate prediction of crystal tensor properties has attracted increasing attention from the research community. First-principles methods, such as density functional theory (DFT), have facilitated the prediction of various properties within an acceptable error margin compared to traditional laboratory experiments.
However, density functional theory calculations can be computationally demanding, especially when evaluating high-order tensor properties of large crystalline systems, as they require iterative self-consistent optimization of both electronic states and atomic configurations with explicit wave-function descriptions Yan et al. [2024a]. Their computational complexity scales cubically or even more steeply with the number of atoms in the system Wu et al., Pathrudkar et al.
Consequently, this leads to difficulties in high-order tensor prediction, limiting the development of related fields. On a broader level, the energy consumption and carbon footprint associated with large-scale computational chemistry and material simulations are garnering increasing attention Schilter et al.
Therefore, developing data-driven prediction methods that balance accuracy with efficiency is imperative for accelerating material screening and discovery. A central challenge in applying deep learning to tensor prediction lies in satisfying geometric equivariance. Unlike scalar properties such as total energy or formation energy Lupo Pasini et al., Davariashtiyani and Kadkhodaei, tensor properties exhibit strict directional dependency: when a crystal structure is rotated in 3D space, its tensor properties must transform according to the coordinate transformation rules Grisafi et al.
Neglecting this inductive bias leads to physically inconsistent predictions and degrades sample efficiency and generalization Wen et al., hindering the model’s ability to learn correct physical laws from limited, expensive DFT data. Currently, the dominant approach for achieving geometric equivariance in the deep learning community projects geometric information onto irreducible representations of the rotation group using spherical harmonics, then combines those irreps equivariantly via Clebsch–Gordan (CG) tensor products Batatia et al., Villar et al., Geiger and Smidt, Heilman et al.
While this paradigm delivers strong performance and physical consistency, it comes at a distinct cost: the tensor products and coupling of high-order representations are computationally demanding Passaro and Zitnick. To mitigate the computational and memory burden, two alternative directions have emerged.
The first approach utilizes canonicalization Kaba et al., Hua et al., which maps the structure to a defined and unique canonical frame before employing a non-equivariant network for prediction, subsequently rotating the result back to the original coordinate system. While offering potential engineering speedups, canonicalization mappings may introduce discontinuities and numerical instabilities in degenerate or near-degenerate symmetry scenarios Dym et al.
Although continuous canonicalizations can be constructed in certain restricted scenarios, they depend heavily on the alignment, potentially compromising training stability and generalization when misaligned. The second approach uses Cartesian tensors, replacing spherical tensor products with Cartesian tensor contractions and multiplications; representative works have demonstrated the feasibility of this paradigm in molecular and tensor prediction tasks Simeon and De Fabritiis, Wang et al. [2024a].
However, existing methods often rely on stacking deeper tensor message-passing layers or explicitly enumerating angular information Zhong et al., Choudhary and DeCost, which leads to high computational cost and limited scalability. CEITNet constructs a multi-channel local environment by aggregating neighbor information on weighted Cartesian bases, where each channel encodes a distinct directional mode of a high-order geometric basis.
It then employs a learnable channel-interaction matrix that couples environment channels, enabling flexible many-body mixing while maintaining computational efficiency. The atomic semantics and the associated coefficients are produced by an invariant message-passing network. This decoupled design improves efficiency by avoiding the propagation of high-order features.
Experiments on multiple high-order tensor prediction benchmarks (order-2 dielectric tensor, order-3 piezoelectric tensor, and order-4 elastic tensor) demonstrate that CEITNet achieves state-of-the-art accuracy on key accuracy criteria while remaining computationally efficient. Preliminaries. A crystal structure is defined by a unit cell containing a set of atoms, which repeats infinitely in three-dimensional space along three periodic lattice vectors.
It can be mathematically represented by the lattice matrix L = [l1, l2, l3] ∈ R3×3, which can be constructed from the lattice parameters (a, b, c, α, β, γ), together with the atomic sites {(Zi, xi)}, i = 1, …, n, where Zi denotes the atomic number and xi = L fi ∈ R3 gives the Cartesian coordinates of the i-th atom, with fi ∈ R3 its fractional coordinates within the unit cell Wang et al. [2024b], Hua and Lin. Periodic graph.
We construct a periodic crystal graph G with a cutoff radius rcut from lattice parameters and n atomic sites in the unit cell. The node set V corresponds to the atoms in the unit cell, where each node carries invariant atomic attributes (Zi). For each ordered pair (j, i), if there exists a periodic image of atom j whose distance to atom i is within rcut, we add a directed edge (j →i) ∈E, i.e., messages are passed from atom j to atom i Yan et al.
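A minimal sketch of this periodic graph construction, assuming a brute-force scan over the ±1 neighbouring image shifts (sufficient when rcut is smaller than the cell dimensions) and adopting the convention that the edge vector points from atom i to the periodic image of atom j — the source does not fix either choice:

```python
import itertools
import numpy as np

def periodic_edges(L, X, r_cut):
    """Build directed edges (j -> i) over periodic images within r_cut.

    L : (3, 3) lattice matrix whose rows are the lattice vectors l1, l2, l3.
    X : (n, 3) Cartesian atom positions in the unit cell.
    Returns a list of (j, i, r_ij, d_ij, n_ij) tuples.
    """
    # Fractional shifts of the 27 neighbouring cells, mapped to Cartesian.
    shifts = [np.array(s) @ L for s in itertools.product((-1, 0, 1), repeat=3)]
    edges = []
    for i in range(len(X)):
        for j in range(len(X)):
            for s in shifts:
                r_ij = (X[j] + s) - X[i]          # i -> image of j (a convention)
                d_ij = float(np.linalg.norm(r_ij))
                if 1e-8 < d_ij <= r_cut:          # skip the self-image at distance 0
                    edges.append((j, i, r_ij, d_ij, r_ij / d_ij))
    return edges
```

For larger cutoffs the shift range would need to grow accordingly; production codes use cell lists instead of this O(27 n^2) scan.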
For each edge (j →i), we associate the displacement vector rij ∈ R3, its length dij = ∥rij∥, and the direction encoding nij = rij/∥rij∥, which will be used to construct Cartesian geometric bases in the tensor head. Note that dij is invariant, whereas rij (and nij) is equivariant. High-order Tensor Prediction. Problem statement.
Given a periodic crystal graph G = (V, E), the goal is to learn a neural network model fθ that maps the input graph to a Cartesian high-order tensor, fθ : G ↦ T, where T ∈ R3×···×3 is an order-r tensor representing material properties. Following Yan et al. [2024a] and Hua et al., we focus on three specific tensor properties with orders r ∈ {2, 3, 4}, corresponding to the dielectric, piezoelectric, and elastic tensors, respectively.
Given a training set {(G(s), T(s))} for s = 1, …, S, where S is the number of samples, we learn the optimal parameters θ by minimizing a regression objective over the dataset: min_θ Σ_{s=1}^{S} L(fθ(G(s)), T(s)), where L(·, ·) denotes a regression loss that quantifies the prediction error. Equivariance. Beyond predictive accuracy, we require the model to satisfy geometric consistency.
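Returning to the regression objective above, here is a concrete, illustrative instantiation — the text leaves L(·, ·) generic, so we use the Frobenius-norm distance that the results reported later also adopt as a metric:

```python
import numpy as np

def frobenius_loss(T_pred, T_true):
    # Frobenius-norm distance between predicted and reference tensors;
    # the source leaves L(.,.) generic, so this is one natural choice.
    return float(np.sqrt(np.sum((T_pred - T_true) ** 2)))

def dataset_objective(f_theta, graphs, targets):
    # The quantity minimised over theta: a sum of per-sample losses.
    return sum(frobenius_loss(f_theta(G), T) for G, T in zip(graphs, targets))
```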
Formally, for any rotation R applied to the crystal structure, an order-r Cartesian tensor transforms by contracting each of its r indices with R, i.e., T′_{i1…ir} = R_{i1 j1} ⋯ R_{ir jr} T_{j1…jr}; equivariance requires fθ(R · G) to equal the correspondingly rotated fθ(G). The spherical-harmonic, canonicalization, and Cartesian-tensor strategies discussed above all target this requirement, with different trade-offs in cost and stability. Notably, existing Cartesian approaches are seldom extended to end-to-end higher-order tensor prediction, and are mainly built for accurate machine-learning interatomic potentials (MLIPs). Channelized Local Environment.
To lift the scalar features into equivariant tensor representations, we construct local environment tensors from edge vectors. Bases construction. For each directed edge (j → i) with relative displacement rij, we first construct the geometric bases: the normalized direction vector nij, the identity matrix I, and the traceless deviatoric tensor Qij = nij nij⊤ − (1/3) I.
These bases serve as building blocks for our equivariant representations. Other bases can be constructed from these bases (e.g., nij ⊗I, Qij ⊗Qij, or nij ⊗Qij). All these bases form Bij.
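A minimal sketch of these bases, with two example higher-order members of Bij formed by outer products (the exact basis set is a design choice of the model):

```python
import numpy as np

def edge_bases(r_ij):
    """Geometric bases for one edge: direction n_ij, identity I, and the
    traceless deviatoric tensor Q_ij = n n^T - I/3. Higher-order bases
    are outer products of these building blocks."""
    n = r_ij / np.linalg.norm(r_ij)
    I = np.eye(3)
    Q = np.outer(n, n) - I / 3.0
    # Example higher-order bases via outer products:
    n_I = np.einsum('a,bc->abc', n, I)     # order-3 basis n (x) I
    Q_Q = np.einsum('ab,cd->abcd', Q, Q)   # order-4 basis Q (x) Q
    return n, I, Q, n_I, Q_Q
```

Because trace(n n^T) = 1 for a unit vector, Q is traceless and symmetric by construction.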
Crucially, since these bases are derived from geometric vectors, they inherently preserve rotational equivariance, ensuring that the learned representations respect the physical symmetry of the crystal system. Channelized local environment construction. We combine the invariant representations with the equivariant bases to form a Channelized Local Environment.
For each edge, we construct a context vector zij = [hi | hj | eij] and generate K-dimensional channel weights via an MLP: wij = MLPw(zij). To ensure numerical stability across varying crystal densities, we introduce a learnable normalization deg(i)^(−p). The channelized local environment tensor Ei for node i is defined as the weighted aggregation of the geometric bases Bij at this node: Ei,k = deg(i)^(−p) Σ_{j∈N(i)} wij,k · Bij, for k = 1, …, K.
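A sketch of this aggregation for the order-2 basis Qij, with random stand-ins for the invariant channel weights wij (in the model these come from MLPw). Since the weights are invariant while the bases rotate with the structure, each channel transforms as R Ei,k R^T, which the final assertion checks numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_environment(r_neighbors, w, p=0.5):
    """E_{i,k} = deg(i)^(-p) * sum_j w_{ij,k} * Q_ij, illustrated with the
    order-2 deviatoric basis standing in for B_ij."""
    deg = len(r_neighbors)
    E = np.zeros((w.shape[1], 3, 3))      # K channels of 3x3 tensors
    for r, w_j in zip(r_neighbors, w):
        n = r / np.linalg.norm(r)
        Q = np.outer(n, n) - np.eye(3) / 3.0
        E += w_j[:, None, None] * Q       # channel k accumulates w_{ij,k} * Q_ij
    return deg ** (-p) * E

R, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # a random orthogonal matrix
rs = rng.normal(size=(4, 3))                   # 4 neighbour displacement vectors
w = rng.random(size=(4, 8))                    # K = 8 invariant channel weights
E1 = local_environment(rs @ R.T, w)            # environment of the rotated structure
E2 = np.einsum('ab,kbc,dc->kad', R, local_environment(rs, w), R)
assert np.allclose(E1, E2)                     # E rotates as R E R^T, channel-wise
```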
It is worth noting that we do not limit the node to a single environment tensor. Instead, we construct a set of environments by projecting edge features onto different geometric bases. The key design principle is to keep all learnable components in channel space, while generating tensors through equivariant bases. This yields a flexible equivariant mechanism to model high-order tensors while accounting for many-body interactions. With Ei denoting the compressed K-channel local environment for atom i, the atomic tensor Ti generated in this step is Ti = A(Ei^τ0) + γ · ψ(Ei^τL, Ei^τR; M) + ∆Ti. Turning to results: when predicting dielectric tensors, CEITNet achieved a Frobenius-norm distance of 2.87, surpassing existing methods and demonstrating a more accurate reconstruction of tensor structure.
Furthermore, CEITNet attained high-quality prediction rates of 86.1 percent, 63.8 percent, and 39.3 percent for thresholds of 25 percent, 10 percent, and 5 percent, respectively, indicating robust and precise predictions. For piezoelectric tensor prediction, a particularly challenging task, CEITNet achieved a Frobenius-norm distance of 0.517, significantly improving upon previous results.
Correspondingly, CEITNet’s high-quality prediction rates reached 21.98 percent, 5.80 percent, and 2.72 percent at the 25 percent, 10 percent, and 5 percent thresholds, respectively, substantially reducing both absolute and relative error. When applied to elastic tensor prediction, CEITNet achieved a Frobenius-norm distance of 70.11, providing competitive results alongside state-of-the-art baselines.
Notably, CEITNet consistently delivered the best overall performance in high-quality prediction rates, reaching 70.6 percent, 32.2 percent, and 14.4 percent for the 25 percent, 10 percent, and 5 percent thresholds, respectively, demonstrating a substantial reduction in relative error. This method constructs a multi-channel Cartesian local environment tensor and utilises a learnable channel-space interaction to flexibly combine information from multiple atoms.
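One plausible reading of this learnable channel-space interaction, sketched under the assumption that pairs of order-2 channel environments are coupled through a K × K matrix M to assemble an order-4 (elastic-like) tensor — the exact coupling used by CEITNet may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

def channel_interaction(E, M):
    """T_{abcd} = sum_{k,l} M_{kl} (E_k)_{ab} (E_l)_{cd}: all learnable
    parameters live in the K x K channel matrix M, while the 3D geometry
    enters only through the equivariant channel environments E."""
    return np.einsum('kl,kab,lcd->abcd', M, E, E)

K = 8
E = rng.normal(size=(K, 3, 3))     # K order-2 channel environments
M = rng.normal(size=(K, K))        # learnable channel-interaction matrix
T = channel_interaction(E, M)      # order-4 Cartesian tensor

# Equivariance carries over: rotating every channel as R E_k R^T rotates
# T by contracting each of its four indices with R.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
E_rot = np.einsum('ab,kbc,dc->kad', R, E, R)
T_rot = channel_interaction(E_rot, M)
T_ref = np.einsum('ae,bf,cg,dh,efgh->abcd', R, R, R, R, T)
assert np.allclose(T_rot, T_ref)
```

The appeal of this design is visible in the einsum: no high-order features are ever propagated through the network, only the K-channel mixing weights are learned.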
By decoupling atomic encoding from tensor construction, CEITNet avoids computationally expensive propagation of high-order features, offering a significant advantage over existing techniques. CEITNet demonstrates superior performance across benchmark datasets for predicting tensors of order-2 (dielectric), order-3 (piezoelectric), and order-4 (elastic), establishing its broad applicability to diverse material properties.
Compared to spherical-harmonic based methods, CEITNet provides substantial computational benefits, making it particularly effective in multi-task settings. While the current implementation focuses on three-body interactions, the framework is extensible to model higher-order many-body effects, although this introduces trade-offs in computational cost and numerical stability. Future research will focus on streamlining task transfer to new tensor types and exploring stronger symmetry-aware inductive biases without compromising efficiency.
👉 More information
🗞 Efficient Equivariant High-Order Crystal Tensor Prediction via Cartesian Local-Environment Many-Body Coupling
🧠 ArXiv: https://arxiv.org/abs/2602.04323
