Surface wave dispersion curve inversion is a crucial technique for understanding Earth’s structure, from shallow resource exploration to deep geological studies, but it often struggles with inaccuracies and heavy computational demands. Feng and colleagues now present OpenSWI, a massive benchmark dataset designed to accelerate progress in this field. The resource addresses a critical need for large-scale, diverse data to train and evaluate emerging data-driven deep learning methods, which promise to overcome the limitations of traditional approaches. OpenSWI comprises over 23 million velocity profiles and dispersion curves, spanning synthetic datasets tailored to different scales and a real-world dataset for assessing model performance. Its release, alongside the associated data preparation toolbox and trained models, represents a significant step towards intelligent and efficient surface wave inversion.
Surface Wave Inversion for Subsurface Structure
This overview details research focused on using surface waves to determine the structure beneath Earth’s surface, a core area of geophysical investigation with applications ranging from resource exploration to understanding deep Earth processes. Research consistently focuses on improving the accuracy and efficiency of methods used to interpret seismic data and create detailed models of subsurface structures. A dominant theme involves refining techniques for using surface waves, particularly Rayleigh waves, to image the subsurface. Recent studies have focused on developing direct search algorithms and innovative approaches, including artificial neural networks, to improve the inversion process and quantify uncertainties in the resulting models.
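To make the idea of a direct search concrete, here is a minimal sketch; it is not the method of any study mentioned here. It assumes the simplest dispersive case that has a closed-form relation: a fundamental-mode Love wave in a single layer over a half-space. Rayleigh waves in multi-layer models need a propagator-matrix solver that does not fit in a few lines. All function names, densities, and grid values are illustrative assumptions, with units of m, m/s, and kg/m³.

```python
import numpy as np

def love_phase_velocity(freq_hz, h, vs1, vs2, rho1=1800.0, rho2=2200.0):
    """Fundamental-mode Love-wave phase velocity, one layer over a half-space.

    Assumes vs1 < vs2. Solves mu1*q1*tan(w*h*q1) = mu2*q2 by bisection,
    with q1 = sqrt(1/vs1^2 - 1/c^2) and q2 = sqrt(1/c^2 - 1/vs2^2).
    """
    omega = 2.0 * np.pi * freq_hz
    mu1, mu2 = rho1 * vs1**2, rho2 * vs2**2

    def g(c):
        # Same sign as the dispersion function, but free of the tan() pole.
        q1 = np.sqrt(max(1.0 / vs1**2 - 1.0 / c**2, 0.0))
        q2 = np.sqrt(max(1.0 / c**2 - 1.0 / vs2**2, 0.0))
        x = omega * h * q1
        return mu1 * q1 * np.sin(x) - mu2 * q2 * np.cos(x)

    # Bracket the fundamental mode: its root lies where w*h*q1 < pi/2.
    s = 1.0 / vs1**2 - (np.pi / (2.0 * omega * h)) ** 2
    lo = vs1
    hi = vs2 if s <= 1.0 / vs2**2 else 1.0 / np.sqrt(s)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dispersion_curve(freqs, h, vs1, vs2):
    """Phase velocity at each frequency for the two-parameter toy model."""
    return np.array([love_phase_velocity(f, h, vs1, vs2) for f in freqs])

def grid_search(freqs, observed, h_grid, vs1_grid, vs2):
    """Exhaustive direct search: return (rms, h, vs1) with the lowest misfit."""
    best = (np.inf, None, None)
    for h in h_grid:
        for vs1 in vs1_grid:
            pred = dispersion_curve(freqs, h, vs1, vs2)
            rms = float(np.sqrt(np.mean((pred - observed) ** 2)))
            if rms < best[0]:
                best = (rms, h, vs1)
    return best

# Synthetic "observation" from a known model, then recover it by search.
freqs = np.linspace(2.0, 30.0, 15)
observed = dispersion_curve(freqs, 20.0, 300.0, 800.0)
best_rms, best_h, best_vs1 = grid_search(
    freqs, observed,
    h_grid=np.arange(10.0, 31.0, 5.0),
    vs1_grid=np.arange(200.0, 401.0, 50.0),
    vs2=800.0,
)
print(best_rms, best_h, best_vs1)
```

Every candidate model costs one forward calculation per frequency, so the cost grows quickly with the number of free parameters. That cost is the computational burden that learned, data-driven inversion aims to avoid.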
The field is increasingly embracing machine learning, with researchers developing deep learning algorithms to accelerate inversion, improve accuracy, and address complex problems. A significant development has been the application of ambient seismic noise to extract surface wave information, offering a cost-effective and continuous source of data. Researchers have utilized this technique to create large-scale models of Earth’s structure, both regionally and globally. Furthermore, full waveform inversion, which utilizes the entire seismic waveform, is gaining traction as a powerful tool for high-resolution imaging.
Synthetic Data Generation for Surface Wave Inversion
Researchers addressed limitations in traditional subsurface imaging by developing a novel methodology centred around the creation of a comprehensive benchmark dataset, OpenSWI, and a corresponding data preparation pipeline, SWIDP. Recognizing the reliance of deep learning methods on the quality and diversity of training data, the team constructed a large-scale, varied collection of geological models and corresponding dispersion curves. The methodology involves generating synthetic datasets tailored to different geological scales and research objectives. OpenSWI-shallow contains over 22 million velocity profiles paired with dispersion curves representing a wide range of shallow subsurface structures, including faults, folds, and realistic stratigraphy.
Complementing this, OpenSWI-deep utilizes fourteen global and regional 3-dimensional geological models to create approximately 1.26 million data pairs for deep-Earth studies. To further enhance the dataset’s utility, the team incorporated a real-world component, OpenSWI-real, compiled from open-source projects. This dataset provides observed dispersion curves alongside reference models, serving as a crucial benchmark for evaluating the generalization ability of deep learning models trained on the synthetic data. The SWIDP pipeline automates the process of generating these paired data, ensuring consistency and scalability. By training deep learning models on the synthetic datasets and validating them against the real-world data, researchers demonstrate the diversity and representativeness of OpenSWI, paving the way for more robust and efficient subsurface imaging techniques. This innovative approach proactively addresses data scarcity, allowing for the development of deep learning models that are less susceptible to biases and more capable of generalizing to unseen geological scenarios.
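As an illustration only, the velocity-profile half of one synthetic pair might be drawn as in the sketch below. This is not the SWIDP code: the function name, parameter ranges, and layer count are assumptions, and the matching dispersion curve would come from a forward solver that is not shown.

```python
import numpy as np

def random_layered_profile(rng, n_layers=5, max_depth=100.0, dz=1.0,
                           vs_top=(150.0, 400.0), vs_step=(-50.0, 200.0)):
    """Return (depth, vs): a random piecewise-constant Vs profile.

    Depths in m, velocities in m/s. All ranges are illustrative assumptions.
    """
    depth = np.arange(0.0, max_depth, dz)
    # Random interface depths, sorted from shallow to deep.
    interfaces = np.sort(rng.uniform(dz, max_depth, size=n_layers - 1))
    # Top-layer velocity plus random increments; negative steps allow
    # occasional low-velocity layers.
    steps = rng.uniform(*vs_step, size=n_layers - 1)
    layer_vs = np.concatenate(([rng.uniform(*vs_top)], steps)).cumsum()
    layer_vs = np.maximum(layer_vs, 100.0)  # floor to keep values plausible
    # Map each depth sample to the index of the layer that contains it.
    idx = np.searchsorted(interfaces, depth, side="right")
    return depth, layer_vs[idx]

rng = np.random.default_rng(0)
profiles = np.stack([random_layered_profile(rng)[1] for _ in range(1000)])
print(profiles.shape)
```

Sampling every profile on the same regular depth grid gives fixed-length arrays, which is a convenient input format for a neural network.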
OpenSWI Dataset Accelerates Subsurface Structure Reconstruction
Researchers introduced OpenSWI, a comprehensive benchmark dataset designed to accelerate advancements in surface wave dispersion curve inversion, a crucial technique used in both shallow resource exploration and deep geological studies. OpenSWI addresses limitations of traditional methods, which often rely on initial model assumptions and computationally intensive processes. The dataset comprises over 22 million velocity profiles and corresponding dispersion curves in OpenSWI-shallow, focusing on shallow geological structures like layers, faults, and realistic stratigraphy. Complementing this is OpenSWI-deep, containing 1.26 million data pairs for deep-Earth studies, built from a variety of global and regional geological models. Crucially, OpenSWI also includes OpenSWI-real, a set of observed dispersion curves with reference models, allowing researchers to assess how well models trained on synthetic data generalize to real-world scenarios. Testing demonstrates that deep learning models trained on the synthetic data accurately predict velocity models when applied to the real-world data, confirming the dataset’s representativeness and diversity. The availability of OpenSWI, alongside a supporting toolbox and trained models, provides a valuable open resource for the research community, promising to accelerate the development of intelligent surface wave inversion techniques and broaden their application in diverse geological settings.
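The article does not say which error measure was used for this comparison. One plausible way to score predicted velocity profiles against reference models is sketched below. The function name and the two metrics are assumptions, and the arrays are made-up placeholders, not OpenSWI data.

```python
import numpy as np

def profile_errors(pred, ref):
    """Per-profile RMSE and mean absolute relative error.

    pred, ref: arrays of shape (n_profiles, n_depths) holding Vs values.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    rmse = np.sqrt(np.mean((pred - ref) ** 2, axis=1))
    rel = np.mean(np.abs(pred - ref) / ref, axis=1)
    return rmse, rel

# Hypothetical reference models (km/s) and predictions that are 2% too fast.
ref = np.array([[3.0, 3.5, 4.0], [3.2, 3.6, 4.4]])
pred = ref * 1.02
rmse, rel = profile_errors(pred, ref)
print(rmse, rel)
```

Reporting a relative error next to the RMSE makes shallow low-velocity profiles and deep high-velocity profiles comparable on the same scale.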
OpenSWI Dataset Benchmarks Machine Learning for Inversion
OpenSWI represents a significant advance in resources for surface wave inversion, offering the first benchmark dataset at this scale specifically designed for machine learning applications. The research team has created a comprehensive collection of synthetic and real-world data, encompassing both shallow and deep subsurface velocity structures across diverse geological settings. Experimental results demonstrate that models trained using OpenSWI’s synthetic data effectively generalize to real-world observations, confirming the dataset’s practical value and representativeness. Future work will focus on expanding the geographic and geological coverage of the dataset, as well as incorporating additional geophysical data types, with the intention of establishing OpenSWI as a continuously evolving, community-driven resource to promote reproducible research and wider application of machine learning in geophysical imaging. All code, datasets, and experimental results have been made publicly available to encourage further development and validation by the research community.
👉 More information
🗞 OpenSWI: A Massive-Scale Benchmark Dataset for Surface Wave Dispersion Curve Inversion
🧠 ArXiv: https://arxiv.org/abs/2508.10749
