Optimal transport seeks a mapping between probability distributions, and this research introduces an approach to learn this mapping directly from data without iterative updates. Frank Cole (University of Minnesota), Dixi Wang and Yineng Chen (Purdue University), alongside Yulong Lu, Rongjie Lai and colleagues, present a novel method for in-context operator learning on probability measure spaces. This work is significant because it establishes theoretical guarantees for generalisation accuracy based on the size of the input data and model capacity, and provides an explicit architecture for exact map recovery in specific cases, offering a potentially transformative advance for applications in generative modelling and beyond.
We utilise few-shot samples from each distribution as a prompt, performing inference without gradient updates. The solution operator is parameterised and a scaling-law theory is developed in two regimes. In the nonparametric setting, where tasks concentrate on a low-intrinsic-dimension manifold of source–target pairs, we establish generalisation bounds that quantify how in-context accuracy scales with prompt size, intrinsic task dimension, and model capacity. In the parametric setting (for example, Gaussian families), we present an explicit architecture that recovers the exact optimal transport map in context and provide finite-sample excess-risk bounds. Our numerical experiments are conducted on synthetic transports and generative-modelling benchmarks.
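In symbols, the learning problem can be sketched roughly as follows; the notation is illustrative rather than taken from the paper, with Λ the task distribution over source–target pairs, the hatted measures the empirical distributions built from the n-sample prompts, and T_{ρ₀→ρ₁} the ground-truth optimal transport map.

```latex
% Schematic in-context training objective (illustrative notation, not the paper's):
% a single operator T_theta receives empirical prompts and is asked to match the
% true OT map on average over tasks, prompt draws, and query points.
\min_{\theta}\;
\mathbb{E}_{(\rho_0,\rho_1)\sim \Lambda}\,
\mathbb{E}_{X_{1:n}\sim\rho_0^{\otimes n},\;Y_{1:n}\sim\rho_1^{\otimes n}}\,
\mathbb{E}_{x\sim\rho_0}
\Bigl\|\, \mathcal{T}_\theta\bigl(x;\,\hat\rho_0^{\,n},\hat\rho_1^{\,n}\bigr)
 - T_{\rho_0\to\rho_1}(x) \,\Bigr\|^2
```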
ICL Model Training with MNIST and Autoencoders
The paper also details the experimental setup used to train the in-context learning (ICL) model across several datasets.

### Training Protocol
– Optimizer: Adam with a CosineAnnealingLR schedule.
– Base learning rate: 3e-4 for the synthetic experiments; 1e-3 for the real-world experiments.
– Epochs: 1000 for the synthetic experiments; 3000 for the others.

### Datasets and Preprocessing

#### MNIST
– Preprocessing: scale pixels to [0, 1] and flatten images to length 784.
– Latent space: produced by an autoencoder with d = 256 dimensions.
– Encoder: three sequential convolutional layers (channels 64/128/256), each followed by BatchNorm and ReLU, with residual blocks at each resolution.
– Decoder: a fully connected expansion to 256×7×7, two transposed-convolution layers back to 1×28×28, and a final Sigmoid.
– Training: end-to-end for 100 epochs with Adam (learning rate 1e-3).

#### Fashion-MNIST
– Preprocessing: scale pixels to [0, 1].
– Latent space: produced by an autoencoder with d = 15 dimensions.
– Encoder: three convolutional stages (channels 64/128/256), each followed by BatchNorm and ReLU, with residual blocks at each resolution; the resulting tensor is flattened and mapped to the latent vector by a linear layer.
– Decoder: mirrors the encoder design in reverse order, producing outputs in [0, 1].
– Training: pixel-wise MSE plus a μ-weighted SSIM term (MSE + μ·SSIM, with μ = 0) for 300 epochs with Adam (learning rate 1e-3).

#### ModelNet10
– Preprocessing: convert mesh files to point clouds by uniform surface sampling.
– Latent space: produced by an autoencoder with d = 10 dimensions.
– Encoder: three shared MLP blocks (Conv1d) with output channels 64/128/256, each followed by BatchNorm and ReLU; a symmetric max-pooling over points yields a permutation-invariant 256-dimensional global descriptor, followed by two fully connected layers (256 → 256 with ReLU, then 256 → z) that produce the latent vector.
– Decoder: a lightweight MLP (z → 512 → 1024 → 3·n_pts with Tanh at the output). A minimal sketch of this autoencoder follows below.
– Training: end-to-end with a permutation-invariant Chamfer-ℓ2 loss for 300 epochs with Adam (learning rate 1e-3).

### Additional Results
Figures 8, 9, and 10 display generated images from the ICL model trained on MNIST, Fashion-MNIST, and ModelNet respectively.
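To make the point-cloud setup concrete, here is a minimal PyTorch sketch of the ModelNet10 autoencoder described above. The layer sizes follow the text; everything else (class and function names, the Chamfer implementation, the point count `n_pts`) is our own assumption rather than the authors' code.

```python
# Minimal sketch of the ModelNet10 point-cloud autoencoder (assumptions noted above).
import torch
import torch.nn as nn

class PointCloudAE(nn.Module):
    def __init__(self, z_dim: int = 10, n_pts: int = 1024):
        super().__init__()
        self.n_pts = n_pts
        # Encoder: three shared MLP blocks as Conv1d layers (64/128/256), BatchNorm + ReLU.
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
        )
        # Two fully connected layers: 256 -> 256 (ReLU) -> z.
        self.fc = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, z_dim))
        # Decoder: lightweight MLP z -> 512 -> 1024 -> 3 * n_pts with Tanh output.
        self.decoder = nn.Sequential(
            nn.Linear(z_dim, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, 3 * n_pts), nn.Tanh(),
        )

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (B, n_pts, 3) -> per-point features of shape (B, 256, n_pts).
        feats = self.encoder(pts.transpose(1, 2))
        # Symmetric max-pooling over points gives a permutation-invariant descriptor.
        global_feat = feats.max(dim=2).values              # (B, 256)
        z = self.fc(global_feat)                           # (B, z_dim)
        return self.decoder(z).view(-1, self.n_pts, 3)     # reconstructed cloud

def chamfer_l2(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between point clouds x, y of shape (B, N, 3)."""
    d = torch.cdist(x, y) ** 2                             # (B, N, N) pairwise squared distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()
```

Per the protocol above, such a model would be trained end-to-end on the Chamfer loss for 300 epochs with Adam at learning rate 1e-3.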
In-context Optimal Transport via Scaling Laws
Scientists achieved a breakthrough in optimal transport (OT) by introducing an in-context operator on probability measure spaces, enabling a single solution operator to map pairs of distributions to the OT map using only a few samples from each distribution as a prompt and without gradient updates during inference. The team parameterised the solution operator and developed a scaling-law theory, investigating both nonparametric and parametric regimes to quantify in-context accuracy. The theory shows that, in the nonparametric setting, when tasks concentrate on a low-intrinsic-dimension manifold of source–target pairs, generalisation bounds scale with prompt size, intrinsic task dimension, and model capacity. It further shows that, in the parametric setting, specifically with Gaussian families, an explicit transformer architecture recovers the exact OT map in context, with finite-sample excess-risk bounds established.
The research constructs a transformer for set-to-set, variable-size prompts, performing contextual coupling between empirical measures and delivering sample-size-agnostic transport predictions. Under the assumption that the task distribution is supported on a low-dimensional manifold, the study establishes generalisation error bounds for in-context learning of transport maps, providing quantitative estimates for both sample complexity and task complexity, as detailed in Theorem 1. Theorem 2 shows that, when both reference and target measures are centred Gaussians, the constructed transformer architecture learns the exact OT map in context, and gives a quantitative generalisation error bound for the resulting excess loss. The team validated the framework with numerical experiments on synthetic setups and generative-modelling benchmarks, demonstrating accurate, prompt-driven transport consistent with the predicted scaling behaviour.
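The following is a minimal sketch of what such a set-to-set, prompt-conditioned model could look like. The concrete layer choices, role embeddings, and hyperparameters here are our own assumptions and are not claimed to match the paper's construction.

```python
# Sketch of a prompt-conditioned transport model (our assumptions, not the paper's architecture).
import torch
import torch.nn as nn

class InContextTransport(nn.Module):
    def __init__(self, dim: int, width: int = 128, heads: int = 4, layers: int = 4):
        super().__init__()
        self.embed = nn.Linear(dim, width)
        # Learned "role" vectors distinguish query, source-prompt and target-prompt points;
        # omitting positional encodings keeps the model permutation-invariant in the prompt.
        self.role = nn.Parameter(torch.randn(3, width))
        block = nn.TransformerEncoderLayer(width, heads, 4 * width, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(width, dim)

    def forward(self, x_query, x_src, x_tgt):
        # x_query: (B, Q, d) points to transport; x_src, x_tgt: (B, n, d) prompt samples.
        q = self.embed(x_query) + self.role[0]
        s = self.embed(x_src) + self.role[1]
        t = self.embed(x_tgt) + self.role[2]
        h = self.encoder(torch.cat([q, s, t], dim=1))
        # Read off the query positions and predict their images under the transport map.
        return self.head(h[:, : x_query.shape[1]])
```

Because no positional information is attached to the prompt tokens, the prediction is unchanged under permutations of the prompt samples, which matches the set-to-set reading of the prompt; variable prompt sizes are handled by the attention mechanism itself.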
The experiments demonstrate the predictive performance of the in-context learning model through applications to generative modelling on both synthetic and real-world datasets. The work casts OT solution operator estimation as an in-context operator learning problem, learning a single operator that maps (ρ₀, ρ₁) to the transport map T_{ρ₀→ρ₁} from few-shot prompts. The research establishes quantitative estimates for sample complexity and task complexity, providing a theoretical foundation for efficient and accurate transport map estimation. This delivers a novel approach to optimal transport that moves beyond traditional methods, which require recomputing the estimator whenever the measures change, and opens avenues for prompt-driven transport solutions.
In-context learning for optimal transport generalisation
This research introduces an in-context operator learning framework for estimating optimal transport maps between probability measures, utilising few-shot samples as prompts without gradient updates during inference. The work frames optimal transport map estimation as a contextual prediction problem, enabling a single learned operator to generalise across a family of transport tasks. Analysis encompasses both nonparametric and parametric regimes, offering a unified approach to learning these maps. In the nonparametric setting, the authors prove generalisation bounds assuming source–target distribution pairs lie on a low-dimensional task manifold, quantifying rates related to prompt size, intrinsic task dimension, and model capacity.
Furthermore, within the parametric Gaussian setting, an explicit architecture was constructed to recover the exact optimal transport map in context, accompanied by finite-sample excess-risk bounds. Numerical experiments across synthetic transports, image datasets, and 3D point clouds demonstrate accurate conditional mapping inference and validate the predicted scaling behaviour. The authors acknowledge limitations related to the assumption of low-dimensional task manifolds and suggest future research directions including extending the framework to other operators on probability spaces, incorporating structural priors, and integrating in-context transport operators into broader generative modelling pipelines.
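For the centred-Gaussian setting referenced above (and in Theorem 2), the optimal transport map under quadratic cost has a well-known closed form: for ρ₀ = N(0, Σ₀) and ρ₁ = N(0, Σ₁) it is the linear map x ↦ Ax with A = Σ₀^(−1/2) (Σ₀^(1/2) Σ₁ Σ₀^(1/2))^(1/2) Σ₀^(−1/2). The sketch below is a simple plug-in estimate of this map from few-shot prompt samples; it illustrates the target of the parametric construction, not the paper's transformer architecture, and the function names are ours.

```python
# Plug-in estimate of the linear OT map between centred Gaussians from prompt samples.
import numpy as np
from scipy.linalg import sqrtm

def gaussian_ot_map(src_samples: np.ndarray, tgt_samples: np.ndarray) -> np.ndarray:
    """Return the matrix A of the linear OT map estimated from (n, d) samples of each measure."""
    S0 = np.cov(src_samples, rowvar=False)       # empirical source covariance, (d, d)
    S1 = np.cov(tgt_samples, rowvar=False)       # empirical target covariance, (d, d)
    S0_half = np.real(sqrtm(S0))                 # Sigma_0^{1/2}
    S0_half_inv = np.linalg.inv(S0_half)         # Sigma_0^{-1/2}
    middle = np.real(sqrtm(S0_half @ S1 @ S0_half))
    # A = Sigma_0^{-1/2} (Sigma_0^{1/2} Sigma_1 Sigma_0^{1/2})^{1/2} Sigma_0^{-1/2}
    return S0_half_inv @ middle @ S0_half_inv

# Usage: transport query points x of shape (m, d) drawn from the source via x @ A.T
```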
👉 More information
🗞 In-Context Operator Learning on the Space of Probability Measures
🧠 ArXiv: https://arxiv.org/abs/2601.09979
