360Anything Achieves State-Of-The-Art Perspective-To-360° Image and Video Generation

Scientists are tackling the challenge of converting standard images and videos into immersive 360° panoramas without relying on complex geometric calculations. Ziyi Wu, Daniel Watson (Google DeepMind), and Andrea Tagliasacchi (Simon Fraser University, Google DeepMind), alongside David J. Fleet and colleagues, present 360Anything, a framework that learns to map perspective views to equirectangular panoramas purely from data, bypassing the need for precise camera calibration. This is significant because it unlocks the creation of 360° content from readily available ‘in-the-wild’ footage, where camera information is often missing or unreliable. The method achieves state-of-the-art results while also resolving common seam artifacts, paving the way for more accessible and realistic immersive experiences.

Data-driven panorama generation without camera data

By treating both the perspective input and the panorama target as plain token sequences, the team achieved a purely data-driven perspective-to-equirectangular mapping, learned through a pre-trained diffusion transformer. The method surpasses prior techniques that rely on accurate, ground-truth camera information. A complementary encoding scheme ensures circular continuity in the latent representation, eliminating the boundary artifacts that plague existing methods and delivering visually consistent 360° output. Experiments demonstrate that 360Anything achieves state-of-the-art results on both image and video perspective-to-360° generation, consistently outperforming established techniques.
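For context, the geometric warp that 360Anything learns implicitly, and that prior pipelines compute explicitly, maps each perspective pixel to a ray via the camera intrinsics and then to equirectangular (longitude, latitude) coordinates. The following is a minimal NumPy sketch of that classical warp, not code from the paper; the field-of-view and canvas sizes are assumed values, and the point is to show the calibration information the data-driven approach avoids needing.

```python
import numpy as np

def perspective_to_erp(img, fov_deg=90.0, erp_w=2048, erp_h=1024):
    """Paste a forward-facing perspective image onto an equirectangular
    canvas. Note the hard requirement on a known field of view: this is
    the calibration that 360Anything's learned mapping dispenses with."""
    h, w = img.shape[:2]
    f = 0.5 * w / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels

    # Longitude in [-pi, pi) and latitude in (-pi/2, pi/2) per ERP pixel.
    lon = (np.arange(erp_w) / erp_w - 0.5) * 2 * np.pi
    lat = (0.5 - np.arange(erp_h) / erp_h) * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Unit ray for each ERP pixel (x right, y up, z forward).
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)

    # Pinhole projection of the rays in front of the camera.
    valid = z > 0
    u = np.where(valid, f * x / np.maximum(z, 1e-8) + w / 2, -1.0)
    v = np.where(valid, -f * y / np.maximum(z, 1e-8) + h / 2, -1.0)

    erp = np.zeros((erp_h, erp_w, img.shape[2]), dtype=img.dtype)
    inside = valid & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    erp[inside] = img[v[inside].astype(int), u[inside].astype(int)]
    return erp
```

A 90° FoV input fills only a quarter of the canvas's longitude range; everything outside it is what the generative model must hallucinate, and any error in `fov_deg` distorts the whole warp.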

360Anything thus establishes an end-to-end framework that eliminates the camera estimation step, scales with both model size and data, and offers a robust solution for images with varying FoV and videos with complex camera and object motion. The approach addresses a critical limitation of existing panorama generation techniques and opens new avenues for immersive 3D content creation and analysis. Additional results and demonstrations are available at https://360anything.github.io/

Diffusion Transformers for Geometry-Free Panorama Generation offer state-of-the-art results

The study pioneered a purely data-driven approach, treating both perspective inputs and panorama targets as token sequences to learn the perspective-to-equirectangular mapping. The researchers leveraged pre-trained latent diffusion transformers, adopting a flow-matching framework to train a denoiser network, a diffusion transformer (DiT), to reverse the noise-addition process and generate panorama data. The training objective minimises a mean squared error between the network’s prediction and the flow-matching regression target, conditioned on captions and perspective inputs. The experiments employed a sequence-concatenation approach, learning the perspective-to-equirectangular mapping without geometric priors and achieving state-of-the-art performance in panoramic image and video generation.
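A minimal sketch of one flow-matching training step under this recipe is below. The `vae`, `text_enc`, and `dit` interfaces, the token shapes, and the concatenation order are assumptions made for illustration; only the overall structure (linear interpolation between noise and data, velocity regression, conditioning via sequence concatenation) follows the description above.

```python
import torch
import torch.nn.functional as F

def flow_matching_step(dit, vae, text_enc, perspective, panorama, captions):
    """One hypothetical training step: condition a DiT on caption and
    perspective tokens and regress the flow-matching velocity target
    for the panorama latents. All interfaces here are illustrative."""
    with torch.no_grad():
        x1 = vae.encode(panorama)       # clean panorama latents, (B, N, D)
        cond = vae.encode(perspective)  # perspective latents as tokens
        text = text_enc(captions)       # caption embeddings

    # Linearly interpolate between pure noise x0 (t=0) and data x1 (t=1).
    b = x1.shape[0]
    t = torch.rand(b, device=x1.device).view(b, 1, 1)
    x0 = torch.randn_like(x1)
    xt = (1 - t) * x0 + t * x1

    # Concatenate conditioning tokens and noisy panorama tokens into one
    # sequence; the loss is applied to the panorama positions only.
    tokens = torch.cat([cond, xt], dim=1)
    pred = dit(tokens, t.view(b), text)[:, cond.shape[1]:]

    # Flow-matching regression target: the velocity from noise to data.
    return F.mse_loss(pred, x1 - x0)
```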

The team addressed seam artifacts, a common issue in panorama generation, by analysing the panoramic latent space and identifying zero-padding in the VAE encoder as the root cause. To resolve this, they introduced Circular Latent Encoding, a technique that modifies the latent representation so that the left and right edges of the panorama remain continuous, which in turn enables the creation of consistent 3D scenes from generated panoramic videos. Within the DiT architecture, the denoiser Gθ is a neural network trained to reverse the forward diffusion process, in which noise is added to clean panorama data at time steps t ranging from 0 to 1; the resulting system generates high-quality panoramas even from in-the-wild data lacking accurate camera calibration. To prepare training data, the researchers canonicalised 360° videos by estimating per-frame camera poses, aligning them with the first frame, and then aligning the video’s gravity direction with the vertical axis, a crucial step for consistent panorama generation.
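A minimal sketch of the circular-padding idea, assuming a VAE encoder built from standard PyTorch `nn.Conv2d` layers: pad circularly along the width (longitude) axis instead of with zeros, so the encoding of the panorama's left edge sees its right edge and vice versa. This illustrates the general technique rather than the paper's implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class CircularWidthConv2d(nn.Conv2d):
    """Conv2d that pads circularly along width (the panorama's longitude
    axis) and with zeros along height, so features at the left and right
    edges of the equirectangular frame stay continuous across the seam.
    Assumes the layer was constructed with integer padding."""

    def forward(self, x):
        ph, pw = self.padding                          # (pad_h, pad_w)
        x = F.pad(x, (pw, pw, 0, 0), mode="circular")  # wrap longitude
        x = F.pad(x, (0, 0, ph, ph))                   # zero-pad latitude
        return F.conv2d(x, self.weight, self.bias,
                        self.stride, 0, self.dilation, self.groups)

def make_encoder_circular(encoder: nn.Module) -> nn.Module:
    """Swap every Conv2d in a (hypothetical) VAE encoder for the circular
    variant, reusing the pretrained weights unchanged."""
    for name, child in encoder.named_children():
        if isinstance(child, nn.Conv2d):
            repl = CircularWidthConv2d(
                child.in_channels, child.out_channels, child.kernel_size,
                child.stride, child.padding, child.dilation, child.groups,
                child.bias is not None)
            repl.load_state_dict(child.state_dict())
            setattr(encoder, name, repl)
        else:
            make_encoder_circular(child)
    return encoder
```

Because only the padding behaviour changes, the pretrained convolution weights can be reused as-is; the wrap simply gives the seam columns the same neighbourhood statistics as interior columns.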

Data-driven perspective-to-360° panorama generation is increasingly practical

The research team achieved state-of-the-art performance in perspective-to-360° generation for both images and video, surpassing prior methods that relied on ground-truth camera information. The framework treats perspective inputs and panorama targets as token sequences, enabling a purely data-driven perspective-to-equirectangular mapping without camera metadata. This eliminates the need for camera calibration, opening possibilities for processing “in-the-wild” data where such information is often absent or unreliable. In a key diagnostic step, the team traced the seam artifacts commonly found at equirectangular projection (ERP) boundaries to zero-padding within the VAE encoder.

To address this, they introduced Circular Latent Encoding, a technique that ensures circular continuity in the latent representation and thereby enables seamless panorama generation. Measurements confirm that this circular padding removes the root cause of the seam artifacts during training itself, an improvement over previous strategies that only mitigated them at inference time. Results demonstrate that 360Anything handles images with varying Field-of-View (FoV) and videos with substantial object and camera motion, consistently producing gravity-aligned panoramas, all within a fully end-to-end pipeline that scales with both model size and data.
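As a concrete (hypothetical, not from the paper) way to check for the seam artifact described above, one can compare the first and last pixel columns of a generated equirectangular frame, which on a true wrap-around panorama should match about as well as adjacent interior columns:

```python
import numpy as np

def seam_score(erp: np.ndarray) -> float:
    """Ratio of the mean absolute difference between the first and last
    pixel columns of an (H, W, C) equirectangular image to the difference
    between adjacent interior columns. A score near 1.0 means the frame
    wraps seamlessly; a much larger score indicates a visible seam."""
    img = erp.astype(np.float32)
    edge = np.abs(img[:, 0] - img[:, -1]).mean()
    interior = np.abs(np.diff(img, axis=1)).mean()
    return float(edge / (interior + 1e-8))
```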

Tests show that the generated panoramas enable 3D scene reconstruction via 3D Gaussian Splatting, highlighting the potential for immersive 3D world generation across robotics, augmented reality, virtual reality, and gaming. The work identifies VAE latent encoding as the source of seam artifacts and proposes a simple solution. Across benchmarks, 360Anything achieves state-of-the-art performance, outperforming baselines that utilise extra camera information. Additional results and a 360° viewer are available at https://360anything.github.io/

Diffusion models unlock calibration-free 360° panoramas from unposed inputs

This geometry-free approach utilises a pre-trained diffusion model, treating both the input and target as sequences of tokens to learn the perspective-to-equirectangular mapping directly from data. The authors acknowledge a limitation in handling videos with significant roll and pitch angles: training the model to generate gravity-aligned panoramas simplifies the task, but such inputs remain challenging. Future research could address this, potentially by incorporating dynamic gravity alignment or view-synthesis techniques. Overall, the work offers a practical solution for creating immersive 3D experiences from readily available in-the-wild data, and it opens avenues for further exploration in areas like virtual and augmented reality.

👉 More information
🗞 360Anything: Geometry-Free Lifting of Images and Videos to 360°
🧠 ArXiv: https://arxiv.org/abs/2601.16192

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
