Researchers are developing increasingly sophisticated methods for manipulating the lighting within 3D scenes, and a new approach from Jiangnan Ye, Jiedong Zhuang (Zhejiang University), Lianrui Mu, Wenjie Zheng (Zhejiang University), Jiaqi Hu, and Xingze Zou represents a significant step forward. The team introduces GS-Light, a system that efficiently relights 3D scenes using text prompts to control the appearance of illumination, without requiring additional training. This innovative pipeline interprets user requests, such as specifying lighting direction or colour, and applies them to a 3D scene represented using Gaussian Splatting, producing high-fidelity, artistically relit images and a fully relit 3D environment. By achieving accurate and consistent relighting across multiple viewpoints, GS-Light surpasses existing methods in both quantitative metrics and user evaluations, promising new possibilities for content creation and virtual environment design.
GS-Light introduces an efficient pipeline for text-guided relighting of 3D scenes represented via Gaussian Splatting, a technique for creating realistic and editable 3D environments. The method extends a diffusion model, a type of generative artificial intelligence, to handle multiple viewpoints simultaneously, enabling relighting from various perspectives based on user instructions. Given a text prompt specifying lighting direction, colour, intensity, or reference objects, GS-Light modifies the scene’s illumination without requiring extensive training data or complex optimisation procedures, offering a streamlined solution for interactive relighting. The system effectively integrates textual instructions with 3D scene representation, allowing users to intuitively control the lighting environment.
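Although GS-Light itself relies on a large vision-language model to interpret these prompts, a minimal Python sketch can make the idea of prompt-derived lighting parameters concrete. The LightingParams fields and the toy keyword parser below are illustrative assumptions standing in for the authors' actual pipeline, not their code.

```python
# Minimal sketch: turning a relighting prompt into structured lighting
# parameters. The LightingParams fields and the keyword parser are
# illustrative assumptions standing in for the large vision-language model
# that GS-Light actually uses to interpret user instructions.
from dataclasses import dataclass

@dataclass
class LightingParams:
    direction: str = "top"               # e.g. "left", "right", "top", "behind"
    color_rgb: tuple = (1.0, 1.0, 1.0)   # normalised RGB of the light source
    intensity: float = 1.0               # relative brightness multiplier

def parse_prompt(prompt: str) -> LightingParams:
    """Toy keyword-based parser; GS-Light delegates this step to a vision-language model."""
    params = LightingParams()
    text = prompt.lower()
    for d in ("left", "right", "top", "bottom", "behind", "front"):
        if d in text:
            params.direction = d
            break
    if "warm" in text or "sunset" in text:
        params.color_rgb = (1.0, 0.7, 0.4)
    elif "cool" in text or "moonlight" in text:
        params.color_rgb = (0.6, 0.7, 1.0)
    if "dim" in text:
        params.intensity = 0.5
    elif "bright" in text:
        params.intensity = 1.5
    return params

print(parse_prompt("warm light coming from the left, slightly dim"))
# LightingParams(direction='left', color_rgb=(1.0, 0.7, 0.4), intensity=0.5)
```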
NeRFs and Gaussian Splatting for 3D Scenes
Recent advances in 3D scene representation centre around Neural Radiance Fields (NeRFs) and, increasingly, Gaussian Splatting. NeRFs create detailed 3D models from 2D images, while Gaussian Splatting offers a faster and more efficient alternative for achieving comparable quality. Researchers are actively developing variations of both techniques to address specific challenges in 3D reconstruction and rendering. Generative models, such as diffusion models, play a crucial role in generating and editing 3D content, particularly in video applications. Large Multimodal Models (LMMs), which combine vision and language understanding, are also becoming integral to grounding, reasoning, and controlling 3D editing processes.
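For readers new to the representation, the sketch below shows the per-primitive parameters of a standard 3D Gaussian Splatting scene and the usual covariance construction Σ = R S Sᵀ Rᵀ. It follows the original 3DGS formulation and is background only, not GS-Light-specific code.

```python
# Sketch of one 3D Gaussian Splatting primitive and its covariance, following
# the standard 3DGS parameterisation; this is background, not GS-Light code.
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    position: np.ndarray   # (3,) centre of the Gaussian in world space
    rotation: np.ndarray   # (4,) unit quaternion (w, x, y, z)
    scale: np.ndarray      # (3,) per-axis standard deviations
    opacity: float         # alpha used when splatting / alpha-blending
    sh_coeffs: np.ndarray  # spherical-harmonic coefficients for view-dependent colour

    def covariance(self) -> np.ndarray:
        """Sigma = R S S^T R^T, as in the original 3DGS formulation."""
        w, x, y, z = self.rotation
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(self.scale)
        return R @ S @ S.T @ R.T

g = Gaussian3D(position=np.zeros(3), rotation=np.array([1.0, 0.0, 0.0, 0.0]),
               scale=np.array([0.05, 0.02, 0.02]), opacity=0.8,
               sh_coeffs=np.zeros((16, 3)))
print(g.covariance())   # diagonal matrix of squared scales for the identity rotation
```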
Key research areas include 3D scene editing and manipulation, focusing on techniques like instructable editing, view-consistent editing, and geometry-guided editing. Researchers are also exploring 3D scene understanding and reasoning, improving the ability of models to ground language to specific objects, understand spatial relationships, and perform object detection and segmentation. Advances in rendering and inverse rendering aim to create realistic images by simulating light interactions and estimating scene properties. Specific applications include controllable relighting, material editing, consistent video generation, and video relighting.
A notable trend is the shift from NeRFs to Gaussian Splatting, driven by the latter's faster training and rendering speeds. The integration of LMMs provides high-level control and reasoning for 3D editing tasks. Maintaining temporal consistency in video editing and view consistency in 3D scene editing remain major challenges. Instruction following, enabling edits based on natural language prompts, is a key goal. There is also growing demand for real-time rendering of 3D scenes, driving research into faster rendering techniques and into physically plausible rendering, which focuses on simulating realistic light interactions. A passing mention of OpenAI's GPT-5 system card points to broader attention to responsible AI development and to the potential risks and benefits of large language models.
Text-Guided Relighting with Gaussian Splatting Achieved
Scientists developed GS-Light, a new pipeline for text-guided relighting of 3D scenes represented using Gaussian Splatting. The method achieves multi-view consistency and position-awareness, realistically altering lighting conditions based on user instructions. The work extends a diffusion model to handle multiple views simultaneously, enabling realistic relighting based on text prompts. Experiments demonstrate the system requires approximately three minutes to generate scene-specific information and an additional three minutes to relight each scene. The team employed a large vision-language model to interpret user prompts, extracting lighting parameters such as direction, colour, and intensity.
This information was combined with estimations of scene geometry and semantics, including depth, surface normals, and semantic segmentation, to compute illumination maps (a simplified construction is sketched at the end of this section). These maps initialise the diffusion model, providing fine-grained control over lighting effects in the generated results. The system accurately interprets spatial cues within the text prompts, aligning relighting results with user intent. Researchers implemented MV-ICLight, a multi-view extension of the existing IC-Light relighting diffusion model, incorporating an improved epipolar constraint (also sketched below) to ensure coherence across different viewpoints.
This constraint enables the system to maintain consistency between views without any additional training. Tests on both indoor and outdoor scenes demonstrate that GS-Light achieves high-quality relighting, faithfully adhering to user instructions while maintaining multi-view coherence. The method delivers significant improvements in reconstruction consistency, semantic editing similarity, and subjective user evaluation.
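Returning to the illumination maps mentioned above: the paper's exact recipe is not reproduced here, so the sketch below is a simplified, assumed Lambertian-style construction that combines estimated surface normals with a prompt-derived light direction and colour, ignoring the depth and segmentation cues GS-Light also uses.

```python
# Hedged sketch: a per-pixel illumination map built from estimated surface
# normals and prompt-derived lighting parameters. GS-Light's actual maps also
# draw on depth and semantic segmentation; this simplified Lambertian-style
# construction is an assumption, not the paper's exact recipe.
import numpy as np

def illumination_map(normals, light_dir, light_rgb=(1.0, 1.0, 1.0),
                     intensity=1.0, ambient=0.1):
    """normals: (H, W, 3) unit normals; light_dir: (3,) direction toward the light.
    Returns an (H, W, 3) map in [0, 1] used to condition the relighting diffusion model."""
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    ndotl = np.clip(normals @ l, 0.0, None)              # Lambertian term, shape (H, W)
    shading = ambient + intensity * ndotl[..., None]     # shape (H, W, 1)
    return np.clip(shading * np.asarray(light_rgb), 0.0, 1.0)

# Example: warm light arriving from the upper left onto a flat, camera-facing surface.
normals = np.tile(np.array([0.0, 0.0, 1.0]), (4, 4, 1))
print(illumination_map(normals, light_dir=[-1.0, 1.0, 1.0], light_rgb=(1.0, 0.7, 0.4)).shape)
```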
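Likewise, the details of the improved epipolar constraint are not given here, so the sketch below shows only the textbook form: given the fundamental matrix between two views, each pixel in one view is restricted to correspondences (or attention targets) lying near its epipolar line in the other view.

```python
# Hedged sketch of a standard (textbook) epipolar constraint between two views:
# a pixel x in view A may only correspond to / attend to pixels in view B that
# lie near its epipolar line l' = F x. GS-Light's "improved" constraint is not
# reproduced here.
import numpy as np

def epipolar_mask(x_a, F, height, width, tau=2.0):
    """x_a: (3,) homogeneous pixel [u, v, 1] in view A;
    F: (3, 3) fundamental matrix mapping view-A points to view-B lines.
    Returns a boolean (height, width) mask of view-B pixels within tau pixels of the line."""
    a, b, c = F @ np.asarray(x_a, dtype=float)           # line: a*u + b*v + c = 0
    us, vs = np.meshgrid(np.arange(width), np.arange(height))
    dist = np.abs(a * us + b * vs + c) / np.sqrt(a * a + b * b + 1e-8)
    return dist <= tau

# Example with an arbitrary illustrative matrix (a real F comes from the camera poses).
F = np.array([[0.0, -1e-4, 0.02],
              [1e-4, 0.0, -0.03],
              [-0.02, 0.03, 1.0]])
mask = epipolar_mask([64.0, 48.0, 1.0], F, height=96, width=128)
print(mask.shape, int(mask.sum()))
```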
Text-Guided Relighting of Gaussian Splatting Scenes
Scientists have developed GS-Light, a new method for realistically relighting 3D scenes represented using Gaussian Splatting. The technique efficiently alters lighting conditions in images and 3D models by interpreting text prompts that specify desired changes, such as lighting direction or colour. By combining these prompt-derived instructions with constraints ensuring consistency across multiple viewpoints, GS-Light generates high-quality, multi-view coherent relit images and fully relit 3D scenes, particularly excelling in accurately reflecting lighting direction. The research demonstrates improvements over existing methods for relighting and scene editing, achieving better results in both quantitative metrics and user studies.
While the method operates effectively without requiring training, the authors acknowledge limitations in handling strongly specular or anisotropic materials, and challenges with complex lighting scenarios or occlusions. Future work could address these issues by incorporating material priors, extending the system to handle more complex lighting effects, and improving the handling of shadows and visibility. This work represents a useful step towards more accessible, controllable, and consistent relighting for 3D content creation.
👉 More information
🗞 Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting
🧠 ArXiv: https://arxiv.org/abs/2511.13684
