VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction

W Wang, Y Chen, Z Zhang, H Liu, H Wang, Z Feng, W Qin, F Chen, Z Zhu, DY Chen…

arXiv preprint arXiv:2509.19297, 2025

Feed-forward 3D Gaussian Splatting (3DGS) has emerged as a highly effective solution for novel view synthesis. Existing methods predominantly rely on a pixel-aligned Gaussian prediction paradigm, where each 2D pixel is mapped to a 3D Gaussian. We rethink this widely adopted formulation and identify several inherent limitations: it renders the reconstructed 3D models heavily dependent on the number of input views, leads to view-biased density distributions, and introduces alignment errors, particularly when source views contain occlusions or low texture. To address these challenges, we introduce VolSplat, a new multi-view feed-forward paradigm that replaces pixel alignment with voxel-aligned Gaussians. By directly predicting Gaussians from a predicted 3D voxel grid, it overcomes pixel alignment's reliance on error-prone 2D feature matching, ensuring robust multi-view consistency. Furthermore, it enables adaptive control over density based on 3D scene complexity, yielding more faithful Gaussians, improved geometric consistency, and enhanced novel-view rendering quality. Experiments on widely used benchmarks demonstrate that VolSplat achieves state-of-the-art performance, while producing more plausible and view-consistent results. The video results, code and trained models are available on our project page: https://lhmd.top/volsplat.
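
The dependence on the number of input views can be illustrated with a toy count. The sketch below is not the VolSplat network: it assumes points that have already been unprojected into a shared 3D frame, and it uses the centroid of each occupied voxel as a stand-in for a predicted Gaussian center. The function names (`pixel_aligned_count`, `voxel_aligned_centers`) and the voxel size are hypothetical and chosen only for illustration.

```python
import math
import random


def pixel_aligned_count(num_views, height, width):
    """Pixel-aligned paradigm: one Gaussian per pixel per input view."""
    return num_views * height * width


def voxel_aligned_centers(points, voxel_size):
    """Toy voxel-aligned stand-in: one center per occupied voxel.

    Each 3D point is assigned to an integer voxel index, and the center
    for that voxel is the mean of the points that fall inside it.
    (Assumption: a real model would predict Gaussians from voxel features.)
    """
    buckets = {}
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets.setdefault(key, []).append((x, y, z))
    centers = {}
    for key, pts in buckets.items():
        n = len(pts)
        centers[key] = tuple(sum(p[i] for p in pts) / n for i in range(3))
    return centers


def sample_view(num_pixels, rng):
    """Hypothetical view: every pixel lands on the same unit square at z = 0."""
    return [(rng.random(), rng.random(), 0.0) for _ in range(num_pixels)]


if __name__ == "__main__":
    rng = random.Random(0)
    height = width = 32
    for num_views in (2, 4, 8):
        points = []
        for _ in range(num_views):
            points.extend(sample_view(height * width, rng))
        n_pixel = pixel_aligned_count(num_views, height, width)
        n_voxel = len(voxel_aligned_centers(points, voxel_size=0.25))
        print(num_views, n_pixel, n_voxel)
```

In this toy setup the pixel-aligned count grows linearly with the number of views, while the voxel-aligned count is bounded by the number of voxels the surface occupies, regardless of how many views observe it.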
