VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction
arXiv preprint arXiv:2509.19297, 2025
Feed-forward 3D Gaussian Splatting (3DGS) has emerged as a highly effective solution for novel view synthesis. Existing methods predominantly rely on a pixel-aligned Gaussian prediction paradigm, where each 2D pixel is mapped to a 3D Gaussian. We rethink this widely adopted formulation and identify several inherent limitations: it renders the reconstructed 3D models heavily dependent on the number of input views, leads to view-biased density distributions, and introduces alignment errors, particularly when source views contain occlusions or low texture. To address these challenges, we introduce VolSplat, a new multi-view feed-forward paradigm that replaces pixel alignment with voxel-aligned Gaussians. By directly predicting Gaussians from a predicted 3D voxel grid, it overcomes pixel alignment's reliance on error-prone 2D feature matching, ensuring robust multi-view consistency. Furthermore, it enables adaptive control over density based on 3D scene complexity, yielding more faithful Gaussians, improved geometric consistency, and enhanced novel-view rendering quality. Experiments on widely used benchmarks demonstrate that VolSplat achieves state-of-the-art performance while producing more plausible and view-consistent results. The video results, code and trained models are available on our project page: https://lhmd.top/volsplat.
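
The contrast between the two paradigms can be illustrated with a toy count of Gaussian primitives. The sketch below is not the VolSplat implementation: the helper names (`unproject`, `voxelize`, `voxel_centers`), the pinhole parameters, the voxel size, and the choice to place both views in one shared camera frame are all assumptions made for illustration. The abstract describes predicting Gaussians from a learned 3D voxel grid; this example only shows why a pixel-aligned count grows with the number of input views while a count tied to occupied voxels follows the scene geometry.

```python
from collections import defaultdict


def unproject(depth, focal, cx, cy):
    """Lift a depth map (list of rows) to 3D points with a pinhole model."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            points.append(((u - cx) * z / focal, (v - cy) * z / focal, z))
    return points


def voxelize(points, voxel_size):
    """Group points by the integer index of the voxel that contains them."""
    cells = defaultdict(list)
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        cells[key].append((x, y, z))
    return cells


def voxel_centers(cells):
    """One candidate Gaussian center per occupied voxel: the mean of its points."""
    centers = {}
    for key, pts in cells.items():
        n = len(pts)
        centers[key] = tuple(sum(p[i] for p in pts) / n for i in range(3))
    return centers


# Toy scene (assumption): two identical 4x4 views of a flat wall at depth 2.0,
# both expressed in the same camera frame so no pose handling is needed.
depth = [[2.0] * 4 for _ in range(4)]
view_a = unproject(depth, focal=4.0, cx=2.0, cy=2.0)
view_b = unproject(depth, focal=4.0, cx=2.0, cy=2.0)

# Pixel-aligned: one Gaussian per pixel per view, so the count scales with views.
pixel_aligned_one = len(view_a)
pixel_aligned_two = len(view_a) + len(view_b)

# Voxel-aligned (simplified): one Gaussian per occupied voxel of the merged points.
voxel_one = len(voxel_centers(voxelize(view_a, 1.0)))
voxel_two = len(voxel_centers(voxelize(view_a + view_b, 1.0)))

print(pixel_aligned_one, pixel_aligned_two)  # 16 32
print(voxel_one, voxel_two)                  # 4 4
```

In this toy setting, adding a redundant second view doubles the pixel-aligned count but leaves the number of occupied voxels unchanged, which mirrors the view-count dependence the abstract attributes to pixel alignment.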