Computer Science > Computer Vision and Pattern Recognition

arXiv:2510.20385 (cs)

[Submitted on 23 Oct 2025]

Title: Positional Encoding Field

Authors: Yunpeng Bai, Haoxiang Li, Qixing Huang

Abstract: Diffusion Transformers (DiTs) have emerged as the dominant architecture for visual generation, powering state-of-the-art image and video models. By representing images as patch tokens with positional encodings (PEs), DiTs combine Transformer scalability with spatial and temporal inductive biases. In this work, we revisit how DiTs organize visual content and discover that patch tokens exhibit a surprising degree of independence: even when PEs are perturbed, DiTs still produce globally coherent outputs, indicating that spatial coherence is primarily governed by PEs. Motivated by this finding, we introduce the Positional Encoding Field (PE-Field), which extends positional encodings from the 2D plane to a structured 3D field. PE-Field incorporates depth-aware encodings for volumetric reasoning and hierarchical encodings for fine-grained sub-patch control, enabling DiTs to model geometry directly in 3D space. Our PE-Field-augmented DiT achieves state-of-the-art performance on single-image novel view synthesis and generalizes to controllable spatial image editing.
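The abstract does not specify how the depth-aware and hierarchical encodings are computed. Purely as an illustration, the sketch below assumes a rotary (RoPE-style) encoding applied axially over three coordinates, image column x, image row y and a depth value z, and assumes that sub-patch positions are represented as fractional coordinates inside a patch. The names `rope_xyz`, `sub_patch_position`, the channel split and the depth value are hypothetical choices for this example, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def axis_freqs(dim: int, base: float = 10000.0) -> List[float]:
    # Standard RoPE inverse frequencies for one axis; dim must be even.
    if dim % 2:
        raise ValueError("per-axis dimension must be even")
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


def rotate(chunk: Sequence[float], pos: float, freqs: Sequence[float]) -> List[float]:
    # Rotate each pair (chunk[2i], chunk[2i+1]) by the angle pos * freqs[i].
    out: List[float] = []
    for i, f in enumerate(freqs):
        a, b = chunk[2 * i], chunk[2 * i + 1]
        c, s = math.cos(pos * f), math.sin(pos * f)
        out += [a * c - b * s, a * s + b * c]
    return out


def rope_xyz(vec: Sequence[float], x: float, y: float, z: float,
             split: Tuple[int, int, int]) -> List[float]:
    # Hypothetical 3D axial RoPE: the feature vector is split into three
    # chunks, rotated by the x, y and depth (z) coordinate respectively.
    if sum(split) != len(vec):
        raise ValueError("split must cover the whole vector")
    out: List[float] = []
    start = 0
    for pos, d in zip((x, y, z), split):
        out += rotate(vec[start:start + d], pos, axis_freqs(d))
        start += d
    return out


def sub_patch_position(patch: int, sub: int, subdiv: int) -> float:
    # Hypothetical hierarchical coordinate: centre of sub-cell `sub`
    # (out of `subdiv` sub-cells) inside patch index `patch`.
    return patch + (sub + 0.5) / subdiv


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


# Usage: a token in patch column 3 (sub-cell 1 of 2), patch row 5
# (sub-cell 0 of 2), at an assumed depth of 0.42.
x = sub_patch_position(3, 1, 2)
y = sub_patch_position(5, 0, 2)
q_rot = rope_xyz([0.1] * 12, x, y, 0.42, (4, 4, 4))
```

Because every axis is encoded as a rotation, the query-key dot product depends only on the difference in (x, y, z) between two tokens, so depth enters attention in the same relative way as image position. Whether PE-Field uses this particular scheme is not stated in the abstract.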

Comments: 8 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2510.20385 [cs.CV]
(or arXiv:2510.20385v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2510.20385

Submission history

From: Yunpeng Bai
[v1] Thu, 23 Oct 2025 09:32:37 UTC (14,824 KB)