8000
Skip to content

Conversation

@Swoorup
Copy link
@Swoorup Swoorup commented Feb 2, 2024

No description provided.

@ScanMountGoat
Copy link
Owner

See the discussion for #50 for why this will not be merged. I understand that this feels wrong since on the CPU we have a lot of control over the memory layout of types and more relaxed alignment requirements. Adding padding fields to match the GPU layout encourages wasting memory that could be used by adding a padding field to the WGSL type itself. Packing multiple fields into a single vec4 or rarely a vec2 field is not uncommon in shader code for saving space. There are also packing/unpacking functions you can use if you really want to save memory. vec3 is something of a trap since it's essentially a vec4 with an inaccessible w component.

If you're willing to zero pad smaller types, you should just use encase. I would expect the performance impact compared to bytemuck to be minimal since uniform buffers tend to be small. You also won't have to manually initialize padding fields.

This is for OpenGL, but the tips still apply to wgpu/WebGPU:
https://stackoverflow.com/questions/38172696/should-i-ever-use-a-vec3-inside-of-a-uniform-buffer-or-shader-storage-buffer-o

@Swoorup
Copy link
Author
A72B Swoorup commented Feb 3, 2024

I appreciate your feedback, and I completely understand the concerns about maintaining consistent structures. I'd like to discuss a specific example to better understand the reasoning behind the rejection.

Consider the following structures:

For example.

struct Test1 {
   mat3: mat3x3<f32>,
   eye: vec3<f32>,
}
@group(2) @binding(0) var<storage> test1: array<Test1>;

struct Test2 {
   mat3: mat4x4<f32>,
   eye: vec4<f32>,
}
@group(3) @binding(0) var<storage> test2: array<Test2>;

In this case, the first structure occupies a total of 64 bytes, while the second one occupies 80 bytes. From a performance perspective, it might be more optimal to prefer the first structure, as it aligns with the shape type and avoids the need to promote data to vec4 and mat4 on the CPU side.

I understand that issues may arise in scenarios involving arrays with non-aligned data, like below

struct Test3 {
   mat3: array<mat3x3<f32>, 4>,
   eye: vec4<f32>,
}

While I could potentially use encase, my codebase heavily relies on bytemuck, and my generic utilities are designed around it. On other note, it does seem that the user should only be allowed to choose either bytemuck or encase but not both. Wonder if there was a reason for allowing to enable both on?

@ScanMountGoat
Copy link
Owner

There are no performance advantages for the first example. Each column within a matrix is also aligned. A struct with mat3x4 and vec4 has equivalent size and memory layout. It's possible to use both encase and bytemuck without issue if the types already have the correct layout.
https://www.w3.org/TR/WGSL/#alignment-and-size

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

0