8.0 Compliance improvements #3246

Open
opened 2020-03-09 09:29:01 +01:00 by bsdphk · 7 comments
bsdphk commented 2020-03-09 09:29:01 +01:00 (Migrated from github.com)

We have agreed to improve RFC compliance in the next "big" release (7.0/vcl x.y)

This ticket is meant to be the coordination point for this, in particular for the necessary documentation updates.

Each of the subtasks will have their own tickets, listed here:

  • #3016 (status 500, 502, 503 or 504)
  • #3102 (304 headers update)
  • #3121 (status 412)
  • #3205 (status 413/414)
    • We could consider other 4XX like 431
  • #3992 (objcore waiting list, no-cache premises)
  • #3340 (cache-control precedence)
  • #3364 (cache-control must-revalidate)
  • reject h2 req with connection-specific header (see 9210c4685f)
  • add support for doing expect-100-continue dance with backend.
  • #4314 filter headers disallowed in H2

Completed Subtasks for 7.0:

  • #3169 (inconsistent backends)
  • #3251 (range headers)
  • #3638 (quoted headers)
  • #3650 (case sensitivity)
  • #3673 (content-range headers)
  • #3638 (quoted headers)

Discarded Subtasks:

  • add multiple ranges support? RFC7233
  • #3668 (HTTP/1 strict parsing)
  • #3211 (private headers)

Todo list:

  • RFC9110 7.6.1 says that "Proxy-Connection" header should be filtered.
We have agreed to improve RFC compliance in the next "big" release (7.0/vcl x.y) This ticket is meant to be the coordination point for this, in particular for the necessary documentation updates. Each of the subtasks will have their own tickets, listed here: * #3016 (status 500, 502, 503 or 504) * #3102 (304 headers update) * #3121 (status 412) * #3205 (status 413/414) * We could consider other 4XX like 431 * #3992 (objcore waiting list, no-cache premises) * #3340 (cache-control precedence) * #3364 (cache-control must-revalidate) * reject h2 req with connection-specific header (see 9210c4685f7b0ca9bb3ae5598f014843c8c43bd1) * add support for doing expect-100-continue dance with backend. * #4314 filter headers disallowed in H2 Completed Subtasks for 7.0: * #3169 (inconsistent backends) * #3251 (range headers) * #3638 (quoted headers) * #3650 (case sensitivity) * #3673 (content-range headers) * #3638 (quoted headers) Discarded Subtasks: * add multiple ranges support? [RFC7233](https://httpwg.org/specs/rfc7233.html) * #3668 (HTTP/1 strict parsing) * #3211 (private headers) Todo list: * RFC9110 7.6.1 says that "Proxy-Connection" header should be filtered.
dridi commented 2020-06-03 18:21:21 +02:00 (Migrated from github.com)

Registering the idea here: we should try to improve RFC coverage in the code as we make progress in this area.

By that I mean comments like:

github.com/varnishcache/varnish-cache@e896529982/bin/varnishd/cache/cache_req_fsm.c (L414-L420)

Registering the idea here: we should try to improve RFC coverage in the code as we make progress in this area. By that I mean comments like: https://github.com/varnishcache/varnish-cache/blob/e896529982027a4325dc3bf19f97c8ebbff4ef53/bin/varnishd/cache/cache_req_fsm.c#L414-L420
dridi commented 2020-08-03 17:15:24 +02:00 (Migrated from github.com)

I'm wondering whether VIP24 also falls under the compliance umbrella.

I'm wondering whether [VIP24](https://github.com/varnishcache/varnish-cache/wiki/VIP-24%3A-Hitpass-turning-into-hitmiss-after-ttl) also falls under the compliance umbrella.
ThijsFeryn commented 2020-08-03 17:38:06 +02:00 (Migrated from github.com)

I had a look at the Edge Architecture Specification, and noticed something that is not properly implemented in our vcl_backend_response logic.

The Surrogate-Control response header contains several directives that influence entity cacheability; specifically, "no-store", "no-store-remote", and "max-age" (see "Surrogate-Control Header" for more information). Collectively, these directives and their behaviors are described by the capability token

Surrogate/1.0

This token should be included in all requests sent by compliant surrogates (see "Surrogate-Capabilities Header").

When any of these directives are present, they override any HTTP cacheability information present in the response.

Surrogate-Control: no-store

In https://github.com/varnishcache/varnish-cache/blob/master/bin/varnishd/builtin.vcl#L154, we unconditionally allow Surrogate-Control: no-store to have precedence over Cache-Control.

According to the spec Surrogate-Capability: varnish="Surrogate/1.0" should be set first.

If we want to improve compliance, we could change the Surrogate check as follows:

bereq.http.Surrogate-Capability  ~ "(?i)Surrogate\/1\.0" && beresp.http.Surrogate-control ~ "(?i)no-store"

Surrogate-Control: max-age

The Surrogate-Control header also supports max-age. Setting the TTL of an object based on this, is a bit more complicated, as it is not part of builtin.vcl.

This would be a change in the core. But the same check on Surrogate-Capability: varnish="Surrogate/1.0" should happen before allowing Surrogate-Control: max-age=30 to take precedence over Cache-Control: max-age=50, for example.

Setting a Surrogate-Capability header in builtin.vcl

If we really want to support surrogates, it would also make sense to already set a Surrogate-Capability header in the vcl_backend_fetch subroutine of the builtin.vcl:

set bereq.http.Surrogate-Capability = "varnish=\"Surrogate/1.0 ESI/1.0\"";

That would allow the origin to know that we support both the surrogate caching syntax and ESI.

Processing ESI based on the Surrogate-Control header

If we send a proper Surrogate-Capability header announcing ESI support, we can just as wel process ESI when the origin sends us a Surrogate-Control header.

We could add the following to vcl_backend_response:

if(beresp.http.Surrogate-Control ~ "content=\"ESI/1\.0\"") {
    set beresp.do_esi = true;
}

If this would undermine any other proxy or CDN that has ESI capabilities, we can use device targeting to make it more specific. The corresponding VCL code would then look like this:

if(beresp.http.Surrogate-Control ~ "content=\"ESI/1\.0\";varnish") {
    set beresp.do_esi = true;
}

Next steps

  • Is this level of compliance in regards to surrogates of interest here?
  • Should I pursue this?
  • Should this be part of #3211?
  • Should I split this up into 2 different issues? One for builtin.vcl, and one for the TTL?
  • Does it make sense to announce our own surrogate capabilities using the proper header?
  • Does it make sense to process ESI automatically if the right Surrogate-Control header is set?
I had a look at the [Edge Architecture Specification](https://www.w3.org/TR/edge-arch/), and noticed something that is not properly implemented in our `vcl_backend_response` logic. > The Surrogate-Control response header contains several directives that influence entity cacheability; specifically, "no-store", "no-store-remote", and "max-age" (see "Surrogate-Control Header" for more information). Collectively, these directives and their behaviors are described by the capability token > >Surrogate/1.0 > >This token should be included in all requests sent by compliant surrogates (see "Surrogate-Capabilities Header"). > >When any of these directives are present, they override any HTTP cacheability information present in the response. ### Surrogate-Control: no-store In https://github.com/varnishcache/varnish-cache/blob/master/bin/varnishd/builtin.vcl#L154, we unconditionally allow `Surrogate-Control: no-store` to have precedence over `Cache-Control`. According to the spec `Surrogate-Capability: varnish="Surrogate/1.0"` should be set first. If we want to improve compliance, we could change the Surrogate check as follows: ``` bereq.http.Surrogate-Capability ~ "(?i)Surrogate\/1\.0" && beresp.http.Surrogate-control ~ "(?i)no-store" ``` ### Surrogate-Control: max-age The `Surrogate-Control` header also supports `max-age`. Setting the *TTL* of an object based on this, is a bit more complicated, as it is not part of `builtin.vcl`. This would be a change in the core. But the same check on `Surrogate-Capability: varnish="Surrogate/1.0"` should happen before allowing `Surrogate-Control: max-age=30` to take precedence over `Cache-Control: max-age=50`, for example. ### Setting a Surrogate-Capability header in builtin.vcl If we really want to support *surrogates*, it would also make sense to already set a `Surrogate-Capability` header in the `vcl_backend_fetch` subroutine of the `builtin.vcl`: ``` set bereq.http.Surrogate-Capability = "varnish=\"Surrogate/1.0 ESI/1.0\""; ``` That would allow the origin to know that we support both the *surrogate caching syntax* and *ESI*. ### Processing ESI based on the Surrogate-Control header If we send a proper `Surrogate-Capability` header announcing *ESI support*, we can just as wel process *ESI* when the origin sends us a `Surrogate-Control` header. We could add the following to `vcl_backend_response`: ``` if(beresp.http.Surrogate-Control ~ "content=\"ESI/1\.0\"") { set beresp.do_esi = true; } ``` If this would undermine any other proxy or CDN that has ESI capabilities, we can use *device targeting* to make it more specific. The corresponding *VCL* code would then look like this: ``` if(beresp.http.Surrogate-Control ~ "content=\"ESI/1\.0\";varnish") { set beresp.do_esi = true; } ``` ### Next steps - Is this level of compliance in regards to *surrogates* of interest here? - Should I pursue this? - Should this be part of #3211? - Should I split this up into 2 different issues? One for `builtin.vcl`, and one for the *TTL*? - Does it make sense to announce our own *surrogate capabilities* using the proper header? - Does it make sense to process *ESI* automatically if the right `Surrogate-Control` header is set?
dridi commented 2020-10-08 16:40:19 +02:00 (Migrated from github.com)

I added a list of parsing mistakes in the issue description, two items I'm aware of so far.

I added a list of parsing mistakes in the issue description, two items I'm aware of so far.
karptonite commented 2022-01-18 15:30:26 +01:00 (Migrated from github.com)

@ThijsFeryn

  • Is this level of compliance in regards to surrogates of interest here?
  • Should I pursue this?
  • Does it make sense to process ESI automatically if the right Surrogate-Control header is set?

I'm super late to this, but I would certainly make use of at least two of these improved use of the Surrogate-Control headers. In particular, we use ESI, and have custom handling in our vcl to process ESI based on that header. It's only a few lines, but it would be simpler for everyone using ESI if those few lines were unnecessary.

Also, I sometimes run into an issue when I want to set Cache-control no-cache for the browse, but cache on Varnish. Being able to override the no-cache in Surrogate-Control with a max-age would simplify things, again removing the need for custom VCL.

@ThijsFeryn > * Is this level of compliance in regards to _surrogates_ of interest here? > * Should I pursue this? > * Does it make sense to process _ESI_ automatically if the right `Surrogate-Control` header is set? I'm super late to this, but I would certainly make use of at least two of these improved use of the Surrogate-Control headers. In particular, we use ESI, and have custom handling in our vcl to process ESI based on that header. It's only a few lines, but it would be simpler for everyone using ESI if those few lines were unnecessary. Also, I sometimes run into an issue when I want to set Cache-control no-cache for the browse, but cache on Varnish. Being able to override the no-cache in Surrogate-Control with a max-age would simplify things, again removing the need for custom VCL.
dridi commented 2025-03-31 12:38:44 +02:00 (Migrated from github.com)

Some notes on the various topics under this umbrella ticket.

Status codes

Something changed between 2020 when we opened this ticket and now: we now have an experimental bits parameter.

When we look at the pending breaking changes, a lot of them boil down to using a more accurate error status code instead of the hegemonic 503 (modulus a couple places where we send a minimal 500).

My suggestion to handle these is a transition period where both behaviors are available. The whole point of the experimental parameter is to give ourselves the ability to introduce breaking changes at any time, when they can be guarded by a flag.

For all the status-related changes, my suggestion to proceed is:

  • a page dedicated to internal errors (possibly linking to other docs related to the features failing)
  • an experimental status_error_rfc bit
  • a clear notice for when this flag disappears and the new behavior takes effect

Alternatively, we could have individual flags like status_502, status_504, status_413 etc but even for individual status codes there may be multiple reasons to get these errors, so I'm in favor of a single flag.

param.set experimental +status_error_rfc

It hopefully conveys that the behavior may change while this guard exists (for example, picking a different status code, or adding more failure scenarii behind a given code).

Cache control

There are 5 topics listed so far:

  • waiting list change to enable "proper" no-cache support
  • private headers (that would add the missing piece for actual proper no-cache support)
  • trivial must-revalidate support
  • cache control vs status code (specifically 303)
  • and finally, surrogate-control support

I already presented a general plan during a VDD, and I have a better idea on how to proceed now than I did then. For the most part, I think the plan is still valid, I just have a different opinion on the implementation details of the initial parts of the plan.

In order to support private/no-cache headers, we need #4073. I made further progress with the waiting list on the Enterprise side, and I would need to rebase the pull request onto current master anyway to at least solve conflicts.

For must-revalidate we don't formally need proper "revalidation" support coming with #4073 but in my VDD plan, dealing with contradicting cache control directives came after the structural changes. We can merge must-revalidate support early, but we should decide which one takes precedence between must-revalidate and stale-while-revalidate. In my presentation the suggestion was to process directives in order of appearance.

In my VDD plan I suggested moving as much core processing at all to the built-in VCL, including the status code checks. That would solve the 303 problem, among other limitations.

Regarding surrogate-control I think it is delicate to just do this without a guard, so my suggestion is...

Exposing some parameters to VCL

I'd like to see a few variables like param.default_ttl and other kinds of timeouts in VCL. I'd like to also see new parameters like uncacheable_ttl (to use it in the built-in VCL) or immutable_ttl for general cache control support.

It could also be used in VCL, some people like to classify resources as static:

if (beresp.http.content-type ~ "^image/") {
        # all images are static in this app
        set beresp.ttl = param.immutable_ttl;
}

We could imagine a boolean parameter surrogate_control off by default, to have off-the-shelf support for surrogate-control headers and automatic ESI activation directly in the built-in VCL, guarded by an if (param.surrogate_control) { return; } block in the built-in subroutine dedicated to ESI and other surrogate control features available in Varnish.

Some notes on the various topics under this umbrella ticket. ### Status codes Something changed between 2020 when we opened this ticket and now: we now have an `experimental` bits parameter. When we look at the pending breaking changes, a lot of them boil down to using a more accurate error status code instead of the hegemonic 503 (modulus a couple places where we send a minimal 500). My suggestion to handle these is a transition period where both behaviors are available. The whole point of the `experimental` parameter is to give ourselves the ability to introduce breaking changes at any time, when they can be guarded by a flag. For all the status-related changes, my suggestion to proceed is: - a page dedicated to internal errors (possibly linking to other docs related to the features failing) - an experimental `status_error_rfc` bit - a clear notice for when this flag disappears and the new behavior takes effect Alternatively, we could have individual flags like `status_502`, `status_504`, `status_413` etc but even for individual status codes there may be multiple reasons to get these errors, so I'm in favor of a single flag. ``` param.set experimental +status_error_rfc ``` It hopefully conveys that the behavior may change while this guard exists (for example, picking a different status code, or adding more failure scenarii behind a given code). ### Cache control There are 5 topics listed so far: - waiting list change to enable "proper" `no-cache` support - private headers (that would add the missing piece for actual proper `no-cache` support) - trivial `must-revalidate` support - cache control vs status code (specifically 303) - and finally, `surrogate-control` support I already presented a general plan during a VDD, and I have a better idea on how to proceed now than I did then. For the most part, I think the plan is still valid, I just have a different opinion on the implementation details of the initial parts of the plan. In order to support private/no-cache headers, we need #4073. I made further progress with the waiting list on the Enterprise side, and I would need to rebase the pull request onto current master anyway to at least solve conflicts. For `must-revalidate` we don't formally need proper "revalidation" support coming with #4073 but in my VDD plan, dealing with contradicting cache control directives came after the structural changes. We can merge `must-revalidate` support early, but we should decide which one takes precedence between `must-revalidate` and `stale-while-revalidate`. In my presentation the suggestion was to process directives in order of appearance. In my VDD plan I suggested moving as much core processing at all to the built-in VCL, including the status code checks. That would solve the 303 problem, among other limitations. Regarding `surrogate-control` I think it is delicate to just do this without a guard, so my suggestion is... ### Exposing some parameters to VCL I'd like to see a few variables like `param.default_ttl` and other kinds of timeouts in VCL. I'd like to also see new parameters like `uncacheable_ttl` (to use it in the built-in VCL) or `immutable_ttl` for general cache control support. It could also be used in VCL, some people like to classify resources as static: ```vcl if (beresp.http.content-type ~ "^image/") { # all images are static in this app set beresp.ttl = param.immutable_ttl; } ``` We could imagine a boolean parameter `surrogate_control` off by default, to have off-the-shelf support for `surrogate-control` headers and automatic ESI activation directly in the built-in VCL, guarded by an `if (param.surrogate_control) { return; }` block in the built-in subroutine dedicated to ESI and other surrogate control features available in Varnish.
nigoroll commented 2025-04-30 15:42:41 +02:00 (Migrated from github.com)

FTR, so it does not get lost: I do fully agree with accurate status codes, but I also think adding vcl control over the responses to send could be a life saver in some situations. So I wonder if we should have a "vcl_synth without a req", which could also replace our minimal_response facility

FTR, so it does not get lost: I do fully agree with accurate status codes, but I also think adding vcl control over the responses to send could be a life saver in some situations. So I wonder if we should have a "vcl_synth without a req", which could also replace our `minimal_response` facility
This discussion has been locked. Commenting is limited to contributors.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
vinyl-cache/vinyl-cache#3246
No description provided.