8.0 Compliance improvements #3246
Labels
No labels
a=ActionBeforeRelease
a=Bugwash Today
a=Implement
a=Move To VIP
a=NextVCL
a=OK'ed
a=RunByUPLEX
a=VDD
a=duplicate
a=feedback please
a=freeze
a=help wanted
a=invalid
a=last call
a=more-info-needed
a=need bugwash
b=breaking
b=bug
b=cleanup
b=enhancement
c=H/2
c=build
c=doc
c=packaging
c=varnishadm
c=varnishapi
c=varnishd
c=varnishhist
c=varnishlog
c=varnishncsa
c=varnishstat
c=varnishtest
c=varnishtop
c=vmod
r=6.0
r=7.0
r=7.1
r=7.2
r=7.3
r=7.4
r=7.5
r=8.0
r=trunk
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
vinyl-cache/vinyl-cache#3246
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
We have agreed to improve RFC compliance in the next "big" release (7.0/vcl x.y)
This ticket is meant to be the coordination point for this, in particular for the necessary documentation updates.
Each of the subtasks will have their own tickets, listed here:
9210c4685f)Completed Subtasks for 7.0:
Discarded Subtasks:
Todo list:
Registering the idea here: we should try to improve RFC coverage in the code as we make progress in this area.
By that I mean comments like:
github.com/varnishcache/varnish-cache@e896529982/bin/varnishd/cache/cache_req_fsm.c (L414-L420)I'm wondering whether VIP24 also falls under the compliance umbrella.
I had a look at the Edge Architecture Specification, and noticed something that is not properly implemented in our
vcl_backend_responselogic.Surrogate-Control: no-store
In https://github.com/varnishcache/varnish-cache/blob/master/bin/varnishd/builtin.vcl#L154, we unconditionally allow
Surrogate-Control: no-storeto have precedence overCache-Control.According to the spec
Surrogate-Capability: varnish="Surrogate/1.0"should be set first.If we want to improve compliance, we could change the Surrogate check as follows:
Surrogate-Control: max-age
The
Surrogate-Controlheader also supportsmax-age. Setting the TTL of an object based on this, is a bit more complicated, as it is not part ofbuiltin.vcl.This would be a change in the core. But the same check on
Surrogate-Capability: varnish="Surrogate/1.0"should happen before allowingSurrogate-Control: max-age=30to take precedence overCache-Control: max-age=50, for example.Setting a Surrogate-Capability header in builtin.vcl
If we really want to support surrogates, it would also make sense to already set a
Surrogate-Capabilityheader in thevcl_backend_fetchsubroutine of thebuiltin.vcl:That would allow the origin to know that we support both the surrogate caching syntax and ESI.
Processing ESI based on the Surrogate-Control header
If we send a proper
Surrogate-Capabilityheader announcing ESI support, we can just as wel process ESI when the origin sends us aSurrogate-Controlheader.We could add the following to
vcl_backend_response:If this would undermine any other proxy or CDN that has ESI capabilities, we can use device targeting to make it more specific. The corresponding VCL code would then look like this:
Next steps
builtin.vcl, and one for the TTL?Surrogate-Controlheader is set?I added a list of parsing mistakes in the issue description, two items I'm aware of so far.
@ThijsFeryn
I'm super late to this, but I would certainly make use of at least two of these improved use of the Surrogate-Control headers. In particular, we use ESI, and have custom handling in our vcl to process ESI based on that header. It's only a few lines, but it would be simpler for everyone using ESI if those few lines were unnecessary.
Also, I sometimes run into an issue when I want to set Cache-control no-cache for the browse, but cache on Varnish. Being able to override the no-cache in Surrogate-Control with a max-age would simplify things, again removing the need for custom VCL.
Some notes on the various topics under this umbrella ticket.
Status codes
Something changed between 2020 when we opened this ticket and now: we now have an
experimentalbits parameter.When we look at the pending breaking changes, a lot of them boil down to using a more accurate error status code instead of the hegemonic 503 (modulus a couple places where we send a minimal 500).
My suggestion to handle these is a transition period where both behaviors are available. The whole point of the
experimentalparameter is to give ourselves the ability to introduce breaking changes at any time, when they can be guarded by a flag.For all the status-related changes, my suggestion to proceed is:
status_error_rfcbitAlternatively, we could have individual flags like
status_502,status_504,status_413etc but even for individual status codes there may be multiple reasons to get these errors, so I'm in favor of a single flag.It hopefully conveys that the behavior may change while this guard exists (for example, picking a different status code, or adding more failure scenarii behind a given code).
Cache control
There are 5 topics listed so far:
no-cachesupportno-cachesupport)must-revalidatesupportsurrogate-controlsupportI already presented a general plan during a VDD, and I have a better idea on how to proceed now than I did then. For the most part, I think the plan is still valid, I just have a different opinion on the implementation details of the initial parts of the plan.
In order to support private/no-cache headers, we need #4073. I made further progress with the waiting list on the Enterprise side, and I would need to rebase the pull request onto current master anyway to at least solve conflicts.
For
must-revalidatewe don't formally need proper "revalidation" support coming with #4073 but in my VDD plan, dealing with contradicting cache control directives came after the structural changes. We can mergemust-revalidatesupport early, but we should decide which one takes precedence betweenmust-revalidateandstale-while-revalidate. In my presentation the suggestion was to process directives in order of appearance.In my VDD plan I suggested moving as much core processing at all to the built-in VCL, including the status code checks. That would solve the 303 problem, among other limitations.
Regarding
surrogate-controlI think it is delicate to just do this without a guard, so my suggestion is...Exposing some parameters to VCL
I'd like to see a few variables like
param.default_ttland other kinds of timeouts in VCL. I'd like to also see new parameters likeuncacheable_ttl(to use it in the built-in VCL) orimmutable_ttlfor general cache control support.It could also be used in VCL, some people like to classify resources as static:
We could imagine a boolean parameter
surrogate_controloff by default, to have off-the-shelf support forsurrogate-controlheaders and automatic ESI activation directly in the built-in VCL, guarded by anif (param.surrogate_control) { return; }block in the built-in subroutine dedicated to ESI and other surrogate control features available in Varnish.FTR, so it does not get lost: I do fully agree with accurate status codes, but I also think adding vcl control over the responses to send could be a life saver in some situations. So I wonder if we should have a "vcl_synth without a req", which could also replace our
minimal_responsefacility