Content-aware HTTP cache
The HTTP cache in Ibexa DXP is aware of which content or entity each cached response is connected to. This awareness is achieved through cache tagging. All supported reverse proxies are content-aware.
Tag header is stripped in production for security reasons
For security reasons, the tag header and other internal cache headers are stripped from the output in production by the reverse proxy (in the VCL for Varnish and Fastly).
Cache tags
Understanding tags is the key to making the most of ezplatform-http-cache.
Tags form a secondary set of keys assigned to every cache item, on top of the "primary key", which is the URI. Like an index in a database, a tag typically represents anything relevant to the given cache item. Tags are used for cache invalidation.
For example, the system tags every article response with the tag of the article Content Type (ct<content-type-id>). When that Content Type is updated, the system tells Varnish that all articles should be considered stale and updated in the background the next time someone requests them.
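The following Python sketch models a tag-aware cache in memory to show how tags act as a secondary index. TagAwareCache and its methods are hypothetical. They do not reflect how Varnish, Fastly, or ezplatform-http-cache store data. The sketch only shows that purging one tag reaches every URI carrying it, without knowing those URIs in advance.

```python
from collections import defaultdict


class TagAwareCache:
    """Toy cache: the URI is the primary key, tags are a secondary index."""

    def __init__(self):
        self.entries = {}              # URI -> (body, set of tags)
        self.index = defaultdict(set)  # tag -> set of URIs carrying it
        self.stale = set()             # URIs invalidated by a purge

    def store(self, uri, body, tags):
        # Forget index entries left over from a previous version of this URI.
        if uri in self.entries:
            for old_tag in self.entries[uri][1]:
                self.index[old_tag].discard(uri)
        self.entries[uri] = (body, set(tags))
        self.stale.discard(uri)
        for tag in tags:
            self.index[tag].add(uri)

    def purge_tag(self, tag):
        """Mark every URI carrying the tag as stale and return those URIs."""
        hits = set(self.index.get(tag, ()))
        self.stale |= hits
        return hits
```

Purging a Content Type tag such as ct42 mirrors the article example. Every response tagged with that Content Type is marked stale, and unrelated responses stay fresh.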
Current content tags (and when the system purges on them):
- Content: c<content-id> - Purged on all small or large changes to content (including its metadata, Fields and Locations).
- Content Version: cv<content-id> - Purged when any version of the content is changed (for example, a draft is created or removed).
- Content Type: ct<content-type-id> - Used when the Content Type changes, affecting content of its type.
- Location: l<location-id> - Used for clearing all cache relevant for a given Location.
- Parent Location: pl<[parent-]location-id> - Used for clearing all children of a Location (pl<location-id>), or all siblings (pl<parent-location-id>).
- Path: p<location-id> - Used for operations that change the tree itself, like move, remove, etc.
- Relation: r<content-id> - Only purged when content updates are severe enough to also affect reverse relations.
- Relation location: rl<location-id> - Same as relation, but by Location ID.
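This Python sketch assembles the tags for a single content view from plain IDs. content_view_tags is a hypothetical helper, not part of ezplatform-http-cache. It assumes the Location path is given as a list of Location IDs from the root, and it ignores additional Locations and the cv tag. With the IDs used in the example headers in this section, it reproduces the same tag list.

```python
def content_view_tags(content_id, content_type_id, location_id,
                      parent_location_id, path, relation_ids=()):
    """Build content view tags in the order used by the example headers.

    path: Location IDs from the root down to location_id.
    relation_ids: IDs of content this content relates to.
    """
    tags = ["ez-all", f"c{content_id}", f"ct{content_type_id}",
            f"l{location_id}", f"pl{parent_location_id}"]
    tags += [f"p{loc}" for loc in path]           # one p-tag per Location in the path
    tags += [f"r{rel}" for rel in relation_ids]   # relation tags
    return tags
```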
Automatic repository prefixing of cache tags
As Ibexa DXP supports multi-repository (multi-database) setups that can have overlapping IDs, the shared HTTP cache systems need to distinguish tags relevant to the different content repositories.
This is why in a multi-repository setup you can see cache tags such as 1p2.
In this example, 1 represents the index among configured repositories, meaning the second repository in the system.
Tags are not prefixed for the default repository (index "0").
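Here is a minimal Python sketch of this prefixing rule. It assumes the repository index is simply prepended to each tag, as in 1p2. prefix_tags is a hypothetical helper, and the sketch does not settle whether global tags such as ez-all are prefixed too.

```python
def prefix_tags(tags, repository_index):
    """Prefix tags with the repository index; the default repository (0) is left as-is."""
    if repository_index == 0:
        return list(tags)
    return [f"{repository_index}{tag}" for tag in tags]
```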
The content tags are returned in a header in the responses from Ibexa DXP. The header name depends on which HTTP cache Ibexa DXP is configured with:
- Symfony reverse proxy: X-Cache-Tags
- Varnish: xkey
- Fastly: Surrogate-Key
Examples:
X-Cache-Tags: ez-all,c52,ct42,l2,pl1,p1,p2,r56,r57
xkey: ez-all c52 ct42 l2 pl1 p1 p2 r56 r57
Surrogate-Key: ez-all c52 ct42 l2 pl1 p1 p2 r56 r57
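The three examples differ only in header name and separator: a comma for the Symfony reverse proxy, and a space for Varnish and Fastly. This Python sketch reproduces them from one tag list, using a hypothetical format_tag_header helper.

```python
# Header name and tag separator per reverse proxy, as in the examples above.
TAG_HEADERS = {
    "symfony": ("X-Cache-Tags", ","),
    "varnish": ("xkey", " "),
    "fastly": ("Surrogate-Key", " "),
}


def format_tag_header(proxy, tags):
    """Return the tag header line for the given reverse proxy."""
    name, separator = TAG_HEADERS[proxy]
    return f"{name}: {separator.join(tags)}"
```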
Troubleshooting - Cache header too long errors
With complex content, for example, Pages with many blocks or RichText with a lot of embeds/links, you can encounter problems with too long cache headers on responses.
When this happens, the necessary cache entries may not be tagged properly.
You may also see "502 Headers too long" errors, and the web server refusing to serve the page.
You can solve this issue in one of the following ways:
A. Allow larger headers
Varnish configuration:
- http_resp_hdr_len (default 8k, change to, for example, 32k)
- http_max_hdr (default 64, change to, for example, 128)
- http_resp_size (default 32k, change to, for example, 96k)
- workspace_backend (default 64k, change to, for example, 128k)
If you need to see these long headers in varnishlog, adapt the vsl_reclen setting.
Nginx has a default limit of 4k/8k when buffering responses:
- For a PHP-FPM setup using the proxy module, configure proxy_buffer_size.
- For a FastCGI setup using the fastcgi module, configure fastcgi_buffer_size.
Fastly has a Surrogate-Key header limit of 16 kB, and this cannot be changed.
Apache has a hard-coded limit of 8 kB, so if you face this issue, consider using Nginx instead.
B. Limit tag header output by the system
1. For inline rendering that only displays the content name, an image attribute, and/or a link, it is enough to:
- Look into how many inline (non-ESI) render calls for content rendering you are doing, and see if you can organize them differently.
- Consider inlining the views not used elsewhere in the given template and tagging the response in Twig with "relation" tags.
- (Optional) Set a reduced cache TTL for the given view, to reduce the risk of stale cache on subtree operations affecting the inlined content.
2. You can opt in to set a maximum length parameter (in bytes) for the tag header, and a corresponding TTL (in seconds) to use when the limit is reached. The system logs a warning when the limit is reached, so when needed, you can optimize these cases as described above.
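The opt-in behavior in point 2 can be modeled roughly as follows. This Python sketch makes two assumptions: tags are kept in order until the next one no longer fits, and the reduced TTL applies only when it is shorter than the current one. limit_tag_header and its parameters are hypothetical names, not the actual configuration parameters. Length is counted in bytes, like the limits listed under option A.

```python
import logging

logger = logging.getLogger("tag_header_sketch")


def limit_tag_header(tags, max_length, ttl, reduced_ttl):
    """Fit a list of tags into a space-separated header of max_length bytes.

    Returns (header_value, ttl, truncated). When tags are dropped, the TTL is
    lowered to reduced_ttl (if shorter), because purges on the dropped tags
    can no longer reach this response.
    """
    kept, length = [], 0
    for tag in tags:
        extra = len(tag.encode("utf-8")) + (1 if kept else 0)  # +1 for the space
        if length + extra > max_length:
            logger.warning("Tag header limit of %d bytes reached, %d tags dropped",
                           max_length, len(tags) - len(kept))
            return " ".join(kept), min(ttl, reduced_ttl), True
        kept.append(tag)
        length += extra
    return " ".join(kept), ttl, False
```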
Response tagging with content view
For content views, response tagging is done automatically, and the cache system outputs headers as follows:
If the given content has several Locations, you can see several l<location-id> and p<location-id> tags in the response.
How response tagging for ContentView is done internally
In ezplatform-http-cache there is a dedicated response listener, HttpCacheResponseSubscriber, that checks if:
- the response has the attribute view
- the view implements eZ\Publish\Core\MVC\Symfony\View\CachableView
- cache is not disabled on the individual view
If these checks pass, the response is adapted as follows:
- ResponseCacheConfigurator applies SiteAccess settings for enabled/disabled cache and default TTL.
- DispatcherTagger dispatches the built-in ResponseTaggers, which generate the tags as described above.
ResponseConfigurator
A ResponseCacheConfigurator configures an HTTP Response object: it makes the response public, adds tags, and sets the shared max age.
It is provided to ResponseTaggers, which use it to add the tags to the response.
The ConfigurableResponseCacheConfigurator (ezplatform.view_cache.response_configurator) follows the view_cache configuration and only enables cache if it is enabled in the configuration.
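Here is a Python model of the behavior just described. It assumes the configuration switch controls both making the response cacheable and adding tags. FakeResponse and ConfigurableCacheConfigurator are hypothetical stand-ins, not the PHP classes or their method names.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Stand-in for an HTTP response: only the parts touched below."""
    headers: dict = field(default_factory=dict)
    tags: list = field(default_factory=list)


class ConfigurableCacheConfigurator:
    """Applies cache settings only when view cache is enabled in configuration."""

    def __init__(self, enabled, default_ttl):
        self.enabled = enabled          # models the view_cache on/off setting
        self.default_ttl = default_ttl  # shared max age, in seconds

    def enable_cache(self, response):
        # Make the response public and set the shared max age.
        if self.enabled:
            response.headers["Cache-Control"] = f"public, s-maxage={self.default_ttl}"

    def add_tags(self, response, tags):
        if self.enabled:
            response.tags.extend(tags)
```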
Delegator and Value taggers
- Delegator taggers - extract one or more values from the given value and pass them on to another tagger. For example, a ContentView is covered both by the ContentValueViewTagger and the LocationValueViewTagger: the first extracts the Content from the ContentView and passes it to the ContentInfoTagger, and the second extracts the Location and passes it on to the LocationViewTagger.
- Value taggers - generate tags from the value they receive, such as the ContentInfoTagger for Content.
DispatcherTagger
Accepts any value and passes it on to every tagger registered with the service tag ezplatform.http_response_tagger.
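This Python sketch models how these roles fit together. A dispatcher hands every value to all registered taggers. Delegators unpack a view and forward its parts, and value taggers turn a value into tags. All names are hypothetical Python stand-ins, loosely named after the classes above. The real taggers may produce more tags, for example relation tags.

```python
from dataclasses import dataclass


@dataclass
class ContentInfo:
    id: int
    content_type_id: int


@dataclass
class Location:
    id: int
    parent_location_id: int
    path: tuple  # Location IDs from the root down to this Location


@dataclass
class ContentView:
    content: ContentInfo
    location: Location


def content_info_tagger(tags, value):
    """Value tagger: turns content info into c/ct tags, ignores anything else."""
    if isinstance(value, ContentInfo):
        tags.extend([f"c{value.id}", f"ct{value.content_type_id}"])


def location_tagger(tags, value):
    """Value tagger: turns a Location into l/pl/p tags."""
    if isinstance(value, Location):
        tags.extend([f"l{value.id}", f"pl{value.parent_location_id}"])
        tags.extend(f"p{loc}" for loc in value.path)


def content_value_view_tagger(tags, value):
    """Delegator: extracts the content from a view and hands it on."""
    if isinstance(value, ContentView):
        content_info_tagger(tags, value.content)


def location_value_view_tagger(tags, value):
    """Delegator: extracts the Location from a view and hands it on."""
    if isinstance(value, ContentView):
        location_tagger(tags, value.location)


class DispatcherTagger:
    """Passes any value to every registered tagger and collects the tags."""

    def __init__(self, taggers):
        self.taggers = list(taggers)

    def tag(self, value):
        tags = []
        for tagger in self.taggers:
            tagger(tags, value)
        return tags
```

Because every tagger ignores values it does not understand, the dispatcher can safely pass any value to all of them.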
Response tagging in controllers
For tagging needs in controllers, there are several options, presented here in the recommended order:
1. Reuse DispatcherTagger to pick the correct tags.
Examples of tagging everything needed for content using the autowirable ResponseTagger interface:
2. Use the ContentTagInterface API for content-related tags.
Examples of adding specific content tags using the autowirable ContentTagInterface:
3. Manually add tags yourself using the low-level FOS TagHandler.
In PHP, FOSHttpCache exposes the fos_http_cache.http.symfony_response_tagger service, which enables you to add tags to a response.
The following example adds minimal tags when content with IDs 33 and 34 is rendered in ESI, but the parent response needs these tags to get refreshed if that content is deleted:
See Tagging from code in the FOSHttpCacheBundle documentation.
4. Use the deprecated X-Location-Id header.
For custom or built-in controllers (for example, REST) still using X-Location-Id, XLocationIdResponseSubscriber handles translating this header to tags. It supports singular and comma-separated Location ID values:
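Here is a rough Python model of the header parsing step, handling a single ID or a comma-separated list. The helpers are hypothetical, and the tag translation is deliberately minimal. This sketch emits only l<location-id> tags, while the actual XLocationIdResponseSubscriber may add further tags for each Location.

```python
def location_ids_from_header(value):
    """Parse a singular or comma-separated X-Location-Id value into integer IDs."""
    # int() tolerates surrounding whitespace; empty entries are skipped.
    return [int(part) for part in value.split(",") if part.strip()]


def tags_for_location_ids(ids):
    """Minimal translation: one l<location-id> tag per Location ID."""
    return [f"l{location_id}" for location_id in ids]
```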
X-Location-Id use is deprecated
X-Location-Id is deprecated and will be removed in the future.
For rendering content, it is advised to refactor to use a content view. If that is not applicable, use ContentTagInterface, or, as a last resort, output tags manually.
Response tagging in templates
1. ez_http_cache_tag_location()
For full content tagging when rendering inline, use the following:
2. ez_http_cache_tag_relation_ids() or ez_http_cache_tag_relation_location_ids()
When you want to reduce the number of tags, or the inline content is rendered using ESI, a minimum set of tags can be set:
3. {{ fos_httpcache_tag(['r33', 'r44']) }}
As a last resort, you can also use the following function from FOS, which lets you set low-level tags directly:
See Tagging from Twig Templates in FOSHttpCacheBundle documentation.
Tag purging
Default tag purging
ezplatform-http-cache uses Repository API event subscribers to listen to events emitted on Repository operations, and, depending on the operation, triggers expiry on a specific tag or set of tags.
All event subscribers can be found in http-cache/src/lib/EventSubscriber/CachePurge.
Tags purged on publish event
Below is an example of a Content structure. The tags that the content view controller adds to each Location are also listed:
When a new version of [Child] is published, the following keys are purged:
- c55, because Content [Child] was changed
- r55, because cache for any object that has a relation to Content [Child] should be purged
- l22, because Location [Child] has changed (that is, the Location holding content-id=55)
- pl22, because cache for children of [Child] should be purged
- rl22, because cache for any object that has a relation to Location [Child] should be purged
- l20, because cache for the parent of [Child] should be purged
- pl20, because cache for siblings of [Child] should be purged
In summary, HTTP cache is purged for any Location representing [Child], any Content that relates to Content [Child], the Location for [Child], any children of [Child], any Location that relates to Location [Child], the Location for [Parent1], and any children of [Parent1].
Effectively, in this example HTTP cache for [Parent1] and [Child] will be cleared.
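The publish list above can be expressed as a small Python function for content with a single Location. tags_purged_on_publish is a hypothetical helper for reasoning about which tags a publish touches. It is not the event subscriber in http-cache/src/lib/EventSubscriber/CachePurge, and it does not cover content with several Locations.

```python
def tags_purged_on_publish(content_id, location_id, parent_location_id):
    """Tags purged when a new version of single-Location content is published."""
    return [
        f"c{content_id}",           # the content itself
        f"r{content_id}",           # anything related to the content
        f"l{location_id}",          # its Location
        f"pl{location_id}",         # children of its Location
        f"rl{location_id}",         # anything related to its Location
        f"l{parent_location_id}",   # its parent Location
        f"pl{parent_location_id}",  # its siblings
    ]
```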
Tags purged on move event
With the same Content structure as above, the [Child] Location is moved below [Parent2].
The new structure will then be:
The following keys will be purged during the move:
- l20, because cache for the previous parent of [Child] ([Parent1]) should be purged
- pl20, because cache for children of [Parent1] should be purged
- l21, because cache for the new parent of [Child] ([Parent2]) should be purged
- pl21, because cache for all children of the new parent ([Parent2]) should be purged
- p22, because cache for any element below [Child] should be purged (because the path has changed)
In other words, HTTP cache is purged for [Parent1], children of [Parent1] (if any), [Parent2], children of [Parent2] (if any), [Child], and any subtree below [Child].
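The move list can be written as a Python sketch too. tags_purged_on_move is a hypothetical helper that only restates which tags the move touches.

```python
def tags_purged_on_move(location_id, old_parent_id, new_parent_id):
    """Tags purged when a Location is moved to a new parent."""
    return [
        f"l{old_parent_id}", f"pl{old_parent_id}",  # previous parent and its children
        f"l{new_parent_id}", f"pl{new_parent_id}",  # new parent and its children
        f"p{location_id}",                          # everything below the moved Location
    ]
```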
Custom purging from code
While the system purges tags whenever the API is used to change data, you may need to purge directly from code. For that, you can use the built-in purge client:
Purging from command line
Example of purging by Location ID and by Content ID:
Example of purging all cache:
Purge is done on the current Repository
As with purging from code, the tags you purge on are prefixed to match the repository of the currently configured SiteAccess. When you use this command in a multi-repository setup, make sure to specify the SiteAccess argument.
Testing and debugging HTTP cache
It is important to test your code in an environment that is as similar to your production environment as possible. If you only test locally using the default Symfony reverse proxy while you are going to use Varnish or Fastly in production, you are likely to end up with some (bad) surprises. Because the Symfony reverse proxy lacks support for ESIs, it behaves quite differently from Varnish and Fastly in some aspects. If you are going to use Varnish in production, make sure you also test your code with Varnish. If you are going to use Fastly in production, testing with Fastly in your developer install is likely not feasible, because your local development environment must then be accessible to Fastly. Testing with Varnish instead will in most cases do the job. But if you need to change the Varnish configuration to make your site work, be aware that Varnish and Fastly use different VCL dialects, and that .vcl code for Varnish 6.x will likely not work as-is on Fastly.
This section describes how to debug problems related to HTTP cache. To do that, you must be able to look at the responses and headers Ibexa DXP sends to the HTTP cache, not so much at the responses and headers the HTTP cache sends to the client (web browser). This means you must be able to send requests to your origin (web server) that do not go through Varnish or Fastly. If you run Nginx and Varnish on premise, you should know which host and port number both Varnish and Nginx run on. If you perform tests on a Fastly-enabled environment on Ibexa Cloud provided by Platform.sh, you need to use the Platform.sh dashboard to obtain the endpoint for Nginx.
The following example shows how to debug and check why Fastly does not cache the front page properly. If you run the command multiple times:
curl -IXGET https://www.staging.foobar.com.us-2.platformsh.site
it always outputs:
Nginx endpoint on Platform.sh
Finding Nginx endpoint for environments located on the grid
To find the Nginx endpoint, you first need to know which region your project is located in. To do that, go to the Platform.sh dashboard.
To find a valid route, click an element in the URLs drop-down for the specified environment and select the route.
A route may look like this:
https://www.staging.foobar.com.us-2.platformsh.site/
In this case, the region is us-2, and you can find the list of public IPs on the Platform.sh documentation page.
Typically, you can add gw to the hostname and use nslookup to find it.
You can also use the Ibexa Cloud CLI (which has the same commands as the Platform.sh CLI) to find the endpoint:
Finding Nginx endpoint on dedicated cloud
If you have a dedicated 3-node cluster on Platform.sh, the procedure for getting the endpoint to environments that are located on that cluster (production and sometimes also staging) is slightly different.
In the URLs drop-down in the Platform.sh dashboard, find the route that has the format somecontent.[clusterid].ent.platform.sh/, for example, myenvironment.abcdfg2323.ent.platform.sh/.
The endpoint in this case has the format c.[clusterid].ent.platform.sh, for example, c.abcdfg2323.ent.platform.sh.
Next, use nslookup to find the IP:
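The route-to-endpoint rule above can be scripted. This Python sketch assumes the route follows exactly the environment.clusterid.ent.platform.sh shape described above. cluster_endpoint is a hypothetical helper. resolve() uses socket.gethostbyname from the standard library as an alternative to nslookup, and it needs network access.

```python
import socket


def cluster_endpoint(route):
    """Derive c.<clusterid>.ent.platform.sh from a dedicated-cluster route."""
    host = route.split("://")[-1].split("/")[0]
    labels = host.split(".")
    # Expected shape: <environment>.<clusterid>.ent.platform.sh
    if len(labels) != 5 or labels[2:] != ["ent", "platform", "sh"]:
        raise ValueError(f"not a dedicated-cluster route: {route}")
    return f"c.{labels[1]}.ent.platform.sh"


def resolve(hostname):
    """Look up the IPv4 address of the endpoint, like nslookup does."""
    return socket.gethostbyname(hostname)
```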
Fetching user context hash
As explained in User Context Hash caching, the HTTP cache indexes the cache based on the user-context-hash. Users with the same user-context-hash share the same cache (as long as Ibexa DXP responds with Vary: X-User-Hash).
To simulate the requests the HTTP cache sends to Ibexa DXP, you need this user-context-hash. To obtain it, use curl.
Some notes about each of these parameters:
- -IXGET, one of many ways to tell curl that we want to send a GET request but are only interested in outputting the headers
- --resolve www.staging.foobar.com.us-2.platformsh.site:443:1.2.3.4
  - We tell curl not to do a DNS lookup for www.staging.foobar.com.us-2.platformsh.site. We do that because in our case that would resolve to the Fastly endpoint, not our origin (Nginx)
  - We specify 443 because we are using HTTPS
  - We provide the IP of the Nginx endpoint at Platform.sh (1.2.3.4 in this example)
- --header "Surrogate-Capability: abc=ESI/1.0", strictly speaking not needed when fetching the user-context-hash, but this tells Ibexa DXP that the client understands ESI tags. It is good practice to always include this header when imitating the HTTP cache.
- --header "accept: application/vnd.fos.user-context-hash" tells Ibexa DXP that the client wants to receive the user-context-hash
- --header "x-fos-original-url: /" is required by FOSHttpCacheBundle to deliver the user-context-hash
- https://www.staging.foobar.com.us-2.platformsh.site/_fos_user_context_hash: here we use the hostname we earlier told curl how to resolve using --resolve. /_fos_user_context_hash is the route to the controller that is able to deliver the user-context-hash.
- You may also provide the session cookie (--cookie ".....=....") for a logged-in user if you are interested in the user-context-hash for a user other than anonymous.
The output for this command should look similar to this:
The X-User-Hash header is the one of interest here, but you may also note the Surrogate-Key header, which holds the cache tags.
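When you repeat these requests for several URLs, building the command programmatically helps avoid quoting mistakes. This Python sketch is a hypothetical helper that only builds a curl command string with the flags discussed above; it does not run it. shlex.join quotes any argument containing shell-special characters, which also covers ESI URLs. To fetch a page as the HTTP cache would, you could pass the hash as an extra request header such as "X-User-Hash: <hash>". That header name is an assumption based on the Vary: X-User-Hash response header.

```python
import shlex
from urllib.parse import urlsplit


def origin_curl(url, origin_ip, extra_headers=(), headers_only=True):
    """Build a curl command that bypasses the CDN and talks to the origin."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    cmd = ["curl"]
    if headers_only:
        cmd.append("-IXGET")
    # Pin the hostname to the origin IP instead of the CDN's DNS answer.
    cmd += ["--resolve", f"{parts.hostname}:{port}:{origin_ip}"]
    # Always announce ESI support, as the HTTP cache would.
    for header in ("Surrogate-Capability: abc=ESI/1.0", *extra_headers):
        cmd += ["--header", header]
    cmd.append(url)
    # shlex.join single-quotes arguments containing shell-special characters.
    return shlex.join(cmd)
```

For example, print(origin_curl(url, "1.2.3.4", ["accept: application/vnd.fos.user-context-hash", "x-fos-original-url: /"])) prints a command equivalent to the user-context-hash request described above.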
Fetching HTML response
Now that you have the user-context-hash, you can ask the origin for the actual resource you are after:
The output:
The Cache-Control header tells the HTTP cache to store the result in the cache for 1 day (86400 seconds).
The Vary: X-User-Hash header tells the HTTP cache that this cache element may be used for all users that have the given X-User-Hash (daea248406c0043e62997b37292bf93a8c91434e8661484983408897acd93814).
The document might also be removed from the cache by purging any of the keys provided in the Surrogate-Key header.
So, back to the original problem. This resource is for some reason not cached by Fastly (remember the x-cache: MISS we started with). But the origin says this page can be cached for 1 day. How can that be?
The likely reason is that this page also contains some ESI fragments and that one or more of these are not cacheable.
So, first let's see if there are any ESIs here. We remove the -IXGET option from the curl command (to see the content of the response, not only the headers) and search for esi:
The output is:
Now, investigate the response of each of these ESI fragments to understand what is going on. It is important to put each URL in single quotes, as the ESI URLs include special characters that can be interpreted by the shell.
1st ESI
You can also note that this ESI is handled by a controller in the FieldTypePage bundle provided by Ibexa DXP.
The output is:
The headers here look correct and do not indicate that this ESI will not be cached by the HTTP cache. The second ESI has a similar response.
3rd ESI
This ESI is handled by a custom FooController::customAction, and the output of the command is:
The Cache-Control and Vary headers look correct. The request is handled by a custom controller, and the Surrogate-Key only contains the default ez-all value.
This is not a problem as long as the controller does not return values from any Content in the Ibexa DXP Repository. If it does, the controller should also add tags with the IDs of such objects to that header.
The Set-Cookie header here may cause the problem. An ESI fragment should never set a cookie because:
- Clients only receive the headers set in the "mother" document (the headers in the "/" response in this case).
- Only the content of ESI responses is returned to the client. No headers set in an ESI response ever reach the client. ESI headers are only seen by the HTTP cache.
- Symfony reverse proxy does not support ESIs at all, and any ESI calls (render_esi()) are implicitly replaced by sub-requests (render()). So any Set-Cookie is sent to the client when using Symfony reverse proxy.
- Fastly flags the resource as "not cacheable" because it set a cookie at least once. Even if that endpoint stops setting cookies, Fastly still does not cache that fragment, and any document referring to that ESI will be a MISS. Fastly cache needs to be purged (Purge-all request) to remove this flag.
This means that it is not recommended to always initiate a session when loading the front page.
You must make sure that you do not unintentionally start a session in a controller used by ESIs, for example, by trying to access a session variable before a session has been initiated.
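To check fragments systematically, you can run the headers from each curl output through a small helper. This Python sketch is hypothetical and simplified. It flags only the issues discussed in this section: a Set-Cookie header, and Cache-Control values that forbid shared caching or give the shared cache no max age. It is not a complete model of how Fastly or Varnish decide what is cacheable.

```python
def esi_fragment_problems(headers):
    """List reasons why a fragment response, as seen from the origin,
    may stop the HTTP cache from caching the page."""
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = []
    if "set-cookie" in lowered:
        problems.append("sets a cookie")
    # Directive names only, for example "s-maxage" from "s-maxage=86400".
    directives = {
        part.strip().split("=")[0].lower()
        for part in lowered.get("cache-control", "").split(",")
        if part.strip()
    }
    if directives & {"private", "no-store"}:
        problems.append("Cache-Control forbids shared caching")
    if not directives & {"s-maxage", "max-age"}:
        problems.append("no max age for the shared cache")
    return problems
```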