
Redirecting dead links to my Hugo images with a Cloudflare Worker


In my previous post, I explained how I used a Cloudflare Worker to basically build the missing Logs feature of Cloudflare Pages. I was trying to find redirections that I missed on my blog by moving and breaking stuff over the years.

Recently, I updated the theme of my blog, which uses Hugo. By doing so, I broke the image permalinks for two reasons:

  • I’m converting everything to webp, the superior image format
  • The fingerprints in the optimized image files have changed for some reason

Hugo image optimization

Hugo is able to process and optimize images.

This means that for this line in my source Markdown file:

![nextdns](nextdns.png)

Hugo will convert the image to webp, keep a fallback in the original format in case the browser doesn’t support webp, and provide multiple sizes so the browser can load the most appropriate one for the reader’s screen:

<picture class="mb-6 rounded-md">
  <source
    srcset="
      https://stanislas.blog/2020/04/nextdns/nextdns_hu_6174a8ac082f0146.webp  330w,
      https://stanislas.blog/2020/04/nextdns/nextdns_hu_5659ade9cd25891b.webp  660w,
      https://stanislas.blog/2020/04/nextdns/nextdns_hu_e974d6830389ba5f.webp 1024w,
      https://stanislas.blog/2020/04/nextdns/nextdns_hu_516d42bcc4e19991.webp 1280w
    "
    sizes="100vw"
    type="image/webp" />
  <img
    width="1280"
    height="640"
    data-zoom-src="https://stanislas.blog/2020/04/nextdns/nextdns.png"
    class="mb-6 rounded-md medium-zoom-image"
    src="https://stanislas.blog/2020/04/nextdns/nextdns_hu_4e7fd8428258e6e4.png"
    srcset="
      https://stanislas.blog/2020/04/nextdns/nextdns_hu_b193c712a2c9a2ee.png  330w,
      https://stanislas.blog/2020/04/nextdns/nextdns_hu_4e7fd8428258e6e4.png  660w,
      https://stanislas.blog/2020/04/nextdns/nextdns_hu_236f3862db0e92a4.png 1024w,
      https://stanislas.blog/2020/04/nextdns/nextdns.png                     1280w
    "
    sizes="100vw"
  />
</picture>

As you can see, while the source image uses the actual file name, the generated files have a random-looking part in their name: the hu_6174a8ac082f0146 in nextdns_hu_6174a8ac082f0146.webp.

This random-looking part is actually deterministic and will not change when Hugo rebuilds the website. However, for some reason, it changed when I updated my theme. Hugo can get pretty complex sometimes, and I gave up trying to find out why. 😅

Concretely, here is an example of an old image URL:

https://stanislas.blog/2019/01/how-to-setup-vpn-server-wireguard-nat-ipv6/wireguard-vpn_hu333ae5654844a126b1b32ff4a4b8af90_20977_960x0_resize_linear_3.png

The image file name is:

wireguard-vpn_hu333ae5654844a126b1b32ff4a4b8af90_20977_960x0_resize_linear_3.png

Right now, if I go to the post, the image file name is actually something like:

wireguard-vpn_hu_36c9ef46a4fedce5.png

The first link now returns a 404 error since the Hugo-generated file no longer exists. This is bad for SEO, search engine crawlers, and anyone who referenced that image elsewhere.

I see a few options:

  • Redirect all 404s for images to the homepage
  • Redirect all 404s for images to their respective post, which can be obtained by trimming the URL
  • Redirect all 404s for images to their new, corresponding image

The first option is not very useful and probably doesn’t make sense for SEO.

The second option is more relevant—users could at least find the same image in the post. I’m not sure how crawling bots would react.

The last option is definitely the best, as the link juice transfers to the new URL of the same image, making it the best approach for damage control.
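
To make the second and third options concrete, here is a rough sketch of what can be recovered from a dead image URL alone: the post URL (by trimming the file name) and the image's base name (everything before Hugo's `_hu` suffix). The helper name `splitImageUrl` is made up for illustration, and the sketch assumes that no source image has `_hu` in its own file name.

```javascript
// Hypothetical helper: split a dead image URL into the pieces needed
// to find either the post or the image's current file.
function splitImageUrl(imageUrl) {
  const url = new URL(imageUrl);
  const parts = url.pathname.split('/');
  const fileName = parts.pop(); // last path segment, e.g. "nextdns_hu_....webp"

  // Option 2: the post URL is the same path without the file name
  const postUrl = `${url.origin}${parts.join('/')}/`;

  // Option 3: drop the extension, then everything from "_hu" onwards.
  // Assumption: the original file name does not itself contain "_hu".
  const baseName = fileName.replace(/\.[^.]+$/, '').split('_hu')[0];

  return { postUrl, baseName };
}

const oldUrl =
  'https://stanislas.blog/2019/01/how-to-setup-vpn-server-wireguard-nat-ipv6/' +
  'wireguard-vpn_hu333ae5654844a126b1b32ff4a4b8af90_20977_960x0_resize_linear_3.png';

console.log(splitImageUrl(oldUrl));
// postUrl:  https://stanislas.blog/2019/01/how-to-setup-vpn-server-wireguard-nat-ipv6/
// baseName: wireguard-vpn
```

Both the old and the new fingerprint formats reduce to the same base name, which is what makes matching an old URL to the current image possible.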

In my last post, I explained how I use the _redirects file on Cloudflare Pages to handle redirections.

The problem is I can’t really use the _redirects feature here since I don’t know in advance what the 404s are. There’s also no way to say “redirect but only if the status code is 404.”

I could go through my Git history, extract all the old image URLs, map them to the new ones, and generate redirections. But I’d probably hit the redirect limits quickly—even though the limits are higher with bulk redirects.
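
For reference, the static approach could look roughly like the sketch below: turn a mapping of old to new image paths into `_redirects` lines (one `source destination code` rule per line) and fail loudly when the rule count goes over a limit. The mapping, the function name `buildRedirectLines`, and the `maxRules` value are all placeholders, not Cloudflare's actual numbers.

```javascript
// Hypothetical mapping, e.g. extracted from the Git history of the built site
const mapping = {
  '/2019/01/how-to-setup-vpn-server-wireguard-nat-ipv6/wireguard-vpn_hu333ae5654844a126b1b32ff4a4b8af90_20977_960x0_resize_linear_3.png':
    '/2019/01/how-to-setup-vpn-server-wireguard-nat-ipv6/wireguard-vpn_hu_36c9ef46a4fedce5.png',
};

// Build the content of a _redirects file: one "source destination 301" per line.
// maxRules is an assumed limit passed in by the caller.
function buildRedirectLines(mapping, maxRules) {
  const lines = Object.entries(mapping).map(([from, to]) => `${from} ${to} 301`);
  if (lines.length > maxRules) {
    throw new Error(`${lines.length} redirects exceed the limit of ${maxRules}`);
  }
  return lines.join('\n');
}

// 2000 is a placeholder; check the current Cloudflare Pages limits
console.log(buildRedirectLines(mapping, 2000));
```

With several resized variants per image and years of posts, the number of rules grows quickly, which is the problem described above.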

Cloudflare Worker back to the rescue again

Now, since I was already playing with a Cloudflare Worker, why not use it for the redirect?

Here’s how I handled it with a worker sitting in front of the website:

  • The worker is invoked on each request
  • It requests the actual target (my blog on Cloudflare Pages)
  • It filters 404s for images, returning the response as-is for everything else
  • It gets the slug of the post and image
  • It fetches the post’s HTML
  • It searches the page for an image with a src matching the slug
  • If found, it returns a permanent (301) redirect to the image URL
  • If not, it falls back to redirecting to the post

Just like in my post five years ago, HTMLRewriter is still the best option for DOM parsing in a worker—it’s very fast.

Here’s the final implementation:

export default {
  async fetch(request, env, ctx) {
    const response = await fetch(request);

    // Passthrough non-404 requests
    if (response.status !== 404) return response;

    // Ignore if the URL is not for an image
    const url = new URL(request.url);
    const validExtensions = ['.png', '.jpg', '.jpeg', '.webp', '.gif'];
    if (!validExtensions.some((ext) => url.pathname.toLowerCase().endsWith(ext))) {
      return response;
    }

    // Derive post slug and image slug
    const pathParts = url.pathname.split('/');
    const postSlug = pathParts.slice(0, -1).join('/');
    const imageFilename = pathParts[pathParts.length - 1];
    const [imageSlug] = imageFilename.split('.');
    const imageBaseSlug = imageSlug.split('_hu')[0];

    // Fetch the post corresponding to the image
    const postUrl = new URL(postSlug, request.url);
    const postResponse = await fetch(postUrl);
    if (postResponse.status !== 200) return response;

    // Parse the post HTML to find the current image, based on the base file name
    let currentImageURL = null;
    await new HTMLRewriter()
      .on('img', {
        element(element) {
          const src = element.getAttribute('src') || '';
          const srcParts = src.split('/');
          const foundSlug = srcParts.pop().split('.')[0];
          if (!currentImageURL && foundSlug.startsWith(`${imageBaseSlug}_hu_`)) {
            currentImageURL = src;
          }
        },
      })
      .transform(postResponse)
      .text(); // Drain the stream

    // In the unlikely case we don't find an image, we redirect to the post
    if (!currentImageURL) {
      const redirectUrl = new URL(request.url);
      redirectUrl.pathname = postSlug + '/';

      console.log({
        action: 'Image not found, redirecting to post URL',
        redirectUrl: redirectUrl.toString(),
        originalRequest: {
          url: request.url,
          headers: Object.fromEntries(request.headers),
          method: request.method,
        },
      });

      return Response.redirect(redirectUrl, 301);
    }

    // Redirect to the full-size image and not the optimized one
    currentImageURL = currentImageURL.replace(/_hu[^.]+/, '');

    console.log({
      action: 'Image found, redirecting to image URL',
      redirectUrl: currentImageURL,
      originalRequest: {
        url: request.url,
        headers: Object.fromEntries(request.headers),
        method: request.method,
      },
    });

    return Response.redirect(currentImageURL, 301);
  },
};

It works!

➜ curl -I https://stanislas.blog/2019/01/how-to-setup-vpn-server-wireguard-nat-ipv6/wireguard-vpn_hu333ae5654844a126b1b32ff4a4b8af90_20977_960x0_resize_linear_3.png
HTTP/2 301
date: Wed, 19 Feb 2025 23:15:33 GMT
location: https://stanislas.blog/2019/01/how-to-setup-vpn-server-wireguard-nat-ipv6/wireguard-vpn_hu_36c9ef46a4fedce5.png

The fallback for images that can’t be found in the post works as well:

➜ curl -I https://stanislas.blog/2019/01/how-to-setup-vpn-server-wireguard-nat-ipv6/invalid-image.png
HTTP/2 301
date: Thu, 20 Feb 2025 13:07:57 GMT
location: https://stanislas.blog/2019/01/how-to-setup-vpn-server-wireguard-nat-ipv6/

In terms of performance, the additional impact is only for 404s on images: the worker needs to fetch the blog post as well, which makes these requests roughly 2x slower. However, this is acceptable since it’s a rare occurrence, and the key point is that there is a redirection. A 200ms delay instead of 100ms is fine.

A simple optimization would be to use Workers KV to store a mapping between old and new images as they are accessed. This way, only the first redirection would be slower, with subsequent redirections being much faster.
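
A minimal sketch of that idea, assuming a KV namespace bound to the worker (the binding name `IMAGE_REDIRECTS` mentioned below is hypothetical) and a `resolve` function that stands in for the slow "fetch the post and parse it" lookup. It relies only on the `get` and `put` methods of a KV binding, where `get` resolves to `null` for a missing key. This is not what the worker above currently does.

```javascript
// kv: anything with async get(key) / put(key, value), like a Workers KV binding.
// resolve: hypothetical function doing the slow lookup (fetch + parse the post).
async function cachedRedirectTarget(kv, oldPath, resolve) {
  const cached = await kv.get(oldPath);
  if (cached !== null) return cached; // fast path: mapping already known

  const target = await resolve(oldPath);
  if (target) await kv.put(oldPath, target); // remember it for next time
  return target;
}

// In-memory stand-in for KV, only so the sketch can run outside a worker
function memoryKv() {
  const store = new Map();
  return {
    async get(key) { return store.has(key) ? store.get(key) : null; },
    async put(key, value) { store.set(key, value); },
  };
}

// Demo: two requests for the same dead URL trigger a single slow lookup
async function demo() {
  const kv = memoryKv();
  let lookups = 0;
  const resolve = async () => { lookups += 1; return '/new/image.png'; };
  await cachedRedirectTarget(kv, '/old/image.png', resolve);
  await cachedRedirectTarget(kv, '/old/image.png', resolve);
  return lookups;
}
```

Inside the worker, `kv` would be something like `env.IMAGE_REDIRECTS` and `oldPath` would be `url.pathname`. One trade-off is that a cached mapping can go stale if the fingerprints change again, so an expiration on the stored entries might be worth considering.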

Since I’m using permanent redirections (301), I expect these old URLs to be accessed less frequently over time.

Another optimization would be to skip fetching the underlying post and directly redirect to the source image file. However, since I used webp for some images, I can’t determine the source file extension in advance, necessitating fetching the post. For example, I can’t know if a dead link to nextdns_hu_6174a8ac082f0146.webp should redirect to nextdns.png or nextdns.jpg. I could use HEAD requests in the worker to figure it out, but I prefer to keep my current solution.
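
For illustration, the HEAD-request variant could be sketched like this: try each candidate extension in turn and keep the first one that answers with a 2xx status. The function name `findSourceImage` and the list of extensions are assumptions, and `fetchFn` is injectable only so the sketch can be exercised without network access.

```javascript
// Probe "<baseUrl>.<ext>" with HEAD requests and return the first URL that exists.
// baseUrl: image URL without fingerprint and extension, e.g. ".../nextdns/nextdns"
async function findSourceImage(baseUrl, extensions, fetchFn = fetch) {
  for (const ext of extensions) {
    const candidate = `${baseUrl}.${ext}`;
    const res = await fetchFn(candidate, { method: 'HEAD' });
    if (res.ok) return candidate; // 2xx: this source file exists
  }
  return null; // nothing matched, fall back to redirecting to the post
}

// Example call from inside a worker (candidate extensions are an assumption):
// const target = await findSourceImage(
//   'https://stanislas.blog/2020/04/nextdns/nextdns',
//   ['png', 'jpg', 'jpeg', 'gif'],
// );
```

In the worst case this costs one subrequest per candidate extension, compared to a single fetch of the post, which is one reason to stay with the current solution.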

[Screenshot: Cloudflare Worker logs]
Fixing the dead links, one at a time!

Anyway, I’m not sure if any of this is actually useful, but at least it was fun!