CDNs and Website Caching

Want to use a CDN or cache? Here are some things to consider to ensure your website doesn't break.

Posted by Steven Tan on 26th August 2020

Why use CDNs or Website Caching?

Using a content distribution network (CDN) or website caching software can significantly improve the performance of your website by caching content and website assets. Because the website loads faster, this can also improve your search engine optimisation (SEO). Another beneficial side effect is that users tend to stay on your website for longer: when a website takes longer than about 3 seconds to load, users generally start to look elsewhere for their content.

What is a CDN?

A CDN is a network of servers around the world that is designed to cache and serve traffic to users. By leveraging these networks, you can cache your content in the point of presence (POP) closest to each user. This lowers the latency experienced when loading your content, improving the user experience for anyone browsing your site. At a minimum, it is generally recommended to serve your static assets through a CDN, as its global network of high-capacity servers will improve your website's loading speeds.

What is Caching?

Cached content is a copy of your data that is stored somewhere other than your web server, typically in the user's browser or on a CDN or caching server.

This can decrease the time it takes for your website to load, and may even allow it to load instantly in the user's browser when they revisit your page. Because the content is served from elsewhere, you also reduce the load on your server, as each user visit no longer results in another request to your web servers. This can save you some cost as well, since your server does not need to serve traffic for every request.

What Does it Look Like?

Traffic from the user first hits a CDN server, which serves the cached data to the user (or requests the data from your web server if a cached version doesn't exist). This means that the CDN always sits between your users and your server. Here's a simplified diagram showing how internet traffic behaves when using a CDN.

[Diagram: CDN Network Traffic]
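The hit-or-miss flow described above can be sketched in a few lines. This is a minimal, in-memory illustration, not how any particular CDN is implemented: `EdgeCache` and the `fetch_from_origin` callback are hypothetical names standing in for a CDN server and your web server, and there is no expiry (covered later).

```python
from typing import Callable, Dict


class EdgeCache:
    """Hypothetical, simplified CDN edge server: serve from cache, else ask the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin  # stands in for your web server
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # how many requests actually reached the origin

    def get(self, url: str) -> bytes:
        if url in self._store:
            return self._store[url]  # cache hit: the origin is never contacted
        self.origin_requests += 1  # cache miss: forward the request to the origin
        body = self._fetch_from_origin(url)
        self._store[url] = body  # keep a copy for the next visitor
        return body


edge = EdgeCache(lambda url: f"content of {url}".encode())
edge.get("/logo.png")  # miss: fetched from the origin
edge.get("/logo.png")  # hit: served from the edge
print(edge.origin_requests)  # 1
```

Two requests for the same URL result in a single request to the origin, which is where the reduced server load mentioned earlier comes from.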

I Need This Now!

Wow! Why not just put a CDN in front of my servers and cache everything, you might ask? Well, that is a good question. You definitely should be using CDNs or caching where you can. However, there are also a multitude of reasons why you shouldn't simply enable caching on everything. Here is a short list of things you should think about when configuring caching or a CDN in front of your website.

Caching Your Content

The biggest issue with caching your content is that it can inadvertently cache content that you don't want cached, such as personal information (name, address, email, etc.) and even credit card information. Configuring your caching incorrectly can cause these details to be leaked to other users of your website. There are multiple ways to get around this. One is to disable caching on these pages entirely. Another is to cache on a per-session or per-cookie basis, where the cache uses the cookies provided by the browser, as well as the URL, to decide which cached copy to serve. This way content can still be cached for all your users, while each user can only access their own cached content and information.
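Both approaches come down to how the cache key is built. The sketch below is illustrative only: the cookie name `session_id` and the path prefixes are hypothetical, and real CDNs configure this through their own cache key settings rather than code like this. It returns `None` for pages that must never be cached, a URL-only key for anonymous visitors, and a URL-plus-session key otherwise.

```python
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical values for illustration; substitute your own cookie name and paths.
SESSION_COOKIE = "session_id"
NEVER_CACHE_PREFIXES = ("/checkout", "/account")


def cache_key(path: str, cookie_header: Optional[str]) -> Optional[str]:
    """Return the cache key for a request, or None if it must not be cached."""
    if path.startswith(NEVER_CACHE_PREFIXES):
        return None  # option 1: disable caching on sensitive pages entirely
    cookies = SimpleCookie()
    if cookie_header:
        cookies.load(cookie_header)
    session = cookies.get(SESSION_COOKIE)
    if session is None:
        return path  # anonymous visitor: one shared cache entry per URL
    # Option 2: per-session caching, so one user's page is never served to another.
    return f"{path}|{SESSION_COOKIE}={session.value}"
```

Note the trade-off: keying on the session means each user gets their own cache entry, so the hit rate for those pages drops accordingly.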

Cache Expiry

In layman's terms, a cache expiry is how long your data will be cached for. You will generally want to cache content for as long as you reasonably can. Static content such as images and CSS can usually be cached for a week to a month without any issues. Your website pages and JavaScript may warrant shorter caching periods, as these are generally updated more frequently; I've seen these configured for anywhere from a day to a week. You will need to find a sensible period of time and adjust it as necessary. If your cached data needs to be refreshed before the cache expires, you can invalidate the cache to remove the cached content. Although this feature is available, you shouldn't rely on it too heavily, as some providers charge for invalidations once you exceed a certain volume.
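One common way to express these expiry periods is the standard `Cache-Control` response header with a `max-age` in seconds, which browsers and most CDNs honour. The sketch below picks a value by file extension; the specific TTLs are example values chosen from the ranges above, not recommendations. It marks everything `public`, so it is only suitable for content that is safe to share between users; personalised pages should not be served this way.

```python
import os

DAY = 24 * 60 * 60  # one day, in seconds

# Example TTLs taken from the ranges discussed above; tune them for your own site.
TTL_BY_EXTENSION = {
    ".png": 30 * DAY,
    ".jpg": 30 * DAY,
    ".css": 7 * DAY,
    ".js": 1 * DAY,
    ".html": 1 * DAY,
}
DEFAULT_TTL = 1 * DAY  # pages without an extension, e.g. "/about"


def cache_control_for(path: str) -> str:
    """Build a Cache-Control header value based on the requested file's extension."""
    _, ext = os.path.splitext(path)
    ttl = TTL_BY_EXTENSION.get(ext.lower(), DEFAULT_TTL)
    return f"public, max-age={ttl}"


print(cache_control_for("/img/logo.png"))  # public, max-age=2592000
```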

Client IP Addresses

As traffic to your website is now handled by the CDN, the usual methods of identifying your clients' IP addresses will return the addresses of the CDN's servers instead. Some CDN providers use the X-Forwarded-For header to pass along the IP address of the end user, and you will need to change your logging and website behaviour to account for this. I'll guide you to the docs for NGINX, Apache and IIS, as these are the web servers you are most likely to be using.
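Because any client can send its own X-Forwarded-For header, it should only be trusted when the request actually arrived from your CDN. Each proxy appends the address it received the request from, so a common approach is to walk the list from right to left and take the first address that is not a known proxy. A minimal sketch, assuming the CDN's ranges are known in advance: `203.0.113.0/24` is a placeholder documentation range, so use the ranges your provider publishes.

```python
import ipaddress
from typing import Optional

# Placeholder range for illustration; use the IP ranges your CDN provider publishes.
TRUSTED_PROXIES = [ipaddress.ip_network("203.0.113.0/24")]


def _is_trusted(ip: str) -> bool:
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # malformed entries are never treated as a trusted proxy
    return any(addr in net for net in TRUSTED_PROXIES)


def client_ip(remote_addr: str, x_forwarded_for: Optional[str]) -> str:
    """Return the end user's IP, only trusting X-Forwarded-For via known proxies."""
    if not x_forwarded_for or not _is_trusted(remote_addr):
        return remote_addr  # direct connection, or a header we must not trust
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    # Walk right to left: the rightmost entries were added by proxies we control.
    for hop in reversed(hops):
        if not _is_trusted(hop):
            return hop  # first address that is not one of our proxies
    return hops[0] if hops else remote_addr


# A spoofed left-most entry is ignored; the address the CDN saw is returned.
print(client_ip("203.0.113.10", "198.51.100.99, 192.0.2.55"))  # 192.0.2.55
```

Web servers such as NGINX and Apache provide built-in modules for this, so in practice you would configure those rather than parse the header yourself.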

CDN Timeouts

A timeout occurs when the website does not respond within a certain amount of time. Configuring timeouts correctly can be tricky, as it may sometimes take longer than expected to process data. General recommendations usually include keeping your content loading in under 1 second where possible. Despite your best efforts, you might still notice longer wait times on your website. This can be due to a plethora of reasons, including (but not limited to) inefficient SQL queries or code, third-party APIs your website depends on, or simply a spike in traffic. All of these factors can make your website respond more slowly than usual. You should take them into account when configuring your timeouts, and even increase the timeout for specific pages that are known to take longer to respond.
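Per-page timeouts usually boil down to a default plus a list of overrides for known slow paths. The sketch below shows that lookup logic; the paths and second values are hypothetical examples, and your CDN will have its own way of expressing these rules.

```python
from typing import Dict

DEFAULT_TIMEOUT_SECONDS = 30.0  # example default, not a recommendation

# Hypothetical slow endpoints that are known to take longer to respond.
TIMEOUT_OVERRIDES: Dict[str, float] = {
    "/reports/": 120.0,
    "/search": 60.0,
}


def timeout_for(path: str) -> float:
    """Pick the origin timeout for a path, preferring the longest matching prefix."""
    matches = [prefix for prefix in TIMEOUT_OVERRIDES if path.startswith(prefix)]
    if not matches:
        return DEFAULT_TIMEOUT_SECONDS
    return TIMEOUT_OVERRIDES[max(matches, key=len)]


print(timeout_for("/reports/annual"))  # 120.0
print(timeout_for("/contact"))  # 30.0
```

Matching the longest prefix lets a more specific rule (for example a single slow report) override a broader one without depending on the order the rules were written in.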

CDN Retries

When a timeout occurs, some CDNs will allow you to retry the request. You might initially think that this is a great idea, but there are some pitfalls you might encounter. As a retry triggers another HTTP request, it can cause issues when that request involves sending data: if this is not accounted for, duplicate entries can be inserted into your database. Even when only retrieving data, the timeout is likely due to the website taking longer than normal to process requests, so retrying will cause the server to queue the same request multiple times, eventually resulting in gridlock. As more and more users browse your website (or refresh the page when errors occur), more retries are triggered until the server is unable to respond to most requests. At that point you will have accidentally launched a distributed denial of service (DDoS) attack against your own website. You will need to consider which pages are allowed to be retried when enabling this option, as these side effects can quickly make your website inaccessible, with undesirable knock-on effects.
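A conservative retry policy addresses both problems: never retry requests that send data, and keep retries for read-only requests few and spaced out. The sketch below assumes a hypothetical `send` callable that raises Python's built-in `TimeoutError`; only `GET` and `HEAD` are retried, with one retry and exponential backoff by default.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Only read-only methods are retried; POST, PUT, PATCH and DELETE are sent once.
RETRYABLE_METHODS = {"GET", "HEAD"}


def request_with_retries(method: str, send: Callable[[], T],
                         max_retries: int = 1, base_delay: float = 0.5) -> T:
    """Call send(), retrying timeouts for read-only methods with exponential backoff."""
    retries = max_retries if method.upper() in RETRYABLE_METHODS else 0
    attempt = 0
    while True:
        try:
            return send()
        except TimeoutError:
            if attempt >= retries:
                raise  # out of retries: surface the error instead of adding more load
            time.sleep(base_delay * (2 ** attempt))  # give a slow origin time to recover
            attempt += 1
```

Keeping the retry count low and backing off between attempts limits how much extra load a struggling server receives, which is exactly the self-inflicted DDoS scenario described above.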

Web Application Firewalls

Some CDN providers will allow you to configure a web application firewall (WAF) as part of their service. A WAF applies a set of rules that each request must satisfy, which can greatly improve your website's security by blocking malicious traffic before it ever reaches your servers. Blocked traffic can range from bots and SQL injection attempts to, in some cases, even 0-day attacks.
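To give a feel for what a rule looks like, here is a deliberately tiny sketch that checks a URL's path and query parameters against a couple of regular expressions. It is a toy for illustration only: the two patterns are hypothetical examples, and real WAFs rely on large, actively maintained rule sets and far more thorough request inspection.

```python
import re
from typing import List, Tuple
from urllib.parse import parse_qsl, urlsplit

# Toy rules for illustration only; real WAFs ship large, maintained rule sets.
RULES: List[Tuple[str, re.Pattern]] = [
    ("sql-injection",
     re.compile(r"('|\")\s*(or|and)\s+\d+\s*=\s*\d+|union\s+select", re.IGNORECASE)),
    ("path-traversal", re.compile(r"\.\./")),
]


def inspect(url: str) -> List[str]:
    """Return the names of the rules that the URL's path or query values violate."""
    parts = urlsplit(url)
    values = [parts.path] + [value for _, value in parse_qsl(parts.query)]
    return [name for name, pattern in RULES if any(pattern.search(v) for v in values)]


print(inspect("/products?id=1' OR 1=1"))  # ['sql-injection']
print(inspect("/products?id=42"))  # []
```

A request that matches any rule would typically be rejected (for example with a 403 response) at the CDN, so it never reaches your servers.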

Choosing a CDN Provider

CDN providers by nature act as a man in the middle for traffic between end users and your servers. This means that the CDN will have access to all traffic passing through its network in a decrypted state. Because of this, you should only choose reputable providers to serve your traffic, as you will need to trust that they will not modify your content or siphon off your data in any way as it passes through their network. It may be tempting to use that random free provider you found whilst digging around online, but you may damage your reputation in irreparable ways. Additionally, if your company must follow regulations such as PCI-DSS, you will definitely need to limit yourself to providers that guarantee you remain compliant.

Recommendations?

I will only list out providers I have experience with below. The features provided by these providers are generally quite comparable, and decisions will largely depend on personal preference or cost. Please do your own research when choosing a provider and take my suggestions with a grain of salt, as I am not privy to any rules or regulations you are bound by. As a disclaimer, I'm not affiliated with or sponsored by these providers in any way whatsoever and there are other great alternatives out there that I have not listed.