How to block rate-limited traffic with Varnish

Varnish is brilliant. For those who don’t know it, Varnish is a reverse proxy cache (HTTP accelerator) that serves cached content quickly. Really, really quickly.

At Abelson Info we have started work on a public API product, and one of our concerns has been how to implement effective and scalable rate-limiting. 

We already use Varnish to serve significant web traffic on behalf of Betfair, so it seems a good fit as a front-end cache for any public API we produce. If we can also get it to implement rate-limiting (to be precise, blocking rate-limited traffic), so much the better.

What we want is for our API application servers to count requests from a client (by IP address or API key) and, when a rate limit is exceeded, serve a response that indicates this. Varnish caches that response and then blocks all subsequent requests (to any URL of the API) from that client until the cached block response expires. This means the application server controls the business logic that implements the rate limit, but Varnish deals with the rate-limited load.
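To make the application-server side concrete, here is a minimal Python sketch of the counting logic. Everything in it is an assumption for illustration: the `RateLimiter` class, the fixed-window strategy, the limits and the block duration are hypothetical and not our production code. The only parts taken from the design are that the server decides when a client is over the limit, and that it answers with a cacheable 400 whose `Cache-Control` lifetime tells Varnish how long to keep blocking.

```python
import time


class RateLimiter:
    """Fixed-window request counter keyed by client id (IP address or API key)."""

    def __init__(self, limit=100, window=60, block_for=60, clock=time.time):
        self.limit = limit          # requests allowed per window (assumed value)
        self.window = window        # window length in seconds (assumed value)
        self.block_for = block_for  # seconds Varnish should keep the block cached
        self.clock = clock          # injectable so the logic can be tested
        self.counts = {}            # client id -> (window start, request count)

    def check(self, client_id):
        """Return (status, headers) for one request from client_id."""
        now = self.clock()
        start, count = self.counts.get(client_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0   # the previous window has expired
        count += 1
        self.counts[client_id] = (start, count)

        if count > self.limit:
            # Cacheable block response: Varnish can serve it until max-age runs out.
            return 400, {"Cache-Control": "max-age=%d" % self.block_for}
        # Normal response (the application would add its own headers here).
        return 200, {}


limiter = RateLimiter(limit=2, window=60, block_for=30)
for _ in range(3):
    status, headers = limiter.check("203.0.113.7")
print(status, headers)  # 400 {'Cache-Control': 'max-age=30'}
```

The third request exceeds the limit of two, so the sketch returns the block response; the two earlier requests would have been answered normally.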

This will be especially useful in the case of a denial of service attack, or what I like to call an “inadvertent denial of service attack”: when an innocent (but slightly incompetent) client writes a script that hammers your service with requests because they forgot to put a sleep in a loop (or similar). We have encountered these before with our non-public APIs, and opening up our APIs to everyone only makes them more likely.

Here is my first attempt at a Varnish VCL (config file) that implements this:

# set back end as normal
backend default {
 .host = "127.0.0.1";
 .port = "81";
}

sub vcl_hash {
 # uncomment next line to rate limit based on IP address
 # set req.hash += client.ip;

 # uncomment next line to rate limit based on query parameter ('key' in this case)
 # set req.hash += regsub(regsub(req.url, ".+(\?|\&)key=", ""), "\&.*" ,"");
 return(hash);
}

sub vcl_fetch {
 # cache 400s (rate limited) responses
 if (beresp.status == 400) {
   set beresp.cacheable = true;
 } else {
   set beresp.ttl = 0s;
 }
 return(deliver);
}

This hashes every request from a given client to the same cache key, so they all look like the same request to Varnish, and it caches only the 400 (rate-limited) responses. The TTL (time to live) is taken from the Cache-Control header sent by the application server, so the application stays fully in control of the rate limit and of when a block should expire.
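To see which part of the request ends up in the hash when rate limiting by query parameter, the two nested `regsub` calls from the VCL above can be mirrored in Python. This is only an illustration under assumptions: the function name `extract_key` and the sample URLs are made up, and Python's `re` module stands in for Varnish's regex engine, which may differ in edge cases.

```python
import re


def extract_key(url):
    """Mirror the two nested regsub calls used in vcl_hash."""
    # Inner regsub: drop everything up to and including "?key=" or "&key=".
    # count=1 because regsub replaces only the first match.
    tail = re.sub(r".+(\?|&)key=", "", url, count=1)
    # Outer regsub: drop any parameters that follow the key's value.
    return re.sub(r"&.*", "", tail, count=1)


print(extract_key("/v1/prices?key=abc123&market=7"))  # abc123
print(extract_key("/v1/prices?market=7&key=abc123"))  # abc123
print(extract_key("/v1/prices?market=7"))             # /v1/prices?market=7
```

Note the last case: in this model, a request without a `key` parameter is hashed on its whole URL, so such requests are not grouped per client.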

Aside: I chose 400 to represent a rate-limited response as that is what Twitter uses, and it feels like the most appropriate response code. You can of course use any response code you like (within reason).

There are two flaws to be addressed with this.

  1. If you are blocking based on a URL parameter, it will not help in the case where a user is deliberately launching a DoS attack, as all they need to do is change the key for each request. However, if you are under a deliberate DoS attack, you probably have bigger problems anyway.
  2. This uses Varnish only as a rate-limited traffic blocker - the hashing function does not allow for any other sort of caching.

Initially my solution to problem 2 was to implement a second Varnish instance as the default back end, which would then implement ‘normal’ caching. But a bit of experimentation and tweaking brought me to this:

# set back end as normal
backend default {
 .host = "127.0.0.1";
 .port = "81";
}

sub vcl_hash {
 if (req.http.X-rate-ok != "1") {
     # uncomment next line to rate limit based on IP address
     # set req.hash += client.ip;

     # uncomment next line to rate limit based on query parameter ('key' in this case)
     # set req.hash += regsub(regsub(req.url, ".+(\?|\&)key=", ""), "\&.*" ,"");
 } else {
     ### these 2 entries are the default ones used for vcl.
     set req.hash += req.url;
     set req.hash += req.http.host;
 }

 return(hash);
}

sub vcl_fetch {
 if (req.http.X-rate-ok != "1") {
   # first pass - only cache 400s (rate limited) responses
   if (beresp.status == 400) {
     set beresp.cacheable = true;
     return(deliver);
   } else {
     # not a rate limited response, so restart using normal hash function
     set req.http.X-rate-ok = "1";
     restart;
   }
 }

 # non-rate limited fetch
 return(deliver);
}


Here Varnish does an initial pass to see whether there is a cached rate-limited response for the client. If there is not, and the back end does not return one either, it sets a flag (req.http.X-rate-ok) and restarts the request, which then uses the default (or your custom) hashing function.
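The control flow can be easier to follow as a small simulation. The Python below is a toy model, not Varnish: the `handle` function, the dictionary standing in for the cache and the `backend` callable are all hypothetical, cache expiry is ignored, and every normal response is assumed to be cacheable. It only shows the order of events: look up under the per-client hash, cache and return a 400 if the back end sends one, otherwise restart with the flag set and use the URL-plus-host hash.

```python
def handle(cache, backend, client_id, host, url):
    """Model one request through the two-pass VCL; returns (status, source)."""
    rate_ok = False                       # stands in for req.http.X-rate-ok
    while True:
        # vcl_hash: per-client key on the first pass, URL + host after restart.
        key = ("client", client_id) if not rate_ok else ("page", url, host)
        if key in cache:
            return cache[key], "cache"    # cache hit: back end is not contacted
        status = backend(client_id, url)  # cache miss: vcl_fetch runs
        if not rate_ok:
            if status == 400:
                cache[key] = status       # cache the block for this client
                return status, "backend"
            rate_ok = True                # not rate limited: restart
            continue
        cache[key] = status               # second pass: normal caching
        return status, "backend"


def sample_backend(client_id, url):
    # Hypothetical back end that rate limits exactly one client.
    return 400 if client_id == "203.0.113.7" else 200


cache = {}
print(handle(cache, sample_backend, "203.0.113.7", "api.example.com", "/v1/a"))   # (400, 'backend')
print(handle(cache, sample_backend, "203.0.113.7", "api.example.com", "/v1/b"))   # (400, 'cache')
print(handle(cache, sample_backend, "198.51.100.2", "api.example.com", "/v1/a"))  # (200, 'backend')
```

In this model every request from an unblocked client still reaches the back end on the first pass, which is what gives the application the chance to count it; only blocked clients are answered entirely from the cache.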

While we have not battle-tested this configuration yet, under test it seems to implement the exact functionality we want, and we look forward to using it on our live public platform.