Tuesday, February 5, 2013

NGINX as sticky balancer for HA using cookies

I recently needed to set up several HTTP load balancers in front of several backend servers, so that each balancer can proxy to any of the backends. The abstract architecture of the deployment is represented in this image.


Yes, the backends run NGINX too. I'll explain that later...

Of course, one of the requirements was that sessions had to be kept, but there was a problem: the backend servers did not synchronize sessions among themselves. This required the balancers to send each user to the same backend server on every request. That way, as each user only ever talks to one backend, there won't be any problem with session persistence.

I know that the NGINX Upstream Module provides this feature through the ip_hash directive: if all the balancers have the same list of backends in the same order, a given client IP will always map to the same backend server. You can check this by taking a look at the module source code at
nginx-X.X.XX/src/http/modules/ngx_http_upstream_ip_hash_module.c.
However, using this module has several cons:
  • No support for IPv6. In fact, all requests from IPv6 addresses will be sent to the first backend server (as I understood from the source code). NOTE: IPv6 addresses are supported starting from versions 1.3.2 and 1.2.2. The version included in the latest Ubuntu (12.10) is 1.1.19, which is the version I was testing, so it lacks IPv6 support.
  • Collisions, as it only uses the first three octets of the IPv4 address for the hash. That means that all the IPs in the same class C (/24) network range will go to the same backend server.
  • All users behind the same NAT will go to the same backend server.
  • If you add new backends, the hashes will change and sessions will be lost.
Because of these problems there are situations where the balancers will not be fair at all, overloading some nodes while others sit idle. For this reason I wondered whether it would be possible to balance among servers using something other than ip_hash.
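
For reference, the ip_hash setup discussed above could look roughly like the following. This is only a minimal sketch: the upstream name iphash_backend is made up for illustration, and the addresses are the two test backends used in the examples further down in this post.

```nginx
# Hypothetical upstream name; the addresses are the test backends of this post.
upstream iphash_backend {
    ip_hash;                    # choose the backend from a hash of the client address
    server 192.168.122.201;
    server 192.168.122.202;
}

server {
    listen 80;

    location / {
        proxy_pass http://iphash_backend;
    }
}
```

For the stickiness to hold across balancers, every balancer would need this same upstream block, with the servers listed in the same order.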

I found the nginx-sticky-module project, which uses cookies to balance among backends, but it is not integrated into NGINX, so I would have to recompile NGINX to use it. I don't really like that for production environments, as I prefer to use the official packages from the GNU/Linux distribution. So I wondered... would NGINX be versatile enough to let me implement something like this module using only configuration directives? Guess the answer! There is a way!

So, now that we are in context, let's start with a little test. Our objective in this example is to use nginx to balance between two backend nodes using cookies. The solution we will implement will work the way it is represented in the following diagram.






In the diagram you can see that the user USER0, when accessing our application service, is directed to balancer1, probably through DNS round robin resolution. In step 1 the user accesses the service for the first time, so balancer1 will choose one backend by round robin; in this example, backend1 was chosen (lucky for our draftsman). This backend will set the cookie backend=backend1, so the client is now "registered" with that backend.

In step 2 the user accesses the platform for the second time. This time, the DNS round robin sends our user to balancer3 and, thanks to the cookie named backend, the balancer knows which backend should handle the request: backend1.

Now that we know how it works, how do we make it work with NGINX?


Well, let's talk first about how the cookie is set on the backend server. If you can change some code in the web application, you can make the application set the cookie for you. If not, you can proxy the application with NGINX on the same server, and you could even use that NGINX to implement the cache of the platform. I will choose to proxy the backend application with NGINX, as it is a generic solution to our problem.

So a very simple configuration of the NGINX in the backend server backend1 would look like this:


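server {
   listen 80;

   location / {
      # Tag every response with the identifier of this backend
      add_header Set-Cookie backend=backend1;
      # The real application listens locally on port 8000
      proxy_pass http://localhost:8000/;
      # Tell the application which address the request came from
      proxy_set_header X-Real-IP $remote_addr;
   }
}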

Each backend server needs its own configuration with its own backend identifier. For testing, I recommend setting up two backends like this one with the ids backend1 and backend2. Of course, in production environments we could use md5 hashes or UUIDs to avoid predictable identifiers.
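
As a rough sketch of the non-predictable identifier idea, the backend configuration could set an opaque value instead of backend1. The value 5f2b9c1e7a below is made up for illustration, and the Path=/ attribute is my own addition so that the browser sends the cookie for every URL of the site.

```nginx
server {
    listen 80;

    location / {
        # Hypothetical opaque identifier instead of "backend1".
        # Path=/ asks the browser to send the cookie for every URL of the site.
        add_header Set-Cookie "backend=5f2b9c1e7a; Path=/";
        proxy_pass http://localhost:8000/;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

Whatever value is chosen here, the balancers have to know that same value in order to translate it back into this backend's address.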

We can now configure a balancer to balance between these two backends. The balancer configuration would look like this:

upstream rrbackend  {
   server 192.168.122.201;
   server 192.168.122.202;
}

map $cookie_backend $sticky_backend {
   default bad_gateway;
   backend1 192.168.122.201;
   backend2 192.168.122.202;
}

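server {
    listen 80;

    location / {
        # Any 502 from the proxy_pass below is handled by round robin
        error_page 502 @rrfallback;
        # $request_uri keeps the query string ($uri alone would drop it)
        proxy_pass http://$sticky_backend$request_uri;
    }

    location @rrfallback {
        proxy_pass http://rrbackend;
    }
}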

The good part is that the configuration is the same for all the balancers if you have more than one. But let's explain how this configuration works.

In the first place, we declare the backends for the round robin access.

Then we map the value of the backend cookie (the $cookie_backend variable) to the $sticky_backend NGINX variable. If $cookie_backend is backend1, then $sticky_backend will be 192.168.122.201, and so on. If no match is found, the default value bad_gateway is used. This is the part that chooses the sticky backend.

The next thing is to declare the server part, with its port (plus server_name and whatever else you need) and its locations. Note that we have a @rrfallback location that resolves to a backend using round robin. When a user accesses the location /, three things can happen:
  1. The user accesses the service for the first time, so no backend cookie is provided and the $sticky_backend NGINX variable has the value bad_gateway. In this case, the proxy_pass will try to reach a server called bad_gateway, which fails with a 502 (Bad Gateway) error and, as declared in the line above the proxy_pass, this error falls back to @rrfallback, so a backend is chosen by round robin.
  2. The user sends the backend cookie with a valid value, so the $sticky_backend NGINX variable gets a valid IP from the map and the request is proxied to the backend indicated in the cookie through the proxy_pass.
  3. The same as 2, but the backend is down. In this case, the proxy_pass will return a 502 error and the process starts again from 1: a new backend is chosen by round robin.
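
A possible variant of the balancer configuration, given only as an untested sketch: since NGINX looks up the name produced by a variable in proxy_pass among the declared upstream groups before trying to resolve it, the map default could point straight to rrbackend. Case 1 would then no longer go through a 502 error, while the error_page line would still cover case 3.

```nginx
upstream rrbackend {
    server 192.168.122.201;
    server 192.168.122.202;
}

map $cookie_backend $sticky_backend {
    default  rrbackend;          # no (or unknown) cookie: use the round robin group
    backend1 192.168.122.201;
    backend2 192.168.122.202;
}

server {
    listen 80;

    location / {
        # Still needed for case 3: the backend named in the cookie is down.
        error_page 502 @rrfallback;
        # No URI part after the variable, so the original request URI is passed on.
        proxy_pass http://$sticky_backend;
    }

    location @rrfallback {
        proxy_pass http://rrbackend;
    }
}
```

The trade-off is mostly one of taste: the bad_gateway approach keeps a single code path for every "no sticky backend" situation, while this one avoids producing an error on each first visit.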

And that's all! You can test it with a browser to check that you stay on the same server, or make requests with curl/wget without sending the cookie to check the initial round robin.
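
To make such tests easier to read, the balancer could expose the address of the backend that answered in a response header. This is a sketch under assumptions: the header name X-Upstream-Addr is made up, the block relies on the upstream and map blocks of the balancer configuration above, and after a fallback the value of $upstream_addr may list more than one address.

```nginx
server {
    listen 80;

    # Hypothetical debugging header, set at server level so that
    # both locations inherit it. Remove it in production.
    add_header X-Upstream-Addr $upstream_addr;

    location / {
        error_page 502 @rrfallback;
        # $request_uri is used here so that the query string is kept.
        proxy_pass http://$sticky_backend$request_uri;
    }

    location @rrfallback {
        proxy_pass http://rrbackend;
    }
}
```

With this in place, the response headers shown by the browser or by curl/wget should tell you which backend handled each request.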

As always, comments are more than welcome!