Load balance your website with HAProxy and Varnish

In this post I will show how to install HAProxy and Varnish. In this setup HAProxy is the frontend and Varnish sits between HAProxy and the backend nodes. Why not use Varnish as the frontend? Because Varnish does not support HTTPS, so if you want to serve your site over HTTPS something else has to terminate it, and here that is HAProxy. We will be using Debian Jessie as the Linux distribution for this installation.

Our setup will have an external box running both HAProxy and Varnish, plus a single backend node (just one for simplicity).

INSTALLATION###

First we install Varnish:

#apt-get install apt-transport-https
#curl https://repo.varnish-cache.org/GPG-key.txt | apt-key add -
#echo "deb https://repo.varnish-cache.org/debian/ jessie varnish-4.1" >> /etc/apt/sources.list.d/varnish-cache.list
#apt-get update
#apt-get install varnish

Then we install HAProxy from jessie-backports:

#echo "deb http://httpredir.debian.org/debian jessie-backports main" | \
tee /etc/apt/sources.list.d/backports.list
#apt-get update
#apt-get -t jessie-backports install haproxy

CONFIGURATION###

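Now let's jump into configuring Varnish. First we edit /etc/default/varnish. Note that Debian Jessie uses systemd by default and, as the comment at the top of the file says, systemd ignores this file: in that case copy /lib/systemd/system/varnish.service to /etc/systemd/system/, put the same options (listen port 6081, VCL file, secret file, storage) on its ExecStart line and run systemctl daemon-reload. The file looks like this: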

# Configuration file for Varnish Cache.
#
# /etc/init.d/varnish expects the variables $DAEMON_OPTS, $NFILES and $MEMLOCK
# to be set from this shell script fragment.
#
# Note: If systemd is installed, this file is obsolete and ignored.  You will
# need to copy /lib/systemd/system/varnish.service to /etc/systemd/system/ and
# edit that file.

# Should we start varnishd at boot?  Set to "no" to disable.
START=yes

# Maximum number of open files (for ulimit -n)
NFILES=131072

# Maximum locked memory size (for ulimit -l)
# Used for locking the shared memory log in memory.  If you increase log size,
# you need to increase this number as well
MEMLOCK=82000

DAEMON_OPTS="-a :6081 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s malloc,256m"

The next file to configure is /etc/varnish/default.vcl. Varnish 4 requires the file to start with the VCL version declaration; after that, first things first, we configure the backend:

vcl 4.0;

backend default {
   .host = "10.x.x.x";
   .port = "80";
}

Then we add an ACL with the hosts that are allowed to purge the cache:

acl purge {
    "localhost";
    "127.0.0.1";
    "10.x.x.x";
}

The following snippets all go, in this order, inside the sub vcl_recv { ... } section.
First we send the subdomains we never want to cache straight to the backend with pipe (replace subdomain1 and subdomain2 with your own):

if (req.http.host ~ "^(subdomain1|subdomain2)\.example\.com") {
     return (pipe);
}
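In VCL, == is a plain string comparison while ~ runs a regular-expression match, so a pattern such as (subdomain1|subdomain2).example.com only does something when it is used with ~. This small Python sketch mimics both operators; Python's re.search stands in for Varnish's regex engine and the host names are placeholders:

```python
import re

# Placeholder host names; replace them with your own.
PATTERN = r"^(subdomain1|subdomain2)\.example\.com"


def vcl_equals(header: str, value: str) -> bool:
    """VCL '==': plain string comparison, no regex involved."""
    return header == value


def vcl_matches(header: str, regex: str) -> bool:
    """VCL '~': regex search (re.search stands in for Varnish's engine)."""
    return re.search(regex, header) is not None


print(vcl_equals("subdomain1.example.com", "(subdomain1|subdomain2).example.com"))  # False
print(vcl_matches("subdomain1.example.com", PATTERN))  # True
print(vcl_matches("www.example.com", PATTERN))  # False
```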

Next we normalize the Host header by removing the port, in case we do some testing on various ports:

set req.http.Host = regsub(req.http.Host, ":[0-9]+", "");
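regsub() replaces only the first match of the regular expression (regsuball() replaces all of them). A quick Python equivalent of the line above, handy to check what the Host header looks like after normalization:

```python
import re


def strip_port(host: str) -> str:
    """Equivalent of regsub(req.http.Host, ":[0-9]+", ""): drop the first ':port'."""
    return re.sub(r":[0-9]+", "", host, count=1)


print(strip_port("www.example.com:8080"))  # www.example.com
print(strip_port("www.example.com"))  # www.example.com
```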

Allow PURGE requests only from the hosts in the ACL and answer with 405 to everyone else:

if (req.method == "PURGE") {
    if (!client.ip ~ purge) {
        return(synth(405, "This IP is not allowed to send PURGE requests."));
    }
    return(purge);
}
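The test client.ip ~ purge checks whether the client address belongs to one of the entries of the purge ACL. The Python sketch below reproduces that decision with the standard ipaddress module. 10.0.0.5 is a hypothetical address standing in for the 10.x.x.x placeholder, and localhost is written out as its usual loopback addresses because Varnish resolves host names in an ACL when it compiles the VCL:

```python
import ipaddress

# Model of the "purge" ACL. 10.0.0.5 is a hypothetical stand-in for 10.x.x.x;
# "localhost" is listed as the loopback addresses it normally resolves to.
PURGE_ACL = [
    ipaddress.ip_network("127.0.0.1/32"),
    ipaddress.ip_network("::1/128"),
    ipaddress.ip_network("10.0.0.5/32"),
]


def purge_action(client_ip: str) -> str:
    """Return what vcl_recv does with a PURGE request coming from client_ip."""
    ip = ipaddress.ip_address(client_ip)
    # An address of the other IP version simply never matches a network.
    if not any(ip in net for net in PURGE_ACL):  # !client.ip ~ purge
        return "synth(405)"
    return "purge"


print(purge_action("127.0.0.1"))  # purge
print(purge_action("203.0.113.7"))  # synth(405)
```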

Requests with HTTP authentication and POST requests are not cached:

if (req.http.Authorization || req.method == "POST") {
    return (pass);
}

WordPress-specific configuration.
Do not cache the RSS feed:

if (req.url ~ "/feed") {
    return (pass);
}

Do not cache admin and login pages:

if (req.url ~ "/wp-(login|admin)") {
    return (pass);
}

Remove the has_js, Google Analytics (__utm*), Quantcast (__qc*), wp-settings-1, wp-settings-time-1 and WordPress test cookies, and then unset the Cookie header if it is left empty or with only spaces:

set req.http.Cookie = regsuball(req.http.Cookie, "has_js=[^;]+(; )?", "");
set req.http.Cookie = regsuball(req.http.Cookie, "__utm.=[^;]+(; )?", "");
set req.http.Cookie = regsuball(req.http.Cookie, "__qc.=[^;]+(; )?", "");
set req.http.Cookie = regsuball(req.http.Cookie, "wp-settings-1=[^;]+(; )?", "");
set req.http.Cookie = regsuball(req.http.Cookie, "wp-settings-time-1=[^;]+(; )?", "");
set req.http.Cookie = regsuball(req.http.Cookie, "wordpress_test_cookie=[^;]+(; )?", "");

if (req.http.cookie ~ "^ *$") {
    unset req.http.cookie;
}
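To see what the cookie cleaning leaves behind, here is a Python sketch applying the same patterns in the same order. re.sub replaces every match, like regsuball(), and the function returns None where the VCL would unset the header:

```python
import re

# The same regsuball() patterns as in vcl_recv, in the same order.
TRACKING_COOKIES = [
    r"has_js=[^;]+(; )?",
    r"__utm.=[^;]+(; )?",
    r"__qc.=[^;]+(; )?",
    r"wp-settings-1=[^;]+(; )?",
    r"wp-settings-time-1=[^;]+(; )?",
    r"wordpress_test_cookie=[^;]+(; )?",
]


def clean_cookie(cookie):
    """Return the cleaned Cookie header, or None if it should be unset."""
    if cookie is None:
        return None
    for pattern in TRACKING_COOKIES:
        cookie = re.sub(pattern, "", cookie)  # like regsuball(): every match
    if re.search(r"^ *$", cookie):  # nothing but spaces left
        return None
    return cookie


print(clean_cookie("__utma=1.2.3; __utmz=abc; has_js=1"))  # None
print(clean_cookie("__utma=1; wordpress_logged_in_abc=xyz"))  # wordpress_logged_in_abc=xyz
```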

Strip the cookies from requests for static files with the following extensions, so that they can be cached:

if (req.url ~ "\.(css|js|png|gif|jp(e)?g|swf|ico)") {
    unset req.http.cookie;
}
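Note that ~ searches anywhere in the URL, so this pattern also matches URLs that merely contain one of these extensions; for example .json contains .js. The Python sketch below checks a few sample URLs against the pattern as written and against an anchored variant that only looks at the end of the path. The anchored variant is a suggestion for illustration, not part of the configuration above:

```python
import re

AS_WRITTEN = r"\.(css|js|png|gif|jp(e)?g|swf|ico)"
# Hypothetical anchored variant: extension at the end, optional query string.
ANCHORED = r"\.(css|js|png|gif|jpe?g|swf|ico)(\?.*)?$"


def is_static(url: str, pattern: str) -> bool:
    return re.search(pattern, url) is not None


for url in ["/wp-content/themes/x/style.css?ver=4.5", "/favicon.ico",
            "/index.php", "/wp-json/wp/v2/posts.json"]:
    print(url, is_static(url, AS_WRITTEN), is_static(url, ANCHORED))
```

With the pattern as written, /wp-json/wp/v2/posts.json counts as a static file and has its cookies removed; with the anchored variant it does not.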

Normalize the Accept-Encoding header, so that each supported compression is cached only once:

if (req.http.Accept-Encoding) {
    if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)") {
        unset req.http.Accept-Encoding;
    } elsif (req.http.Accept-Encoding ~ "gzip") {
        set req.http.Accept-Encoding = "gzip";
    } elsif (req.http.Accept-Encoding ~ "deflate") {
        set req.http.Accept-Encoding = "deflate";
    } else {
        unset req.http.Accept-Encoding;
    }
}
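The same normalization expressed in Python makes it easy to check which value a given browser header turns into; None means the header is unset:

```python
import re

ALREADY_COMPRESSED = r"\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)"


def normalize_accept_encoding(url, accept_encoding):
    """Mirror of the vcl_recv rule: the value to keep, or None to unset it."""
    if accept_encoding is None:
        return None
    if re.search(ALREADY_COMPRESSED, url):
        return None  # compressing these again gains nothing
    if re.search("gzip", accept_encoding):
        return "gzip"
    if re.search("deflate", accept_encoding):
        return "deflate"
    return None  # unknown encodings are dropped


print(normalize_accept_encoding("/", "gzip, deflate, br"))  # gzip
print(normalize_accept_encoding("/logo.png", "gzip"))  # None
```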

Do not cache requests carrying WordPress cookies (wordpress_*) or comment cookies (comment_*):

if (req.http.Cookie ~ "wordpress_" || req.http.Cookie ~ "comment_") {
    return (pass);
}
if (!req.http.cookie) {
    unset req.http.cookie;
}

With this we are done with the WordPress-related configuration.

Do not cache requests that still carry HTTP authentication or cookies:

if (req.http.Authorization || req.http.Cookie) {
    return (pass);
}

And cache all the other requests:

return (hash);

With this we are done with vcl_recv; remember to close the subroutine with }.
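For a GET request, the pass-or-hash part of vcl_recv can be summed up in a simplified Python function. It assumes the cookies have already been cleaned as described above and leaves out the pipe and PURGE handling. It also makes visible that any cookie still present at this point leads to pass, with or without the WordPress-specific check:

```python
import re

STATIC = r"\.(css|js|png|gif|jp(e)?g|swf|ico)"


def recv_action(method, url, cookie=None, authorization=None):
    """Simplified pass/hash decision of vcl_recv (pipe and PURGE left out)."""
    if authorization or method == "POST":
        return "pass"
    if re.search(r"/feed", url) or re.search(r"/wp-(login|admin)", url):
        return "pass"
    if re.search(STATIC, url):
        cookie = None  # cookies are removed for static files
    if cookie and re.search(r"wordpress_|comment_", cookie):
        return "pass"  # logged-in users and commenters
    if cookie:
        return "pass"  # any other cookie that survived the cleaning
    return "hash"
```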

Add the following subroutines to handle pass and pipe:

sub vcl_pipe {
    return (pipe);
}

sub vcl_pass {
    return (fetch);
}

Next we define which data goes into the hash, i.e. what identifies an object in the cache:

sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }

# If the client supports compression, keep that in a different cache
    if (req.http.Accept-Encoding) {
        hash_data(req.http.Accept-Encoding);
    }
    return (lookup);
}
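To illustrate which inputs make two cached objects different, the sketch below feeds the same pieces as vcl_hash into SHA-256 with Python's hashlib. It is not byte-for-byte Varnish's internal key, and the server IP is a hypothetical fallback value:

```python
import hashlib


def cache_key(url, host=None, accept_encoding=None, server_ip="192.0.2.10"):
    """Illustrative cache key built from the same inputs as vcl_hash.

    server_ip is a hypothetical fallback used when there is no Host header.
    """
    parts = [url, host if host else server_ip]
    if accept_encoding:  # compressed variants get their own key
        parts.append(accept_encoding)
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode() + b"#")  # separator keeps adjacent parts apart
    return h.hexdigest()


print(cache_key("/", "example.com") == cache_key("/", "example.com", "gzip"))  # False
```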

Next, configure what happens after we have read the response headers from the backend:

sub vcl_backend_response {
# Happens after we have read the response headers from the backend.
#
# Here you clean the response headers, removing silly Set-Cookie headers
# and other mistakes your backend does.
# Remove some headers we never want to see
    unset beresp.http.Server;
    unset beresp.http.X-Powered-By;

# For static content strip all backend cookies
    if (bereq.url ~ "\.(css|js|png|gif|jp(e)?g|swf|ico)") {
            unset beresp.http.Set-Cookie;
    }

# don't cache response to posted requests or those with basic auth
    if ( bereq.method == "POST" || bereq.http.Authorization ) {
            set beresp.uncacheable = true;
            set beresp.ttl = 120s;
            return (deliver);
    }

# don't cache search results
    if ( bereq.url ~ "\?s=" ){
            set beresp.uncacheable = true;
            set beresp.ttl = 120s;
            return (deliver);
    }

# only cache status ok
    if ( beresp.status != 200 ) {
            set beresp.uncacheable = true;
            set beresp.ttl = 120s;
            return (deliver);
    }

# A TTL of 24h
    set beresp.ttl = 24h;
# Define the default grace period to serve cached content
    set beresp.grace = 30s;

    return (deliver);


}
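The caching decision in vcl_backend_response can be summarized as a Python function returning whether the object is stored and for how long. For the uncacheable cases Varnish 4 keeps a hit-for-pass marker for the given TTL (120 seconds here), so further requests for the same object go straight to the backend during that time:

```python
import re
from datetime import timedelta

HIT_FOR_PASS_TTL = timedelta(seconds=120)


def backend_decision(method, url, status, authorization=None):
    """Return (cacheable, ttl) following vcl_backend_response."""
    if method == "POST" or authorization:
        return False, HIT_FOR_PASS_TTL
    if re.search(r"\?s=", url):  # WordPress search results
        return False, HIT_FOR_PASS_TTL
    if status != 200:
        return False, HIT_FOR_PASS_TTL
    return True, timedelta(hours=24)  # plus the 30 s grace period


print(backend_decision("GET", "/about/", 200))  # (True, datetime.timedelta(days=1))
```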

When this is done, configure what we are about to send to the client:

sub vcl_deliver {
# Happens when we have all the pieces we need, and are about to send the
# response to the client.
#
# You can do accounting or modifying the final object here.
    if (obj.hits > 0) {
            set resp.http.X-Cache = "cached";
    } else {
            set resp.http.X-Cache = "uncached";
    }

# Remove some headers: PHP version
    unset resp.http.X-Powered-By;

# Remove some headers: Apache version & OS
    unset resp.http.Server;

# Remove some headers: Varnish
    unset resp.http.Via;
    unset resp.http.X-Varnish;

    return (deliver);
}

sub vcl_init {
    return (ok);
}

sub vcl_fini {
    return (ok);
}

Now it's time to start Varnish and configure it to start automatically at boot:

#systemctl enable varnish
#systemctl start varnish

Now it's time to configure HAProxy to accept the clients' HTTPS connections and forward the requests to Varnish.
To do so, edit the /etc/haproxy/haproxy.cfg file.
The first part is the global configuration, which is pretty standard and can be used as is:

global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

Still in the global section, set the base directories for certificates and keys:

    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

Then configure the default SSL ciphers and options used by HAProxy:

    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3 no-tlsv10
    tune.ssl.default-dh-param 4096

When this is done, it's time to configure the defaults section of HAProxy:

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    option  forwardfor
    http-reuse always
    timeout connect 5000
    timeout client  50000
    timeout server  50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

Now we can add the frontend and configure it so that plain HTTP requests are automatically redirected to HTTPS:

frontend FRONTEND_NAME
    bind *:80
    bind *:443 ssl crt /etc/ssl/private/certificate.pem
    acl secure dst_port eq 443
    redirect scheme https if !{ ssl_fc }
    rspadd Strict-Transport-Security:\ max-age=31536000;\ includeSubDomains;\ preload
    rsprep ^Set-Cookie:\ (.*) Set-Cookie:\ \1;\ Secure if secure
    default_backend BACKEND_NAME
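rsprep rewrites whole response header lines, so for connections on port 443 every Set-Cookie line gets ; Secure appended. A Python sketch of that rewrite:

```python
import re


def add_secure_flag(header_line: str, dst_port: int) -> str:
    r"""Mimic: rsprep ^Set-Cookie:\ (.*) Set-Cookie:\ \1;\ Secure if secure"""
    if dst_port != 443:  # acl secure dst_port eq 443
        return header_line
    return re.sub(r"^Set-Cookie: (.*)", r"Set-Cookie: \1; Secure", header_line)


print(add_secure_flag("Set-Cookie: a=1; Path=/", 443))  # Set-Cookie: a=1; Path=/; Secure
```

Keep in mind that the rule does not check whether the cookie already carries the Secure flag, in which case the flag ends up twice in the header.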

Now we add the backend and point it at Varnish, which listens on 127.0.0.1:6081:

backend BACKEND_NAME
    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    server vpsieprod 127.0.0.1:6081 

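Optionally, we can add a statistics page on an alternative port (9000 here) where we can watch the status of the frontends and backends, which is handy when there are several of them. As configured below the page is open to anyone, so consider restricting it with stats auth or a firewall rule: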

listen statistics
    bind *:9000
    mode http
    stats enable
    stats show-desc VPSie HAProxy Status
    stats uri /

With this configured, we can start HAProxy and enable it at boot:

#systemctl enable haproxy
#systemctl start haproxy

In the browser's web inspector we can now check the X-Cache response header: static content should show cached once it has been requested at least once.
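The same check can be scripted. The Python sketch below requests a URL twice with urllib and prints the X-Cache header added in vcl_deliver; the second response should report cached if Varnish stored the object. The URL in the usage line is a placeholder:

```python
import urllib.request


def x_cache(headers) -> str:
    """Return the X-Cache value added in vcl_deliver, or 'missing'."""
    return headers.get("X-Cache", "missing")


def check(url: str) -> None:
    # Request the same URL twice; the second answer should say "cached"
    # if Varnish stored the object.
    for attempt in (1, 2):
        with urllib.request.urlopen(url) as resp:
            print(attempt, resp.status, x_cache(resp.headers))


# check("https://www.example.com/wp-includes/css/dashicons.min.css")  # placeholder URL
```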