Load balance your website with HAProxy and Varnish
In this post we will show how to install HAProxy and Varnish. The setup will have HAProxy as the frontend, with Varnish sitting between HAProxy and the nodes. Why not use Varnish as the frontend? Because Varnish has no HTTPS support, so if you want to serve HTTPS the TLS termination has to happen in front of it. We will be using Debian Jessie as the Linux distribution for this installation.
Our setup will have an external box running both HAProxy and Varnish, plus a backend node (just one, for simplicity).
INSTALLATION
First we install Varnish:
#apt-get install apt-transport-https
#curl https://repo.varnish-cache.org/GPG-key.txt | apt-key add -
#echo "deb https://repo.varnish-cache.org/debian/ jessie varnish-4.1" >> /etc/apt/sources.list.d/varnish-cache.list
#apt-get update
#apt-get install varnish
Then we install HAProxy from jessie-backports:
#echo "deb http://httpredir.debian.org/debian jessie-backports main" | \
tee /etc/apt/sources.list.d/backports.list
#apt-get update
#apt-get -t jessie-backports install haproxy
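To make sure the backported HAProxy and the Varnish 4.1 packages were actually picked up, you can print the installed versions:
#haproxy -v
#varnishd -V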
CONFIGURATION
Now let's jump into configuring Varnish. First we need to edit /etc/default/varnish:
# Configuration file for Varnish Cache.
#
# /etc/init.d/varnish expects the variables $DAEMON_OPTS, $NFILES and $MEMLOCK
# to be set from this shell script fragment.
#
# Note: If systemd is installed, this file is obsolete and ignored. You will
# need to copy /lib/systemd/system/varnish.service to /etc/systemd/system/ and
# edit that file.
# Should we start varnishd at boot? Set to "no" to disable.
START=yes
# Maximum number of open files (for ulimit -n)
NFILES=131072
# Maximum locked memory size (for ulimit -l)
# Used for locking the shared memory log in memory. If you increase log size,
# you need to increase this number as well
MEMLOCK=82000
DAEMON_OPTS="-a :6081 \
-T localhost:6082 \
-f /etc/varnish/default.vcl \
-S /etc/varnish/secret \
-s malloc,256m"
The next file to configure is /etc/varnish/default.vcl. Make sure it starts with the vcl 4.0; version declaration that Varnish 4 requires, then first things first we configure the backend:
backend default {
.host = "10.x.x.x";
.port = "80";
}
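We only use a single backend here, but since the point of this setup is load balancing, it is worth sketching how more nodes would be plugged in. This is just a sketch using the round-robin director from vmod_directors; the node names and the second address are hypothetical:
import directors;
backend node1 {
.host = "10.x.x.x";
.port = "80";
}
backend node2 {
.host = "10.x.x.y";
.port = "80";
}
# in sub vcl_init:
# new lb = directors.round_robin();
# lb.add_backend(node1);
# lb.add_backend(node2);
# in sub vcl_recv:
# set req.backend_hint = lb.backend();
The rest of this post sticks to the single default backend.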
Then we add an ACL with the hosts that should have permission to purge the cache:
acl purge {
"localhost";
"127.0.0.1";
"10.x.x.x";
}
Under the sub vcl_recv section we make the following changes.
First things first, some cleanup: requests for subdomains that should never be cached are piped straight to the backend, bypassing any cookie handling:
if (req.http.host ~ "^(subdomain1|subdomain2)\.example\.com$") {
return (pipe);
}
We normalize the Host header and strip the port, in case we are testing on non-standard ports:
set req.http.Host = regsub(req.http.Host, ":[0-9]+", "");
Allow purging from the hosts in the ACL and return a 405 otherwise:
if (req.method == "PURGE") {
if (!client.ip ~ purge) {
return(synth(405, "This IP is not allowed to send PURGE requests."));
}
return(purge);
}
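Once this is in place you can test a purge from one of the hosts in the ACL, for example (host name and path are placeholders):
#curl -X PURGE -H "Host: www.example.com" http://127.0.0.1:6081/some/page/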
Requests with an Authorization header and POST requests will not be cached:
if (req.http.Authorization || req.method == "POST") {
return (pass);
}
WordPress-specific configuration:
Do not cache the RSS feed:
if (req.url ~ "/feed") {
return (pass);
}
Do not cache admin and login pages:
if (req.url ~ "/wp-(login|admin)") {
return (pass);
}
Remove the has_js, Google Analytics (__utm*), Quant Capital (__qc*), wp-settings-1, wp-settings-time-1 and WordPress test cookies, and drop the Cookie header entirely if it is empty or contains only spaces:
set req.http.Cookie = regsuball(req.http.Cookie, "has_js=[^;]+(; )?", "");
set req.http.Cookie = regsuball(req.http.Cookie, "__utm.=[^;]+(; )?", "");
set req.http.Cookie = regsuball(req.http.Cookie, "__qc.=[^;]+(; )?", "");
set req.http.Cookie = regsuball(req.http.Cookie, "wp-settings-1=[^;]+(; )?", "");
set req.http.Cookie = regsuball(req.http.Cookie, "wp-settings-time-1=[^;]+(; )?", "");
set req.http.Cookie = regsuball(req.http.Cookie, "wordpress_test_cookie=[^;]+(; )?", "");
if (req.http.cookie ~ "^ *$") {
unset req.http.cookie;
}
Strip cookies from requests for files with the following extensions so they can be cached:
if (req.url ~ "\.(css|js|png|gif|jp(e)?g|swf|ico)") {
unset req.http.cookie;
}
Normalize the Accept-Encoding header and skip recompression for content that is already compressed:
if (req.http.Accept-Encoding) {
if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)") {
unset req.http.Accept-Encoding;
} elsif (req.http.Accept-Encoding ~ "gzip") {
set req.http.Accept-Encoding = "gzip";
} elsif (req.http.Accept-Encoding ~ "deflate") {
set req.http.Accept-Encoding = "deflate";
} else {
unset req.http.Accept-Encoding;
}
}
Check for WordPress-specific login and comment cookies and pass those requests to the backend:
if (req.http.Cookie ~ "wordpress_" || req.http.Cookie ~ "comment_") {
return (pass);
}
if (!req.http.cookie) {
unset req.http.cookie;
}
With this we are done with the WordPress-related configuration.
Do not cache requests that carry HTTP authentication or any remaining cookies:
if (req.http.Authorization || req.http.Cookie) {
return (pass);
}
And cache all the other requests:
return (hash);
With this we are done with the configuration that goes into vcl_recv.
Add the following to handle pipe and pass modes:
sub vcl_pipe {
return (pipe);
}
sub vcl_pass {
return (fetch);
}
Here is the data that goes into the hash:
sub vcl_hash {
hash_data(req.url);
if (req.http.host) {
hash_data(req.http.host);
} else {
hash_data(server.ip);
}
# If the client supports compression, keep that in a different cache
if (req.http.Accept-Encoding) {
hash_data(req.http.Accept-Encoding);
}
return (lookup);
}
Configure what happens after we read the response headers from the backend:
sub vcl_backend_response {
# Happens after we have read the response headers from the backend.
#
# Here you clean the response headers, removing silly Set-Cookie headers
# and other mistakes your backend makes.
# Remove some headers we never want to see
unset beresp.http.Server;
unset beresp.http.X-Powered-By;
# For static content strip all backend cookies
if (bereq.url ~ "\.(css|js|png|gif|jp(e)?g|swf|ico)") {
unset beresp.http.cookie;
}
# don't cache response to posted requests or those with basic auth
if ( bereq.method == "POST" || bereq.http.Authorization ) {
set beresp.uncacheable = true;
set beresp.ttl = 120s;
return (deliver);
}
# don't cache search results
if ( bereq.url ~ "\?s=" ){
set beresp.uncacheable = true;
set beresp.ttl = 120s;
return (deliver);
}
# only cache status ok
if ( beresp.status != 200 ) {
set beresp.uncacheable = true;
set beresp.ttl = 120s;
return (deliver);
}
# A TTL of 24h
set beresp.ttl = 24h;
# Define the default grace period to serve cached content
set beresp.grace = 30s;
return (deliver);
}
When this is done, configure what we are about to send to the client:
sub vcl_deliver {
# Happens when we have all the pieces we need, and are about to send the
# response to the client.
#
# You can do accounting or modifying the final object here.
if (obj.hits > 0) {
set resp.http.X-Cache = "cached";
} else {
set resp.http.X-Cache = "uncached";
}
# Remove some headers: PHP version
unset resp.http.X-Powered-By;
# Remove some headers: Apache version & OS
unset resp.http.Server;
# Remove some headers: Varnish
unset resp.http.Via;
unset resp.http.X-Varnish;
return (deliver);
}
sub vcl_init {
return (ok);
}
sub vcl_fini {
return (ok);
}
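Before starting the daemon it is worth letting varnishd compile the VCL once, which catches any syntax errors early:
#varnishd -C -f /etc/varnish/default.vcl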
Now it's time to start Varnish and configure it to start automatically at boot:
#systemctl enable varnish
#systemctl start varnish
Now it's time to configure HAProxy to accept client requests over HTTPS and forward them to Varnish.
To do so you need to edit the /etc/haproxy/haproxy.cfg file.
The first part is the global configuration, which is pretty standard and can be used as is:
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
Still in the global section, let's set the certificate base folders:
ca-base /etc/ssl/certs
crt-base /etc/ssl/private
Then configure the default SSL ciphers and bind options in HAProxy:
ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
ssl-default-bind-options no-sslv3 no-tlsv10
tune.ssl.default-dh-param 4096
When this is done, it's time to configure the defaults section of HAProxy:
defaults
log global
mode http
option httplog
option dontlognull
option forwardfor
http-reuse always
timeout connect 5000
timeout client 50000
timeout server 50000
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http
Now we can add the frontend and configure it so that plain HTTP requests are automatically redirected to HTTPS, with an HSTS header and the Secure cookie flag added on top:
frontend FRONTEND_NAME
bind *:80
bind *:443 ssl crt /etc/ssl/private/certificate.pem
acl secure dst_port eq 443
redirect scheme https if !{ ssl_fc }
rspadd Strict-Transport-Security:\ max-age=31536000;\ includeSubDomains;\ preload
rsprep ^Set-Cookie:\ (.*) Set-Cookie:\ \1;\ Secure if secure
default_backend BACKEND_NAME
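Note that HAProxy expects the file referenced by crt to be a single PEM containing the certificate, any intermediate chain and the private key concatenated together. A sketch with hypothetical file names:
#cat example.com.crt intermediate.crt example.com.key > /etc/ssl/private/certificate.pem
#chmod 600 /etc/ssl/private/certificate.pem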
Now we add the backend and point it at Varnish:
backend BACKEND_NAME
http-request set-header X-Forwarded-Port %[dst_port]
http-request add-header X-Forwarded-Proto https if { ssl_fc }
server vpsieprod 127.0.0.1:6081
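You may also want to append check to the server line so that HAProxy health-checks Varnish; if you later run several Varnish instances, just add one server line per instance:
server vpsieprod 127.0.0.1:6081 check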
Optionally, we can add a statistics page on an alternative port where we can keep an eye on the status of the frontends and backends, which is especially handy when there are multiple of them:
listen statistics
bind *:9000
mode http
stats enable
stats show-desc VPSie HAProxy Status
stats uri /
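Since the statistics page exposes details about your backends, it is a good idea to protect it, for example with basic authentication (the credentials below are placeholders):
stats auth admin:CHANGEME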
Having this configured, we can start HAProxy and enable it to start at boot:
#systemctl enable haproxy
#systemctl start haproxy
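Whenever you change the configuration later, it can be validated before a restart:
#haproxy -c -f /etc/haproxy/haproxy.cfg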
In the browser's web inspector we can now see that static content is served from the cache.
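The X-Cache header we set in vcl_deliver makes this easy to check from the command line as well; request a static asset twice (the URL is a placeholder) and the second response should report X-Cache: cached:
#curl -sI https://www.example.com/wp-content/themes/mytheme/style.css | grep -i x-cache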