Pressflow Varnish installation and configuration

Installation

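Varnish is a caching HTTP reverse proxy: it sits in front of your web server and answers repeat requests from its cache, which is what speeds up your web site.

It is open source, built on industry standards, and requires very few resources.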

Varnish is distributed in the EPEL (Extra Packages for Enterprise Linux) package repositories. However, while EPEL allows new versions to be distributed, it does not allow backwards-incompatible changes.

New major versions therefore do not reach EPEL, so the EPEL package is not necessarily up to date.

If you require a newer major version than the one available in EPEL, use the repository provided by varnish-cache.org. To use the varnish-cache.org repository, run

rpm --nosignature -i http://repo.varnish-cache.org/redhat/el5/noarch/varnish-release-2.1-2.noarch.rpm

and then run

yum install varnish

 

The --nosignature option is only needed on initial installation, since the Varnish GPG key is not yet in the yum keyring.

Now it's time to configure Varnish.

Configuration

1. Configure the Varnish port

The Varnish daemon options reside in /etc/default/varnish on Debian and Ubuntu, and in /etc/sysconfig/varnish on Red Hat-style systems such as the one installed above.

Edit this file, specifically the following settings:

DAEMON_OPTS="-a :6081 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,1G"

-a is the address and port on which Varnish listens for requests. The example above shows 6081; on our server, set this to :80, because we want the caching server to accept requests on port 80.

-T is the port for the administrative interface. This should only be accessible by administrators.

-f is the vcl file that we will configure below.

-S should not be altered.

The content of this file (/etc/varnish/secret) should be changed to A-KEY-COMES-HERE-SO-MAKE-HASH-NOW

You should put this key also in your varnish module settings.
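One way to produce such a key is sketched below. It assumes /dev/urandom, head, od and tr are available; make_secret is a hypothetical helper name and not part of Varnish or the varnish module.

```shell
# Hypothetical helper: print a random 64-character hexadecimal key.
make_secret() {
    # Read 32 random bytes, dump them as hex, and strip od's spaces and newlines.
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
}
```

Run as root, something like make_secret > /etc/varnish/secret would store the key; the same value then goes into the varnish module settings.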

-s is the place to store cached data. When you change this, watch out for file permission problems.

Performance consideration: it is also advised to set the mount options noatime and nodiratime on the filesystems where you keep your Varnish data (.bin) files. The filesystem then stops recording access times for these files, which are of no importance and only slow things down.

2. Configure Varnish with backends

Find the default.vcl file used on your system.

On most Linux distributions this is /etc/varnish/default.vcl. Write the following:

# If you're running a single site on a server, or else want all sites
# on a server to go through Varnish, you only need one of the following backends.
# Several are shown here for those who have sites that they don't want to run
# through Varnish. In this example file, Varnish is assumed to
# be running on port 80, and Apache (or whatever) on port 8080.

backend default {

.host = "127.0.0.1";
.port = "8080";
#imPORTant# This must be the port your web server listens on, not the port Varnish listens on.
.connect_timeout = 600s;
.first_byte_timeout = 600s;
.between_bytes_timeout = 600s;
.max_connections = 800;

}

backend MYSITE {

.host = "127.0.0.1";
.port = "8080";
#imPORTant# This must be the port your web server listens on, not the port Varnish listens on.
.connect_timeout = 600s;
.first_byte_timeout = 600s;
.between_bytes_timeout = 600s;
.max_connections = 800;

}
#backend siteb {
# .host = "127.0.0.1";
# .port = "8080";
# .connect_timeout = 600s;
# .first_byte_timeout = 600s;
# .between_bytes_timeout = 600s;
# .max_connections = 800;
#}

 

#Start of the receive subroutine

sub vcl_recv { 

# Now we use the different backends based on the host name of the site. Again, this is
# not needed if you're running a single site on a server

if (req.http.host ~ "MYSITE\.be$") {
  set req.backend = MYSITE;

#} else if (req.http.host ~ "siteb\.com$") {
#  set req.backend = siteb;

} else {
  # Use the default backend for all other requests
  set req.backend = default;

}
# Allow a grace period for offering "stale" data in case backend lags

set req.grace = 5m;

remove req.http.X-Forwarded-For;

set req.http.X-Forwarded-For = client.ip;

# Properly handle different encoding types

if (req.http.Accept-Encoding) {

if (req.url ~ "\.(jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf)$") {

  # No point in compressing these

  remove req.http.Accept-Encoding;

} elsif (req.http.Accept-Encoding ~ "gzip") {

  set req.http.Accept-Encoding = "gzip";

} elsif (req.http.Accept-Encoding ~ "deflate") { 

  set req.http.Accept-Encoding = "deflate";

} else {

  # unknown algorithm

  remove req.http.Accept-Encoding;

}

}

# Bypass the cache (pass) if the request is a no-cache request from the client

if (req.http.Cache-Control ~ "no-cache") {

return (pass);

}

## Default request checks

if (req.request != "GET" &&

req.request != "HEAD" &&
req.request != "PUT" &&
req.request != "POST" &&
req.request != "TRACE" &&
req.request != "OPTIONS" &&
req.request != "DELETE") {

# Non-RFC2616 or CONNECT which is weird.

return (pipe);
}

if (req.request != "GET" && req.request != "HEAD") {

# We only deal with GET and HEAD by default
return (pass);

}

## Modified from default to allow caching if cookies are set, but not http auth

if (req.http.Authorization) {

/* Not cacheable by default */

return (pass);

}

## This would make varnish skip caching for this particular site

if (req.http.host ~ "internet-safety.yoursphere.com$") {

return (pass);

}

# This makes varnish skip caching for every site except this one 
# Commented out here, but shown for sake of some use cases
#if (req.http.host != "sitea.com") {
# return (pass);
#}

## Remove has_js and Google Analytics cookies.

set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");

## Remove a ";" prefix, if present.

set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");

## Remove empty cookies.

if (req.http.Cookie ~ "^\s*$") { unset req.http.Cookie; }

## Pass cron jobs

if (req.url ~ "cron.php") { return (pass); }

# Pass server-status

if (req.url ~ ".*/server-status$") { return (pass); }

# Don't cache install.php

if (req.url ~ "install.php") { return (pass); }

# Cache things with these extensions

if (req.url ~ "\.(js|css|jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf)$") {

return (lookup);

}

# Don't cache Drupal logged-in user sessions 
# LOGGED_IN is the cookie that earlier version of Pressflow sets 
# VARNISH is the cookie which the varnish.module sets

if (req.http.Cookie ~ "(VARNISH|DRUPAL_UID|LOGGED_IN)") { return (pass);

}

}

sub vcl_fetch {

# Grace to allow varnish to serve content if backend is lagged
# set beresp.grace = 5m;
# These status codes should always pass through and never cache.

if (beresp.status == 404 || beresp.status == 503 || beresp.status == 500) {

set beresp.http.X-Cacheable = "NO: beresp.status";
set beresp.http.X-Cacheable-status = beresp.status;
return (pass);

}

if (req.url ~ "\.(js|css|jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf)$") {

unset beresp.http.Set-Cookie;

}

if (req.url ~ "^/files/") {

unset beresp.http.Set-Cookie;
set beresp.cacheable = true;

}

if (req.url ~ "^/sites/") {

unset beresp.http.Set-Cookie;
set beresp.cacheable = true;

}

if (!beresp.cacheable) {

set beresp.http.X-Cacheable = "NO: !obj.cacheable"; 
return (pass);

} else {

# From http://varnish-cache.org/wiki/VCLExampleLongerCaching
/* Remove Expires from backend, it's not long enough */
unset beresp.http.expires;
# These TTLs are based on the specific paths and may not apply to your site.
# You could just set a single default TTL if you want.
if (req.url ~ "\.(js|css)$") {

set beresp.ttl = 60m; // js and css files ttl 60 minutes

} else if (req.url ~ "(^/articles/)|(^/tags/)|(^/taxonomy/)") {

set beresp.ttl = 10m; // list page ttl 10 minutes

} else if (req.url ~ "^/article/") {

set beresp.ttl = 5m; // article ttl 5 minutes

} else {

set beresp.ttl = 45m; // default ttl 45 minutes

}

/* marker for vcl_deliver to reset Age: */

set beresp.http.magicmarker = "1";

# All tests passed, therefore item is cacheable

set beresp.http.X-Cacheable = "YES";

}

return (deliver);

}

 

sub vcl_deliver {

# From http://varnish-cache.org/wiki/VCLExampleLongerCaching

if (resp.http.magicmarker) {

/* Remove the magic marker */

unset resp.http.magicmarker;

/* By definition we have a fresh object */

set resp.http.age = "0";

}

}

 

sub vcl_error {

if (obj.status == 503 && req.restarts < 5) {

set obj.http.X-Restarts = req.restarts;
restart;

}

}

# Added to let users force refresh

sub vcl_hit {

set obj.http.X-Cache = "HIT";

# resp.* only exists in vcl_deliver, so the hit counter is stored on the object here
set obj.http.X-Cache-Hits = obj.hits;

if (!obj.cacheable) {

return (pass);

}

if (req.http.Cache-Control ~ "no-cache") {

# Ignore requests via proxy caches, IE users and badly behaved crawlers
# like msnbot that send no-cache with every request.

if (! (req.http.Via || req.http.User-Agent ~ "bot|MSIE")) {

set obj.ttl = 0s;

return (restart);

}

}

return (deliver);

}

sub vcl_miss {

#set beresp.http.X-Cache = "MISS";

}

Starting varnish

Starting varnish caching on a specific port and with specific configuration...

First, kill the old varnishd process:

# pkill varnishd

Then stop your web server.

Edit the configuration for your web server and make it bind to the same port as configured in this vcl file (see #imPORTant#). Start up your web server and then start varnish:

# varnishd -f /usr/local/etc/varnish/default.vcl -s malloc,1G -T 127.0.0.1:2000

The -f option specifies which configuration file varnishd should use.

-s malloc,1G: the -s option chooses the storage type Varnish should use for storing its content. I used the type malloc, which just uses memory for storage. There are other storage backends as well, described in the storage section of the Varnish tutorial. 1G specifies how much memory should be allocated: one gigabyte.

-T 127.0.0.1:2000: Varnish has a built-in text-based administration interface. Activating the interface makes Varnish manageable without stopping it. You can specify which address and port the management interface should listen on. Make sure you don't expose the management interface to the world, as anyone who can reach it can easily gain root access to the system. I recommend tying it to localhost. If you have users on your system that you don't fully trust, use firewall rules to restrict access to the interface to root only.

-a 0.0.0.0:8080 would make Varnish listen on port 8080 for incoming HTTP requests, which is useful while testing. It is left out of the command above because in a production environment you would make Varnish listen on port 80, which is the default.

Static content

Selecting a backend based on the type of document

This is used to host static content from a specially equipped server. This can be done with the regular expression matching operator.

sub vcl_recv {

if (req.url ~ "\.(gif|jpg|swf|css|js)$") {

unset req.http.cookie;

unset req.http.authenticate;

set req.backend = b1;

} else {

set req.backend = b2;

}

}

Health checks and backup servers

At some point you might need Varnish to cache content from several servers. You might want Varnish to map all the URLs onto one single host, or not. There are lots of options. Let's say we need to introduce a Java application into our PHP web site, and that the Java application should handle URLs beginning with /java/. We manage to get the thing up and running on port 8000. Now, let's have a look at default.vcl:

backend default {

.host = "127.0.0.1";

.port = "8080";

}

# We add a new backend:

backend java {

.host = "127.0.0.1";
.port = "8000";

}

# Now we need to tell Varnish where to send the different URLs. Let's look at vcl_recv:

sub vcl_recv {

if (req.url ~ "^/java/") {

set req.backend = java;

} else {

set req.backend = default;

}

}

It's quite simple, really. Let's stop and think about this for a moment. As you can see, you can choose backends based on really arbitrary data.

Varnish Redirect for mobile devices

Do you want to send mobile devices to a different backend or URL?

No problem. A test on req.http.User-Agent in vcl_recv should do the trick. For example:

 

sub vcl_recv {
  if (req.http.User-Agent ~ "iPad"
  || req.http.User-Agent ~ "iP(hone|od)"
  || req.http.User-Agent ~ "Android"
  || req.http.User-Agent ~ "SymbianOS"
  || req.http.User-Agent ~ "^BlackBerry"
  || req.http.User-Agent ~ "^SonyEricsson"
  || req.http.User-Agent ~ "^Nokia"
  || req.http.User-Agent ~ "^SAMSUNG"
  || req.http.User-Agent ~ "^LG") {
    # Smartphones, tablets and phones
    set req.http.X-Device = "mobile";
    # Hand over to vcl_error, which turns status 750 into a redirect
    error 750;
  }
}
 
sub vcl_error {
   if (obj.status == 503 && req.restarts < 4) {
                return(restart);
   }
   if(obj.status == 750){
     set obj.http.Location = "http://MYMOBILEURL";
     set obj.status = 302;
     return (deliver);
   }
}
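Before deploying a User-Agent test like this, it can help to try the patterns against a few sample strings on the command line. The sketch below is only an approximation: is_mobile is a hypothetical helper, and it uses grep -E rather than Varnish's own regular expression engine, on the assumption that patterns this simple behave the same in both.

```shell
# Hypothetical helper: succeed if the given User-Agent string matches
# the same patterns as the VCL example.
is_mobile() {
    printf '%s\n' "$1" |
        grep -Eq 'iPad|iP(hone|od)|Android|SymbianOS|^BlackBerry|^SonyEricsson|^Nokia|^SAMSUNG|^LG'
}

# Example: report how a sample User-Agent would be classified.
if is_mobile "Mozilla/5.0 (iPhone; CPU iPhone OS 4_0 like Mac OS X)"; then
    echo "mobile"
else
    echo "desktop"
fi
```

Note that the patterns starting with ^ only match when the device name is at the very start of the User-Agent string.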

 

Directors (and load balancing with varnish)

You can also group several backends into a group of backends.

These groups are called directors.

This will give you increased performance and resilience.
You can define several backends and group them together in a director:

backend server1 {

.host = "192.168.0.10";

}

backend server2 {

.host = "192.168.0.11";

}

Now we create the director:

director example_director round-robin {

{ .backend = server1; } 

{ .backend = server2; } 

}

This director is a round-robin director. This means the director will distribute the incoming requests on a round-robin basis.
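To make the idea concrete, here is a small shell simulation of round-robin distribution. It does not talk to Varnish at all; round_robin is a hypothetical helper, and the backend names are just labels.

```shell
# Hypothetical helper: show which backend each of COUNT requests would go to.
# usage: round_robin COUNT backend...
round_robin() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        backend=$1
        echo "request $i -> $backend"
        # Move the backend that was just used to the back of the queue.
        shift
        set -- "$@" "$backend"
        i=$((i + 1))
    done
}

round_robin 4 server1 server2
```

With two backends, requests simply alternate between them.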

There is also a random director, which distributes requests in a, you guessed it, random fashion.

But what if one of your servers goes down? Can Varnish direct all the requests to the healthy server? Sure it can. This is where the (load balancer) health checks come into play.

Health checks

Let's set up a director with two backends and health checks. First, let's define the backends:

backend server1 { 

.host = "server1.example.com";
.probe = { .url = "/"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; }

}

backend server2 {

.host = "server2.example.com";
.probe = { .url = "/"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; }

}

What's new here is the probe. Varnish will check the health of each backend with a probe. The options are:

url: the URL Varnish should request.

interval: how often Varnish should poll.

timeout: the timeout of the probe.

window: Varnish maintains a sliding window of the results. Here the window has five checks.

threshold: how many of the last .window polls must be good for the backend to be declared healthy.

initial: how many of the probes are considered good when Varnish starts. Defaults to the same amount as the threshold.
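The interplay of window and threshold can be sketched in a few lines of shell. This is a simplified model of the rule described above, not Varnish's actual implementation; probe_state is a hypothetical helper and the 1/0 results are made-up sample data.

```shell
# Hypothetical helper: decide backend health from a list of probe results.
# usage: probe_state WINDOW THRESHOLD result...
# Each result is 1 (good) or 0 (bad), oldest first.
probe_state() {
    window=$1
    threshold=$2
    shift 2
    # Drop the oldest results until only the last WINDOW remain.
    while [ "$#" -gt "$window" ]; do
        shift
    done
    # Count the good results inside the window.
    good=0
    for result in "$@"; do
        if [ "$result" -eq 1 ]; then
            good=$((good + 1))
        fi
    done
    if [ "$good" -ge "$threshold" ]; then
        echo healthy
    else
        echo sick
    fi
}

# Window of 5, threshold of 3, as in the probe definitions above.
probe_state 5 3 1 1 0 0 1 1 0
```

In the sample call only the last five results (0 0 1 1 0) count, and two good polls are fewer than the threshold of three, so the backend is reported as sick.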

Now we define the director:

director example_director round-robin {

{ .backend = server1; }
# server2
{ .backend = server2; }

}

You use this director just as you would use any other director or backend. Varnish will not send traffic to hosts that are marked as unhealthy. Varnish can also serve stale content if all the backends are down. See "Misbehaving servers" in the Varnish tutorial for more information on how to enable this.

Please note that Varnish will keep probes active for all loaded VCLs. Varnish will coalesce probes that seem identical, so be careful not to change the probe config if you do a lot of VCL loading. Unloading the VCL will discard the probes.

Retrying with another backend if one backend reports a non-200 response.

sub vcl_recv {

if (req.restarts == 0) {

set req.backend = b1;

} else {

set req.backend = b2;

}

}

sub vcl_fetch {

if (beresp.status != 200) {

restart;

}

}

Statistics

varnishtop

The varnishtop utility reads the shared memory logs and presents a continuously updated list of the most commonly occurring log entries. With suitable filtering using the -I, -i, -X and -x options, it can be used to display a ranking of requested documents, clients, user agents, or any other information which is recorded in the log.

varnishtop -i rxurl will show you what URLs are being asked for by the client.

varnishtop -i txurl will show you what your backend is being asked the most.

varnishtop -i RxHeader -I Accept-Encoding will show the most popular Accept-Encoding headers that clients are sending you.

varnishhist

The varnishhist utility reads varnishd(1) shared memory logs and presents a continuously updated histogram showing the distribution of the last N requests by their processing time. The value of N and the vertical scale are displayed in the top left corner. The horizontal scale is logarithmic. Hits are marked with a pipe character (“|”), and misses are marked with a hash character (“#”).

varnishsizes

varnishsizes does the same as varnishhist, except that it shows the size of the objects and not the time taken to complete the request.

This gives you a good overview of how big the objects you are serving are.

varnishstat

Varnish has lots of counters. We count misses, hits, information about the storage, threads created, deleted objects. Just about everything.

varnishstat will dump these counters. This is useful when tuning Varnish. There are programs that can poll varnishstat regularly and make nice graphs of these counters. One such program is Munin, which can be found at http://munin-monitoring.org/. There is a plugin for Munin in the Varnish source code.

Performance optimizations

Grace period

Enable a grace period (Varnish serves stale but cacheable objects while retrieving a fresh object from the backend):

in vcl_recv: set req.grace = 30s;

in vcl_fetch: set beresp.grace = 30s;

Mount point

Mount the working directory of Varnish on tmpfs. On packaged installs this is typically /var/lib/varnish.

This reduces unnecessary disk access for the shmlog. Set the mount options noatime and nodiratime on the filesystems where you keep your Varnish data files. There is no point in keeping track of how often they are accessed; it wastes cycles and causes unnecessary disk activity.

Monitoring

Make sure you monitor your cache hit ratio, the percentage of requests that are actually served from the cache. This should be a high number in order for Varnish to take the load off the backends. Use varnishstat (see hitrate avg), and if possible also monitor and graph it. Tools for this include Nagios (http://www.nagios.org/) and Munin (http://munin.projects.linpro.no/); see also Muninexchange (http://muninexchange.projects.linpro.no/) and http://anders.fupp.net/plugins/ for plugins.
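As a rough illustration, the hit ratio can be computed from the cache_hit and cache_miss counters. The sketch below assumes the one-shot output of varnishstat -1, where each line starts with the counter name followed by its value; hit_ratio is a hypothetical helper name.

```shell
# Hypothetical helper: read "name value ..." counter lines on stdin and
# print the cache hit ratio as a percentage with one decimal.
hit_ratio() {
    awk '
        $1 == "cache_hit"  { hits = $2 }
        $1 == "cache_miss" { misses = $2 }
        END {
            total = hits + misses
            if (total == 0) {
                print "0.0"
            } else {
                printf "%.1f\n", 100 * hits / total
            }
        }'
}

# Sample counter lines in place of real varnishstat output.
printf 'cache_hit 900 0.50 Cache hits\ncache_miss 100 0.10 Cache misses\n' | hit_ratio
```

On a live server the input would come from varnishstat -1 | hit_ratio. Keep in mind that these counters are totals since Varnish started, so the result is a long-term average rather than the current rate.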

Monitor the number of Varnish threads. It should never be as high as the Varnish thread_pool_max setting.

Logging & Debugging

Logging happens by default to /var/log/varnish/varnish.log

This is done by the varnishlog daemon, which comes with Varnish. See also http://stewsnooze.com/content/what-stopping-varnish-and-drupal-pressflow-caching-anonymous-users-page-views

Adding debugging headers (such as the X-Cache and X-Cacheable headers set in the VCL above) lets you see whether the cache is working; varnishlog shows them.

Editorial tip, to highlight X-Cache in the varnishlog output:

varnishlog | perl -pe 's/X-Cache/\e[1;31;43m$&\e[0m/g'
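The debugging headers can also be checked from the client side. The sketch below assumes the X-Cacheable header set by the example VCL in this document; cache_status is a hypothetical helper that reads raw HTTP response headers on stdin.

```shell
# Hypothetical helper: print the value of the X-Cacheable response header,
# or "unknown" if the header is absent.
cache_status() {
    awk '
        tolower($1) == "x-cacheable:" {
            sub(/\r$/, "")              # drop the CR that ends HTTP header lines
            sub(/^[^:]*:[ \t]*/, "")    # drop the header name
            print
            found = 1
        }
        END { if (!found) print "unknown" }'
}

# Sample response headers in place of a real request.
printf 'HTTP/1.1 200 OK\r\nX-Cacheable: YES\r\nAge: 0\r\n\r\n' | cache_status
```

Against a running server the headers could be fetched with, for example, curl -sI http://localhost/ | cache_status, assuming curl is installed and Varnish listens on port 80.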

varnishncsa: Displays the varnishd shared memory logs in Apache / NCSA combined log format.

varnishlog: Reads and presents varnishd shared memory logs.

varnishstat: Displays statistics from a running varnishd instance.

varnishadm: Sends a command to the running varnishd instance.

varnishhist: Reads varnishd shared memory logs and presents a continuously updated histogram showing the distribution of the last N requests by their processing time.

varnishtop: Reads varnishd shared memory logs and presents a continuously updated list of the most commonly occurring log entries.

varnishreplay: Parses Varnish logs and attempts to reproduce the traffic.

http://www.varnish-cache.org/docs/trunk/tutorial/troubleshooting.html

When Varnish doesn't start