A couple of months ago, I noticed that I was getting pretty close to using up all of my monthly bandwidth allocation for my server and that was a surprise. I run several blogs that get quite a few hits but I didn't think I was anywhere near going over my 250 GB allotment. So I decided to spend a little time to optimize my server and figure out the best way to utilize what I had and optimize it to get the most performance out of my little box. Jeff Atwood's wonderful blog entry about Reducing Your Website's Bandwidth Usage inspired me to write about my experience and what I ended up doing to squeeze the most out of my server.
I had done some of the obvious things that people typically do to minimize traffic to their site. First and foremost was outsourcing of my RSS feeds to FeedBurner. I've been using FeedBurner for several years now after I learned the hard way how badly programmed a lot of the RSS readers were out there. I had to ban several IP addresses as they were getting my full feed every 2 seconds – Hoping that was some bad configuration on their side but who knows. Maybe it was a RSS DOS attack :). After taking a little time to see what was taking up a lot of the bandwidth, I discovered several things that needed immediate attention. First and foremost was the missing HTTP compression. Looks like an Apache or PHP upgrade I did in the past few months had ended up disabling the Apache module for GZIP compression and so all the traffic was going out in text. HTTP Compression delivers amazing speed enhancements via file size reduction and most if not all browsers support compression and so I enabled compression for all content of type text/html and all CSS and JS files.
Some older browser don't handle JS and CSS compressed files but anything of IE6 seemed to handle JS/CSS compression just fine and my usage tracking (pictured above) indicated that most of my IE users were using IE 6 and above.
Enabling HTTP Compression compressed my blog index page by 78% resulting in a statistical performance improvement of almost 4.4x. While your mileage may vary, the resulting performance improvement got me on the Top20 column at GrabPERF almost every single day.
Another issue I had was the number of images being loaded from my web server. As most of you already know, browsers will typically limit themselves to 2 connections per server and so if a webpage being loaded has 4 CSS files, 2 JS files and 10 images, you are loading a lot of content over those 2 connections. And so I used a simple CNAME trick to create an image.j2eegeek.com to complement http://www.j2eegeek.com and started serving images from image.j2eegeek.com. That did help and I considered doing something similar for CSS and JS files but decided instead to outsource image handling to Amazon's S3.
Amazon's S3 or Simple Storage Service is a highly scalable, reliable, fast, inexpensive data storage infrastructure that is fast and relatively inexpensive. S3 allows you to create a 'bucket', which is essentially a folder that must have a globally unique name and cannot have any sub-buckets or directories and so it's basically emulates a flat directory structure. Everything you put in your bucket and make publically available is accessible via http using the URL http://s3.amazonaws.com/bucketname/itemname.png. Amazon's S3 Web Service also allows you to call it using the HTTP Host header and so the URL above would become http://bucketname.s3.amazonaws.com/itemname.png. You can take this further if you have access to your DNS server. In my case, I created a bucket in S3 called s3.j2eegeek.com. I then created a CNAME in my DNS for s3.j2eegeek.com and pointed it to s3.amazonaws.com. And presto – s3.j2eegeek.com resolves to essentially http://s3.amazonaws.com/s3.j2eegeek.com/. I then used John Spurlock's NS3 Manager to get my content onto S3. NS3 Manager is a simple tool (windows only) to transfer files to/from an Amazon S3 storage account, as well as manage existing data. It is an attempt to provide a useful interface for some of the most basic S3 operations: uploading/downloading, managing ACLs, system metadata (e.g. content-type) and user metadata (custom name-value pairs). In my opinion, NS3 Manager is the best tool out there for getting data in and out of S3 and I have used close to 20 web based, browser plug-in and desktop applications.
In addition, I also decided to try out a couple of PHP Accelerators out there to see if I could squeeze a little more performance out of my web server. Compile caches are a no-brainer and I saw decent performance improvement in my PHP applications. I blogged about this topic in a little more detail and you can read that if you care about PHP performance.
The last thing I did probably had the biggest impact after enabling HTTP compression and that was moving my Tomcat application server off my current Linux box and moving it to Amazon's EC2. Amazon's EC2 or Elastic Compute Cloud is a virtualized cloud of computing available to you for $0.10 per hour of CPU utilization. I've been playing around with EC2 for a while now and just started using it for something real. I have tons of notes that I taken during my experimentation with EC2 where I took the stock Fedora Core 4 images from Amazon and made that server into my Java application server running Tomcat and Glassfish. I also created my own Fedora Core 6, CentOS 4.4 image and deployed them as my server. My current AMI running my Java applications is a Fedora Core 6 image and I am hoping to get RHEL 5.0 deployed in the next few weeks but all of that will be a topic for another blog.
In conclusion, the HTTP Compression offered me the biggest reduction in bandwidth utilization. And it is so easy to setup on Apache, IIS or virtually any Java application server that is it almost criminal not to do so. 🙂 Maybe that's overstating it a bit – but there are some really simple ways to optimize your website and you too can make your site hum and perform like you’ve got a cluster of servers behind your site.