Website Performance and Optimization

A couple of months ago, I noticed that I was getting pretty close to using up my monthly bandwidth allocation for my server, and that was a surprise. I run several blogs that get quite a few hits, but I didn't think I was anywhere near going over my 250 GB allotment. So I decided to spend a little time figuring out how to get the most performance out of my little box. Jeff Atwood's wonderful blog entry about Reducing Your Website's Bandwidth Usage inspired me to write about my experience and what I ended up doing to squeeze the most out of my server.

I had already done some of the obvious things people typically do to minimize traffic to their site. First and foremost was outsourcing my RSS feeds to FeedBurner. I've been using FeedBurner for several years now, ever since I learned the hard way how badly programmed a lot of the RSS readers out there are. I had to ban several IP addresses that were pulling my full feed every 2 seconds – hopefully that was just a bad configuration on their side, but who knows. Maybe it was an RSS DoS attack :).

After taking a little time to see what was eating up the bandwidth, I discovered several things that needed immediate attention. First and foremost was the missing HTTP compression. It looks like an Apache or PHP upgrade I did in the past few months had disabled the Apache module for GZIP compression, so all the traffic was going out as uncompressed text. HTTP compression delivers amazing speed enhancements via file-size reduction, and most if not all browsers support it, so I enabled compression for all content of type text/html and for all CSS and JS files.

Some older browsers don't handle compressed JS and CSS files, but IE6 and newer seemed to handle JS/CSS compression just fine, and my usage tracking indicated that most of my IE users were on IE6 or above.

Enabling HTTP compression shrank my blog's index page by 78%, which translates to a performance improvement of almost 4.4x. While your mileage may vary, the resulting speed-up got me into the Top 20 column at GrabPERF almost every single day.
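
If you are on a Java application server rather than Apache, the same idea can be expressed as a servlet filter. The sketch below is illustrative only – it is not what I actually run (I simply re-enabled Apache's compression module), the class name is made up, and a production filter would also want to skip content that is already compressed, such as images.

```java
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

// Hypothetical GZIP filter: compresses responses for clients that advertise
// "Accept-Encoding: gzip". Map it to *.html, *.css and *.js in web.xml.
public class GzipFilter implements Filter {

    public void init(FilterConfig config) throws ServletException { }

    public void destroy() { }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        String acceptEncoding = request.getHeader("Accept-Encoding");
        if (acceptEncoding == null || acceptEncoding.indexOf("gzip") == -1) {
            chain.doFilter(req, res);   // client can't handle gzip, pass through
            return;
        }

        response.setHeader("Content-Encoding", "gzip");
        response.addHeader("Vary", "Accept-Encoding"); // help caches do the right thing
        GzipResponseWrapper wrapped = new GzipResponseWrapper(response);
        try {
            chain.doFilter(request, wrapped);
        } finally {
            wrapped.finish();           // flush the trailing gzip blocks
        }
    }

    // Minimal response wrapper that funnels the body through GZIPOutputStream.
    private static class GzipResponseWrapper extends HttpServletResponseWrapper {
        private final GZIPOutputStream gzip;
        private final ServletOutputStream stream;

        GzipResponseWrapper(HttpServletResponse response) throws IOException {
            super(response);
            gzip = new GZIPOutputStream(response.getOutputStream());
            stream = new ServletOutputStream() {
                public void write(int b) throws IOException {
                    gzip.write(b);
                }
            };
        }

        public ServletOutputStream getOutputStream() {
            return stream;
        }

        public java.io.PrintWriter getWriter() throws IOException {
            return new java.io.PrintWriter(new java.io.OutputStreamWriter(
                    stream, getResponse().getCharacterEncoding()), true);
        }

        public void setContentLength(int len) {
            // Drop the original length; the compressed size will differ.
        }

        void finish() throws IOException {
            gzip.finish();
        }
    }
}
```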

Another issue I had was the number of images being loaded from my web server. As most of you already know, browsers typically limit themselves to 2 connections per host, so if a page has 4 CSS files, 2 JS files and 10 images, you are loading a lot of content over those 2 connections. So I used a simple CNAME trick, creating image.j2eegeek.com to complement www.j2eegeek.com, and started serving images from image.j2eegeek.com. That did help, and I considered doing something similar for CSS and JS files, but decided instead to outsource image handling to Amazon's S3.

Amazon's S3, or Simple Storage Service, is a highly scalable, reliable, fast and relatively inexpensive data storage infrastructure. S3 lets you create a 'bucket', which is essentially a folder that must have a globally unique name and cannot contain sub-buckets or directories, so it basically emulates a flat directory structure. Everything you put in your bucket and make publicly available is accessible over HTTP at a URL of the form http://s3.amazonaws.com/bucketname/itemname.png. Amazon's S3 web service also honors the HTTP Host header, so the URL above can become http://bucketname.s3.amazonaws.com/itemname.png. You can take this further if you have access to your DNS server. In my case, I created a bucket in S3 called s3.j2eegeek.com, then created a CNAME in my DNS for s3.j2eegeek.com and pointed it at s3.amazonaws.com. And presto – s3.j2eegeek.com resolves to essentially http://s3.amazonaws.com/s3.j2eegeek.com/.

I then used John Spurlock's NS3 Manager to get my content onto S3. NS3 Manager is a simple Windows-only tool for transferring files to and from an Amazon S3 storage account, as well as managing existing data. It attempts to provide a useful interface for the most basic S3 operations: uploading and downloading, managing ACLs, system metadata (e.g. content-type) and user metadata (custom name-value pairs). In my opinion, NS3 Manager is the best tool out there for getting data in and out of S3, and I've used close to 20 web-based, browser plug-in and desktop applications.
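
To make the bucket URL mapping described above concrete, here is a small, hypothetical Java snippet that fetches the same public object through all three URL styles – path style, Host-header style and my CNAME. The object key itemname.png is just a placeholder, and this only works for objects you have made publicly readable; anything private needs a signed request.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical example: once the bucket is named s3.j2eegeek.com and a CNAME
// for s3.j2eegeek.com points at s3.amazonaws.com, the same public object is
// reachable through three equivalent URLs. "itemname.png" is a placeholder key.
public class S3UrlDemo {

    private static final String[] URLS = {
        "http://s3.amazonaws.com/s3.j2eegeek.com/itemname.png",  // path style
        "http://s3.j2eegeek.com.s3.amazonaws.com/itemname.png",  // Host-header style
        "http://s3.j2eegeek.com/itemname.png"                    // via my CNAME
    };

    public static void main(String[] args) throws Exception {
        for (String spec : URLS) {
            HttpURLConnection conn = (HttpURLConnection) new URL(spec).openConnection();
            conn.setRequestMethod("GET");

            int status = conn.getResponseCode();
            InputStream in = (status == HttpURLConnection.HTTP_OK)
                    ? conn.getInputStream() : conn.getErrorStream();

            long bytes = 0;
            if (in != null) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    bytes += n;
                }
                in.close();
            }
            System.out.println(spec + " -> HTTP " + status + ", " + bytes + " bytes");
        }
    }
}
```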

In addition, I decided to try out a couple of PHP accelerators to see if I could squeeze a little more performance out of my web server. Compile caches are a no-brainer, and I saw a decent performance improvement in my PHP applications. I blogged about this topic in a little more detail, and you can read that if you care about PHP performance.

The last thing I did probably had the biggest impact after enabling HTTP compression: moving my Tomcat application server off my Linux box and onto Amazon's EC2. Amazon's EC2, or Elastic Compute Cloud, is a virtualized computing cloud available for $0.10 per instance-hour. I've been playing around with EC2 for a while now and have just started using it for something real. I have tons of notes from my experimentation with EC2, where I took the stock Fedora Core 4 image from Amazon and turned that server into my Java application server running Tomcat and Glassfish. I also created my own Fedora Core 6 and CentOS 4.4 images and deployed them as my server. My current AMI running my Java applications is a Fedora Core 6 image, and I am hoping to get RHEL 5.0 deployed in the next few weeks, but all of that will be a topic for another blog post.

In conclusion, HTTP compression offered me the biggest reduction in bandwidth utilization, and it is so easy to set up on Apache, IIS or virtually any Java application server that it is almost criminal not to do so. 🙂 Maybe that's overstating it a bit – but there are some really simple ways to optimize your website, and you too can make your site hum and perform like you've got a cluster of servers behind it.

Will (Or Should) Adobe open-source Flex?

I have been building AJAX applications for a while now and absolutely love AJAX and the improvements it can bring to user-interface design, making applications easy and fun to use. But AJAX does have limitations, and I, like many others, have come to the realization that while AJAX is great for most things, it is not a silver bullet. For data-intensive applications, especially ones that involve dynamic charting with vector graphics and data mining, AJAX falls short.

There are a couple of alternatives out there that fill the niche AJAX still hasn't successfully filled, and Adobe's Flex 2 framework is definitely one of them. Adobe Flex 2 is a rich Internet application framework based on Adobe Flash that lets you create applications that are cross-platform and browser-independent, since they run inside the Flash VM. Flash has fulfilled the promise that Java applets never delivered, for a variety of reasons. The Flex programming model is fairly simple: developers write MXML and ActionScript source code, which the Flex compiler turns into bytecode, resulting in a binary file with the .swf extension. Developers use MXML to declaratively define the application's user-interface elements and ActionScript for client logic and procedural control. MXML provides declarative abstractions for client-tier logic and bindings between the user interface and application data. ActionScript 3.0 is an implementation of ECMAScript, and it provides support for strong typing, interfaces, delegation, namespaces, error handling, and ECMAScript for XML (E4X).

Adobe gives away the Flex 2 SDK for free, so anyone can create Flex 2 applications and compile them into SWF bytecode files. Adobe sells Flex Builder, the Eclipse-based IDE for Flex development, and Flex Data Services, a J2EE component deployed inside a container that provides adapters to connect to EJBs, JMS queues, backend data stores, etc.

One of the barriers to wider Flex adoption is the proprietary nature of the technology. Flex is a closed technology, and Adobe controls every aspect of it. There's nothing wrong with that, but I, and I'm guessing a lot of other people, prefer open architectures, open systems and open platforms for application development to avoid vendor lock-in. Adobe has taken some positive steps by releasing the Flex-Ajax Bridge (FABridge) library, which automatically exposes the public data and methods within a Flex application to the JavaScript engine and vice versa. This lets developers easily integrate Flex applications with existing sites as well as deliver new applications that combine Ajax with applications created in Flex. A great example of the Flex-Ajax interaction is the charting application on Google Finance. It was interesting to see that Yahoo also decided to use Flash for charting when they deployed the new version of the Yahoo Finance portal.

Open-sourcing Flex would certainly lead to wider adoption of Flex as an application development framework. So why doesn't Adobe do it? It seems to fit the Adobe business model of giving away the client for free and monetizing the creation side of the process. Take PDF and Acrobat – Adobe gives away the reader for free but makes money by selling Adobe Distiller. Why couldn't that model work for Flex? Open-source Flex and continue making money on Flex Builder, Flex Data Services, training, consulting, support and custom components. I'm sure there is already a fairly robust marketplace for Flex components, but Adobe could take that to the next level.

I know Adobe has spent a significant amount of time and money in engineering effort to create Flex, but its proprietary nature will always be a limiting factor and will never let Flex become the premier platform for RIAs. If Adobe waits too long, the browsers will get better and fully support SVG, CSS3 and JavaScript JIT compilers, and the advantage Flex offers will narrow. The next generation of AJAX frameworks is also just around the corner and will compete with Flex. OpenLaszlo is another dark horse in this race that may eat Flex's lunch. OpenLaszlo is everything I want Flex to be – OpenLaszlo programs are written in XML and JavaScript and transparently compiled to Flash, and the OpenLaszlo APIs provide animation, layout, data binding, server communication and declarative UI. What sets it apart from Flex is that OpenLaszlo is an open-source platform. Adobe – are you listening?

Carbonite Rocks – Backups Made Easy

Update (Oct 6, 2007): I stopped using Carbonite and switched to Mozy a while ago. I had numerous problems with Carbonite, and their customer service was crappy, so I decided to give up on Carbonite even though I had already pre-paid for 2 years – I guess it's better to lose $80.00 than all your data. Mozy rocks and I haven't had any problems with them. EMC just bought them, so they are now part of a much larger storage company, which I think will be great news for all Mozy users. Mozy is at http://www.mozy.com/

I’ve been using Carbonite in addition to my local backups to external drives, and it really works great. Carbonite is basically Windows backup software tied to an online automatic backup service that uploads and backs up your data over your broadband connection. Your data is encrypted and stored in their remote data center and can be restored over the same broadband connection.

The nice thing about Carbonite is the set-it-and-forget-it nature of the software. Once you decide what you want to back up, you just forget about Carbonite and it backs up your data. You can back up unlimited amounts of data for $5.00 per month or buy a yearly subscription for $49.00. I purchased a 2-year subscription and just finished backing up over 90 GB to the Carbonite servers. Carbonite typically backs up about 2 GB a day and then slows down to 0.5 GB per day once you have backed up 50 GB of data.
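
At those rates, do the math before you count on a quick first backup: the first 50 GB at roughly 2 GB a day takes about 25 days, and the remaining 40 GB at about 0.5 GB a day takes roughly another 80 days, so an initial 90 GB backup like mine runs on the order of three and a half months before it completes.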

Carbonite Backup

The only issue I’ve seen so far with Carbonite is the lack of Windows Vista support. While Carbonite was backing up my system, I upgraded my box to Windows Vista and Carbonite continued to work, but I am not sure I will be able to restore things correctly, and it’s not Carbonite’s fault. It’s another stupid thing Microsoft did in Vista: all of the user settings and documents were moved from "C:\Documents and Settings" to "C:\Users" to make it look more like Mac OS X. My Documents became Documents and My Music became Music. Why? No one knows. I am working with Carbonite support; they hope to have a Vista update to their software soon, and I hope it includes a fix for this issue.

If you are interested in trying Carbonite free for 15 days, click this [link deleted].

Pictobrowser Rocks

I just found Pictobrowser via Thomas Hawk's blog, and it is an amazing way to embed pictures in your blog. Pictobrowser is a simple Flash widget that lets you display sets of pictures from Flickr directly on your site or blog, so users never leave your site. Pictobrowser is the brainchild of Diego Bauducco. Check out a sample below from one of my sets:

http://www.db798.com/work/photo_browser/photo_browser.swf

Essential Software for Windows

You know the old routine – you get a new machine and then spend weeks looking for and installing all the applications, tools and utilities that you had on your old computer and that made you so productive. There is always that utility you use once in a while but now just can't seem to find.

I recently bought a new computer and decided to make a list of all the software I installed on the new computer so that I’m ready to do this again for my next machine. I wish I had discovered Belarc Advisor before I rebuilt my old desktop as a Linux (Ubuntu) desktop. So here is a fairly complete list of what’s installed on my machine and if you see something that I should have, please leave me a comment:
The Essentials

Development

Audio, Video & Graphics

Browsers & Extensions

Utilities

Amazon EC2 and S3 – Implications for the Enterprise

Now that the blogosphere has settled down after the launch of Amazon’s EC2 beta program, I figured it was time to talk about something that I found missing in all the blogs and online discussions.

Before we get to the enterprise implications of EC2 and S3, I should probably give the people who haven’t heard about them a little background. Amazon S3 is a service, launched a few months ago, that provides a simple web services interface to store and retrieve any amount of data, at any time, from anywhere on the web. It gives you access to a highly scalable, reliable, fast data storage infrastructure without spending the millions it would take to build a redundant, fault-tolerant SAN environment. Amazon Elastic Compute Cloud (Amazon EC2) is a new service, launched last week, that finally realizes the promise of grid computing for me.

Amazon EC2 gives you access to a virtual computing environment in the cloud. Your applications run on a “virtual CPU”, the equivalent of a 1.7 GHz Xeon processor with 1.75 GB of RAM, 160 GB of local disk and 250 Mb/second of network bandwidth. You pay 10 cents per instance-hour, which works out to about $72 per month for an always-on instance. You can provision one, hundreds or even thousands of servers, growing capacity as your application grows. Can you imagine being able to provision 1, 2 or 500 additional servers in minutes, programmatically?
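
As a rough back-of-the-envelope sketch (my own arithmetic based on the published $0.10 per instance-hour beta price, not an official Amazon calculator), here is what scaling out and back actually costs:

```java
// Back-of-the-envelope EC2 cost model at $0.10 per instance-hour.
// Real bills would also include S3 storage and bandwidth charges,
// which are ignored here.
public class Ec2CostSketch {

    private static final double DOLLARS_PER_INSTANCE_HOUR = 0.10;

    static double cost(int instances, double hours) {
        return instances * hours * DOLLARS_PER_INSTANCE_HOUR;
    }

    public static void main(String[] args) {
        // One instance running flat out for a 30-day month: about $72.
        System.out.printf("1 instance, 1 month : $%.2f%n", cost(1, 24 * 30));

        // Burst capacity: 20 extra instances for a 48-hour traffic spike.
        System.out.printf("20 instances, 2 days: $%.2f%n", cost(20, 48));
    }
}
```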

To set up your instance, Amazon gives you tools to create your own Amazon Machine Image (AMI). An AMI is simply a packaged-up environment that includes all the necessary bits to set up and boot your instance (currently Fedora Core 3 and 4 systems based on the Linux 2.6 kernel are explicitly supported, although any Linux distribution that runs on this kernel version should work) and can include a web server, database server, etc. Once you create your AMI, you upload it to Amazon S3 and your instance is ready to go. You can target that image at multiple instances, or build out your web tier on one set of machines, your middle tier on another set and your database on yet another. Since this is essentially virtualized Linux, any application that works on Linux should work here, including Java applications. Amazon EC2 is a closed-beta program and I haven’t gotten access to the beta yet, but Edwin Ong over at castblog has a nice review with some great screenshots that demonstrate the potential here.

Now that you have the background on S3 and EC2, you can imagine the potential for startups. Instead of having to pay for terabytes of storage if you are the next Flickr or YouTube killer, you can simply use S3 for all your storage needs and have a redundant, encrypted file system that’s fairly bulletproof and grows with you. Instead of forecasting your storage needs, you can focus on other, real, tangible problems. EC2 now provides the same on the computing side of the house. Not sure how many dedicated managed servers to get at your ISP? Just use EC2 and grow your farm of dedicated virtual boxes as you need them. So if Digg, Techmeme, Reddit, TechCrunch or the meme of your choice is sending you millions of hits, add a few virtual servers to support your application and then scale back as the traffic dissipates.

The advantages of this virtual platform are pretty obvious, but I see major potential in this model for enterprises. Take any of the Fortune 1000 companies, or the millions of smaller companies below them. Most of them are required, either by regulation or by the competitive landscape, to have business continuity (BCP) plans, especially if they are in a highly regulated industry like banking/finance, insurance or health care. So what if you could build out a virtual BCP environment, where you test, build and deploy your applications on a few EC2 instances to validate them, and scale up by adding instances only if you actually need to fail over? The traditional model of BCP is building out another datacenter or leasing colocation space in an established datacenter that meets your power, telecom/network, security and service needs, which costs anywhere from several thousand dollars to millions depending on the scale. What if you could completely eliminate that cost by using Amazon’s virtual computing grid? What if you could deploy all of the applications that are critical to running the business, in case of a disaster, on a virtual cluster of servers without paying for a full physical build-out? The questions of privacy, data encryption and access controls would need to be fleshed out, but Amazon could potentially be the solution for companies that are struggling to justify exorbitant BCP costs.

I think S3 has already changed the competitive landscape and realized the dream of the virtual storage network, and EC2 is going to be the type of disruptive change that turns the market on its head. I cannot wait to see the tools that will pop up around the EC2 space to make the creation and deployment of your virtual server easier than the current command-line process. S3 is a great example, with some great applications that have popped up to take advantage of it; my current favorite is JungleDisk. S3 over WebDAV – brilliant. As an aside, I am working on my own AJAX-enabled S3 web application, but it’s more for personal use and will end up being used as tutorial-ware more than anything. It’s interesting to see the mini-industry that has popped up around S3, and EC2 will draw even greater interest. It will only be a matter of time before vendors offer one-click setup of your Amazon EC2 server, preloaded with the Linux flavor of your choice along with the applications you need. I can also see a new group of hosting providers jumping in as VARs to resell the virtual instances they pay for as shared-host server slices. The potential is limitless – now I just need to get into the beta so I can see it for real.

Windows Live Writer – Microsoft’s new blog editor

Microsoft just launched Windows Live Writer, a new publishing tool for WYSIWYG blog authoring on Windows Live Spaces, WordPress, TypePad and other blogging services. I found this software via Digg and it looks pretty good so far. Live Writer is pretty similar to Word, so I’m not sure how this product will be positioned in the future. With all the blog-editing tools in Office 2007, this is an interesting offering, but I guess it could be the free tool that doesn’t have all the features of Word 2007.

My initial impressions of this tool are fairly positive. The WYSIWYG blog authoring is really good and lets you edit in GUI mode or directly edit the HTML being generated. The image tool is pretty cool: it lets you add images to your post and upload them to your blog directly.

Writer supports RSD (Really Simple Discoverability), the Metaweblog API, and the Movable Type API, with more blog platforms and APIs coming in the near future.
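
For the curious, the Metaweblog API is just XML-RPC over HTTP. The rough sketch below hand-builds a metaWeblog.newPost call in Java; the endpoint URL, blog id and credentials are placeholders, and a real client like Live Writer would use a proper XML-RPC library and escape the post content.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Minimal, hand-rolled metaWeblog.newPost call. The endpoint, blog id and
// credentials are placeholders; the post body is assumed to be XML-escaped.
public class MetaWeblogPost {

    public static void main(String[] args) throws Exception {
        String endpoint = "http://example.com/xmlrpc.php";   // placeholder endpoint
        String payload =
            "<?xml version=\"1.0\"?>" +
            "<methodCall>" +
            "<methodName>metaWeblog.newPost</methodName>" +
            "<params>" +
            "<param><value><string>1</string></value></param>" +          // blog id
            "<param><value><string>username</string></value></param>" +
            "<param><value><string>password</string></value></param>" +
            "<param><value><struct>" +
            "<member><name>title</name>" +
            "<value><string>Hello from a blog client</string></value></member>" +
            "<member><name>description</name>" +
            "<value><string>Post body goes here.</string></value></member>" +
            "</struct></value></param>" +
            "<param><value><boolean>1</boolean></value></param>" +        // publish now
            "</params>" +
            "</methodCall>";

        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml");

        OutputStream out = conn.getOutputStream();
        out.write(payload.getBytes("UTF-8"));
        out.close();

        // The response is an XML-RPC <methodResponse> containing the new post id.
        InputStream in = conn.getInputStream();
        StringBuilder response = new StringBuilder();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            response.append(new String(buf, 0, n, "UTF-8"));
        }
        in.close();

        System.out.println("HTTP " + conn.getResponseCode());
        System.out.println(response);
    }
}
```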

Another interesting feature is the ability to insert a Windows Live Local map directly into a post. For now, only Live.com maps are supported, but the SDK that is also shipping should allow anyone to create interesting add-ons.

 


The White House (Windows Live Local map)

Google’s Picasa Web Albums – No Flickr Killer, Yet!

Google launched a beta version of Picasa Web Albums earlier in the week, and I was invited to participate. Picasa Web Albums is Picasa’s newest feature, designed to help users post and share their photos quickly and easily on the web.

Picasa has always had the ability to take your existing pictures and create HTML-based web-page slideshows, but you needed a host where you could upload and display those pictures. I’ve been using Picasa since it became free with the Google acquisition, and it has always worked great. Picasa is really my favorite program for managing my photo collections, and I always used its ability to upload and print pictures directly from Picasa to Ofoto and other online digital photo printers.

The latest beta, which includes Picasa Web Albums, looks exactly like the older (v2.0) Picasa, but the first thing I noticed that was different was the extra button at the bottom of the application named ‘Web Album’. Clicking that button lets you upload the selected pictures to the web and store them on Google’s servers. The first requirement is a Google account, which gets you 250 MB of free storage space. For $25.00 per year, you can subscribe to an additional 6 GB of storage. While the fee is comparable to Flickr’s, the storage limit is a different story: I have a Flickr Pro account with essentially unlimited storage capacity – the only restriction I have is a monthly upload bandwidth limit.

Picasa Web Album

The user interface is very simple, and with a few clicks you can upload your pictures directly to your Google Picasa Web Album site. One of the things that sets Picasa apart from Flickr is that Picasa offers image manipulation while Flickr just offers hosting. In the past, I would export ‘fixed’ pictures from Picasa and upload them to Flickr, or create web pages out of Picasa to share with family and friends. Once you create an album, you can share it with friends, and Picasa lets you email people the location of the album.

Share Picasa Web Album

I took some pictures from my recent vacation in Hawaii and put them up using Picasa. The album interface is full of Ajax goodness like Flickr, but there are a few subtle differences. A nice feature that I haven’t seen in Flickr is ‘Download Album’, which lets you download a complete album to your computer. Each album is also available as an RSS feed, which is pretty nice, and it appears that you can subscribe to a user as an RSS feed and see future uploads in your feed reader.

All in all, Picasa Web Albums is a pretty neat idea with a very simple yet elegant interface. I didn’t see any Google AdSense ads on the site yet, but I am sure that will come. Being a Picasa fanatic, I guess I’ll start using Web Albums, but I wish I could get the same result using my own web host for storage. I know some of the hosted features, like comments, wouldn’t work without the Google backend, but I could live without those features or have that type of functionality still tied back to Google. Oh well – time will tell, and once the API opens up, people will come up with some pretty creative solutions.

Network Neutrality: Why It Really Matters

Let’s hope Congress does the right thing and adds adequate protection to guarantee network neutrality, because life without it could really suck. Can you imagine a world where people who use Verizon as their ISP can’t get to Google because Microsoft is paying Verizon to make MSN the preferred search engine? All traffic to Google could be blocked or QoS’d down to a trickle. Can you imagine using a 4 Mbps Internet connection where your connection to Google or other sites is throttled to 56 Kbps? Can you imagine your Skype call sounding like crap because you use AT&T as your ISP and they would rather sell you their VoIP service?

It’s funny how the telecommunications companies essentially want to blackmail content providers and consumers. When I surf to Google or Flickr or Digg or Live.com, I am using a local Internet connection that I pay for, and sites like Google, Flickr and Digg also pay for the bandwidth they use – so the consumer is paying to get to the provider, and the provider is paying to connect to the Internet. But my ISP now wants to give one company priority on its network over another, for a price. Sounds pretty shady, doesn’t it?

Can you imagine how this would just totally kill innovation on the Internet? Without network neutrality, innovative companies like YouTube, Flickr, del.icio.us and countless others would never have been able to launch, because they couldn’t have paid the telecom companies’ extortion. Google, Microsoft and Yahoo would be able to afford it, but new upstarts would be left out. Can you imagine a world with no real competition? Look at Microsoft and Internet Explorer – after Netscape died, Microsoft essentially disbanded the IE team and didn’t add any new features to the browser for almost 6 years. Competition is critical and fuels innovation; without it we have stagnation and the consumer suffers.

Web 2.0, or Bubble 2.0 depending on your perspective, has largely been made possible by the ubiquity of high-speed Internet access. Even Al Gore, the creator of the Internet, has spoken up on this issue. 🙂 In a recent speech, Al Gore said, “Freedom of communication is an essential prerequisite for the restoration of the health of our democracy. It is particularly important that the freedom of the Internet be protected against either the encroachment of government or the efforts at control by large media conglomerates.” (Via FreePress.net)

Google hits a home run with Google Spreadsheet

I just got my invite to play with the latest offering from Google, Google Spreadsheets, and my initial reaction after playing with it for the past hour is incredibly positive. Unlike some of the duds Google has launched recently, this is a pretty nice, robust and useful offering.

I started off by creating a simple spreadsheet and tried out some simple formulas, and they worked – I shouldn’t be surprised, but I was. Typing =(a1 - a2) actually worked, and that’s pretty cool. Here are two simple screenshots from my playing with the formula.

google-spreadsheet1

After I hit Enter, the result is plopped into the cell.

google-spreadsheet2

The other neat thing was that the formula was saved in my document, and the result updated when I changed one of the cells involved in the formula. I know it sounds pretty simple, but it’s great to have a web application behave like a fat-client application.

Google Spreadsheets has a ton of other formulas you can apply, and it seems to offer all the functionality I use in Excel. The collaboration feature also has great possibilities, but at this point it’s limited to people who have Google Accounts. Weird, but I guess that’s something they are working on so it can include anyone with an email address.

google-spreadsheet3

A couple of other nice features include options to export to Excel (.xls), CSV and HTML. The Excel export works great, and I was able to open the exported spreadsheet in Excel as you would expect. The HTML export also works, but the generated HTML does not validate, which seemed odd, although I know this is beta [insert your own joke here] software.

In addition to creating new spreadsheets, you can import existing Excel documents, and the application did a great job of importing a spreadsheet with a ton of data and some complex formulas. I am very impressed with the overall functionality and usability of this application. Would I dump Excel to start using this? No – the accessibility and collaboration features are great, but there are privacy concerns that would keep me from being comfortable using this application with personal and confidential data. Maybe when GDrive launches, we will feel more comfortable about how data is encrypted in storage, segregated from other users and protected from hackers. Till then, I’m sticking with Excel for my rudimentary needs.