Saturday, May 29, 2010

Free Web Hosting on App Engine

I don't have much going on at www.swwomm.com; just a few static mockups and other documents (this blog is hosted by blogger.com). So I finally got around to transferring it from a paid host (lylix; no complaints other than that they have the gall to charge money for their services) to a free one: Google's App Engine (GAE).

I'm not sure why there isn't already a cookbook for transferring a static site to GAE, since it's pretty easy and painless — so here's my recipe.

Gotchas

But first, note that your lunch is not completely free — Google will 503 your ass if you exceed its (extremely generous) daily quotas for bandwidth and processing power.

Also, there are a few other GAE gotchas which apply to static content:

  1. Can't host a "naked" domain (ie example.com instead of www.example.com).
  2. Limit of 3000 files per app.
  3. No automatic directory listings.
  4. No automatic directory redirect (ie redirect from /foo to /foo/).
  5. No custom 404 page.

You can get around #3 and #4, however (as I describe below).

Setup

The first thing you need to do is get the GAE SDK. I'm using the version for python on linux.

With the SDK installed, you can start developing right away on your local dev machine. To deploy an app, of course, you need to sign up for a GAE account. They make you verify your account with an SMS message, so have your cell phone ready.

Hello World

Building a static app is really simple:

  1. Create a directory for your project (ie myapp).
  2. Create a sub-directory named static (or really, whatever you want to name it) inside the project directory. This will be the web root.
  3. Copy your static files into static.
  4. Add a boilerplate app.yml to the project directory.

This is the boilerplate app.yml:

application: myapp version: 1 runtime: python api_version: 1 default_expiration: "1d" handlers: # show index.html for directories - url: (.*/) static_files: static\1index.html upload: static(.*)index.html # all other files - url: / static_dir: static

The first line (application: myapp) specifies the name of your app. This name doesn't matter for development on your local machine; but when you use the GAE dashboard to create a new app, you'll be prompted for a name. The name must be unique globally on GAE (ie it has to be a name no other GAE user has claimed for his or her app), and GAE uses it in the default url for your application (ie http://myapp.appspot.com/). Once you've created the name and set up the app in the GAE dashboard, go back and change it here in app.yml.

The default_expiration setting is the default http cache-age for your static files; "1d" = one day, "4h" = 4 hours, etc. You can configure a separate expiration time for each url handler, but "1d" is probably good for most static content.

The first url in the handlers section captures all urls which end with a trailing-slash. These are directories; with static content you usually either want to display the index.html file of the requested directory, or, if there's no index.html, just the bare directory listing. Unfortunately GAE doesn't support listing static files, so this handler always just tries to display the index.html file in the requested directory.

The second url in the handlers section captures all other urls, and simply serves the requested static file.

Test It Out

At this point, you've already built a fully-functioning GAE app. You can test it out by running the test appserver (where $APPENGINE_HOME is the path to the GAE SDK on your local box, and $MYAPP_HOME is the path to your project directory):

$ $APPENGINE_HOME/dev_appserver.py $MYAPP_HOME

This boots up a test GAE appserver on port 8080. Enter http://localhost:8080/ into your browser address bar; it should load up the index.html from the root of your static directory.

Directory Woeage

But as I pointed out above, if you navigate to a sub-directory, and omit the trailing slash (ie http://localhost:8080/foo), you won't see the index.html for that sub-directory — you'll just get a blank page (and Google's generic 404 page when deployed to GAE).

This we can fix, however, by implementing a trivial RequestHandler in python. Create a file called directories.py (or whatever the heck you want to call it) in your project directory, and dump this into it:

import cgi from google.appengine.ext import webapp from google.appengine.ext.webapp.util import run_wsgi_app class RedirectToDirectory(webapp.RequestHandler): def get(self): self.redirect(self.request.path + "/", permanent=True) application = webapp.WSGIApplication([('/.*', RedirectToDirectory)], debug=True) def main(): run_wsgi_app(application) if __name__ == "__main__": main()

This creates a RequestHandler called RedirectToDirectory; this class simply appends a trailing slash to the current url and redirects. The other non-boilerplate line is just below, where RedirectToDirectory is registered to be used for all urls (/.*) handled by this directories.py script.

Next, drop in a url entry for the directories.py script into the handlers section of your app.yml (and yes, the order of the url entries is important):

application: myapp version: 1 runtime: python api_version: 1 default_expiration: "1d" handlers: # show index.html for directories - url: (.*/) static_files: static\1index.html upload: static(.*)index.html # redirect to directories (/foo to /foo/) - url: .*/[^.]+ script: directories.py # all other files - url: / static_dir: static

This url entry will capture all the urls which don't end in a trailing slash and don't have a file extension. It will handle requests for these urls by sending them to your directories.py script, which in turn will redirect them back to your app — but with a trailing slash this time.

Directory Listings

But if your directory doesn't have a index.html file, you're still SOL — unlike a normal webserver, you can't configure GAE to just display the directory listing. And you can't just implement a handler to do this — GAE apps don't have access to their static filesystem.

One possible workaround to this would be to store all your files as entries in the Big Table DB, and then serve them (and the directory listings) dynamically. This would require writing a bunch of code, however, when all you really want is just to serve some freaking static files already.

So I compromised and wrote a simple perl script which automatically creates static index.html files for a configurable list of static directories. To make it work, create a directories sub-directory in your project, and add to it four files:

make.pl
#!/usr/bin/perl -w open LIST, "directories/list.txt" or die $!; while (<LIST>) { s/\n//; # strip newline $title = $dir = $_; $title =~ s!.*/!!; # strip path from directory name # open directory index.html for writing open INDEX, ">$dir/index.html" or die $!; # dump header template into index.html # replacing %title% with directory name open HEAD, "directories/head.html" or die $!; while (<HEAD>) { s/%title%/$title/g; print INDEX; } close HEAD; # dump directory listing into index.html open DIR, "ls -lh $dir |" or die $!; while (<DIR>) { s/\n//; # strip newline # parse fields listed for each file @fields = split / +/, $_, 8; # skip lines that aren't file listings # and also skip this index.html next if ($#fields < 7 || $fields[7] eq 'index.html'); # print a table row for this file print INDEX '<tr><td class="name"><a href="' . $fields[7] . '">' . $fields[7] . '</a></td><td class="size">' . format_size($fields[4], $fields[0]) . '</td><td class="modified">' . $fields[5] . ' ' . $fields[6] . '</td></tr>' . "\n"; } close DIR; close INDEX; # dump footer into index.html `cat directories/foot.html >> $dir/index.html`; } close LIST; # format file size a little nicer than ls sub format_size { my($size, $perm) = @_; # skip for subdirectories return '-' if $perm =~ /^d/; # add bytes abbr $size .= 'B' if $size =~ /\d$/; return $size; }

This is the perl script. When you create it, make sure you make it executable:

$ chmod +x directories/make.pl

When you run it (after you've created the other three files), make sure you run it from your project directory:

$ cd myapp $ directories/make.pl
list.txt
static/foo static/foo/bar static/baz

This is the list of directories for which to auto-generate index.html files. Please note that the make.pl script will delete the existing index.html files in these directories. So make sure that you list only directories which don't have a custom index.html. (Plus this is another good reason to be using version control on your project.)

head.html
<html> <head> <title>%title% - My App</title> </head> <body> <h1>%title%</h1> <table> <thead> <tr><th>Name</th><th>Size</th><th>Modified</th></tr> </thead> <tbody>

This is the first part of template for the auto-generated index.html files. The perl script will replace the instances of %title% in this file with the directory name.

foot.html
</tbody> </table> </body> </html>

This is the second part of the template.

So add to directories/list.txt the paths of the directories you want to have listings, customize directories/head.html and directories/foot.html to your liking, and run directories/make.pl. This will create your directory-listing index.html files.

Deploy

And now you're ready to deploy. Make sure you've created the app in the GAE dashboard and updated your app.yml with its appspot name; then upload it:

$ $APPENGINE_HOME/appcfg.py update $MYAPP_HOME

You'll have to enter your GAE credentials, and wait for a minute or two while your app boots up on GAE; when your app is deployed and ready to use, the script will let you know. If you screw up your credentials, delete your ~/.appcfg_cookies file (which caches them), and try again.

If you've also signed up for Google Apps, you can use your (separate) Google Apps dashboard to configure a sub-domain of a domain you own to point to your deployed app (ie http://www.example.com/). Otherwise, you have to access it via its sub-domain of the appspot domain (ie http://myapp.appspot.com/).

Bon app├ętit!

2 comments:

  1. this is good stuff!!
    I've been looking for 'trailing slash' script for a while now
    Yours working great! Thanks a lot!!

    ReplyDelete
  2. Thank you very much for the redirect script. Being not familiar with regular expressions, I was going nuts trying to figure out how to redirect from no slash to trailing slash. Thank you again!

    - Pj @ www.punjabirangmanch.com

    ReplyDelete