This is the story of how I deployed the Taubatron – a web application written in Scheme. I wrote about what it does earlier; this post is concerned with getting it running on the actual internet, as opposed to on my personal computer.
CGI vs server process
The most straightforward approach would have been to deploy the application as a CGI program. I had deployed Scheme applications at my hosting provider that way before. This is also the type of service offered, at the time of this writing, at the free Scheme web hosting site ellyps.net. But for this application, performance with CGI was a problem –
```
Document Path:          /taubatron.cgi
Document Length:        248 bytes

Concurrency Level:      10
Time taken for tests:   23.234 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      29000 bytes
HTML transferred:       24800 bytes
Requests per second:    4.30 [#/sec] (mean)
Time per request:       2323.402 [ms] (mean)
Time per request:       232.340 [ms] (mean, across all concurrent requests)
Transfer rate:          1.22 [Kbytes/sec] received

Percentage of the requests served within a certain time (ms)
  50%   2321
  66%   2331
  75%   2333
  80%   2338
  90%   2354
  95%   2365
  98%   2380
  99%   2401
 100%   2401 (longest request)
```
The slowness – up to several seconds to complete a request – had two causes: first, the expense of forking a new guile process for every request, and second, the application’s lengthy initialization phase (building a graph out of a dictionary of words). Web developers in other languages have found ways to avoid these costs – for Python there is WSGI, and of course there is mod_perl – couldn’t I do as well in Scheme? I considered mod_lisp and FastCGI, but frankly these seemed difficult and perhaps not even possible on my host. The approach that seemed most promising was to run the application as a long-lived server process using the built-in HTTP server found in recent versions of the guile Scheme compiler. Getting such a server application running was about as easy as setting up the CGI program, and the performance boost was remarkable:
```
Concurrency Level:      10
Time taken for tests:   2.488 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      385000 bytes
Total POSTed:           170000
HTML transferred:       306000 bytes
Requests per second:    401.97 [#/sec] (mean)
Time per request:       24.878 [ms] (mean)
Time per request:       2.488 [ms] (mean, across all concurrent requests)
Transfer rate:          151.13 [Kbytes/sec] received
                        66.73 kb/s sent
                        217.86 kb/s total
```
That’s right – performing the initialization steps just once yielded a roughly hundred-fold increase in throughput, from about 4 to about 400 requests per second. Serving requests directly from the built-in HTTP server like this probably represents a lower bound on the latency of this application. But to use this approach at all, I first had to get a recent-enough version of guile running on my host, which turned out to be non-trivial.
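For reference, guile’s built-in server needs remarkably little code. A minimal sketch – the handler body and port here are placeholders of mine, not the Taubatron’s actual handler – looks like this:

```scheme
;; Minimal sketch of guile's built-in HTTP server (guile >= 2.0).
;; Handler and port are illustrative, not the Taubatron's.
(use-modules (web server))

(define (handler request request-body)
  ;; A handler returns two values: response headers (an alist,
  ;; implying a 200 status) and the response body.
  (values '((content-type . (text/plain)))
          "hello, world"))

;; Blocks forever, serving requests on port 8080.
(run-server handler 'http '(#:port 8080))
```

Because the process stays alive between requests, any expensive initialization (like building the word graph) happens once, before `run-server` is called.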
Installing guile on the host
Of course guile did not come pre-installed on my host the way PHP did, and without root access I could not simply install it as I would on a personal machine; however, I did have user-level shell access and the necessary tools to compile guile from source. This is how I had gotten an older version of guile running to host some simpler applications. But the high-performance web server is found only in a newer guile version, which failed to compile from source due to resource limits imposed by the hosting service. I tried simply uploading a build of guile from a Debian package appropriate for the host’s architecture; this failed at run time with an error about a glibc version mismatch. However, I noticed that the early parts of the build process that involved compiling and linking C code were working fine; the build wouldn’t fail until later, when guile was trying to compile parts of itself into bytecode (files with extension ‘.go’ in the build tree). Figuring that these bytecode files might be architecture-dependent but should not depend on any specific glibc version, I tried simply copying the bytecode files which were failing to build from the Debian package into the build tree. And it worked – I had a working guile-2.0 compiler installed on my host.
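In outline, the hack looked something like the following shell session – the package name and paths are illustrative reconstructions, not a transcript:

```shell
# native build; fails partway through, while compiling .go bytecode
./configure --prefix=$HOME/opt/guile && make

# unpack the Debian package built for the host's architecture
dpkg -x guile-2.0-libs_2.0.x_i386.deb deb

# drop the prebuilt bytecode into the build tree, then resume
cp deb/usr/lib/guile/2.0/ccache/ice-9/*.go module/ice-9/
make && make install
```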
Configuring the front-end server
But of course I wasn’t finished – it’s not as though I could just bind the guile server to port 80 on the shared host and be done with it. I needed a way to integrate it with the front-end server, Apache in the case of this host. One way is to bind the guile process to some high-numbered port and use Apache’s RewriteRule to map requests to that port. But in a shared hosting environment I couldn’t count on just being able to grab an available port. I had a good discussion about this at Barcamp Portland and concluded that the best approach was to bind the guile process to a local unix socket, and then configure the front-end web server to forward requests to that socket. Binding the guile-based HTTP server to a unix socket was no problem, but trying to figure out how to get Apache to cooperate in this seemingly simple task was frustrating. I eventually tried asking the Internet, but apparently it either did not know or did not care. In contrast, it is easy to find examples of this in nginx. I soon had my application serving requests through a unix socket behind nginx with a latency of less than 3 msec per request – nearly as fast as the bare guile HTTP server. (It entertains me that this benchmark, my first use of nginx for any purpose, was achieved on a laptop on Portland’s MAX blue line, heading East.)
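Binding guile’s server to a unix socket just means opening the socket yourself and handing it to `run-server` – the `'http` implementation accepts an already-open socket. A sketch, with a socket path of my own choosing:

```scheme
;; Sketch: serve HTTP on a unix-domain socket instead of a TCP port.
;; The socket path is illustrative.
(use-modules (web server))

(define sock (socket PF_UNIX SOCK_STREAM 0))
(bind sock AF_UNIX "/tmp/taubatron.sock")
(listen sock 128)

(define (handler request request-body)
  (values '((content-type . (text/plain))) "hello"))

;; Pass the pre-bound socket to the built-in server.
(run-server handler 'http (list #:socket sock))
```

On the nginx side, the forwarding really is a one-liner (again, the socket path is illustrative):

```nginx
location / {
    # forward to the guile server listening on the unix socket
    proxy_pass http://unix:/tmp/taubatron.sock:;
}
```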
The CGI trampoline
Still, I was not finished, because I didn’t have the option of using nginx on my host – I had to figure out a way to make their Apache installation work for me. I gave up on finding an Apache configuration directive to do this and realized that there was an alternative likely to be portable to just about any host, no matter which web server it was running or how it was configured – I could write a lightweight CGI process that would simply open a connection to the socket, forward the request, and echo back the response. I called this a “CGI trampoline”, implemented it, and only after the fact found at least one other implementation of the same idea using the same name. My first implementation was in Scheme, and with it my web application served requests through a unix socket behind Apache with a latency of 39 msec – ten times slower than the bare guile HTTP server, but still ten times better than the whole application as a CGI process. The performance hit was due, of course, to the cost of starting a new guile process for each request. I rewrote the CGI trampoline in C, and request latency dropped to 3.6 msec – pretty good compared to the lower bound of 2.4 msec achieved by serving requests directly from the application running as an HTTP server.
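The essence of the trampoline fits in a dozen lines of Scheme. This sketch (socket path illustrative) omits what a real one must also do – forward the Content-Length/Content-Type headers and POST body, and translate the backend’s HTTP status line into a CGI “Status:” header:

```scheme
;; CGI trampoline sketch: connect to the backend's unix socket,
;; replay the request, and echo the response back to the client.
(define sock (socket PF_UNIX SOCK_STREAM 0))
(connect sock AF_UNIX "/tmp/taubatron.sock")

;; Reconstruct a request line from the CGI environment.
(format sock "~a ~a HTTP/1.0\r\n\r\n"
        (getenv "REQUEST_METHOD")
        (getenv "REQUEST_URI"))

;; Copy the backend's response byte-for-byte to stdout.
(let loop ((c (read-char sock)))
  (unless (eof-object? c)
    (write-char c)
    (loop (read-char sock))))
```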
And that’s how the Taubatron was deployed – try it out here!