Performance analysis

= Goals =

Page loads on the OpenHatch website should:


 * Be good to clients: Be processed in less than 0.2 seconds so that users feel the site is snappy.
 * Be good to the server: Not put heavy load on the server, which would slow service for other users and requests.

At the same time, we should optimize the most commonly viewed pages first. This way, we get the most impact per unit of effort.

We try to measure the time spent as experienced by the server (the Linode), not by the client.

= Tools =

== vhost_effort ==
Asheesh wrote an Apache patch and reporting tool called vhost_effort. This patch modifies access.log files in the following way:


 * The first few columns are the standard "combined" log format.
 * They are followed by three extra columns:
   * musecs (microseconds of server time spent processing the request)
   * vhost (the name in the Host: header that was asked for in this request)
   * responsebytes (the number of bytes in the response)

If "musecs" is prefixed by a plus sign, the value measures the full time the request took to reach the client, not just the server's processing time. Such values include the time spent sending the response over the network, which skews the figures.
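
As a rough illustration of how such a line might be read, here is a minimal Python sketch. It assumes the three extra fields are the last whitespace-separated tokens on the line, in the order listed above; the patch's exact output format is not shown on this page, so the field positions, the sample log line, and the helper name parse_effort_fields are all assumptions rather than part of vhost_effort itself.

```python
from typing import NamedTuple

GOAL_MUSECS = 200_000  # the 0.2 second goal, expressed in microseconds


class Effort(NamedTuple):
    musecs: int
    includes_transfer: bool  # True when musecs carried a "+" prefix
    vhost: str
    responsebytes: int


def parse_effort_fields(line: str) -> Effort:
    """Parse the trailing musecs, vhost and responsebytes fields.

    Assumption: the three fields are the last whitespace-separated
    tokens on the line, in that order.
    """
    musecs_raw, vhost, bytes_raw = line.split()[-3:]
    includes_transfer = musecs_raw.startswith("+")
    return Effort(
        musecs=int(musecs_raw.lstrip("+")),
        includes_transfer=includes_transfer,
        vhost=vhost,
        responsebytes=int(bytes_raw),
    )


# Hypothetical log line, made up for illustration.
sample = (
    '203.0.113.5 - - [10/Oct/2009:13:55:36 -0700] "GET /search/ HTTP/1.1" '
    '200 5120 "-" "ExampleAgent/1.0" +250000 openhatch.org 5120'
)
effort = parse_effort_fields(sample)
over_goal = effort.musecs > GOAL_MUSECS
```

Keeping the plus-sign flag separate from the number makes it possible to exclude the skewed measurements before comparing against the 0.2 second goal.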

== Reporting based on vhost_effort ==
If you run these commands in a shell:


 * cd milestone-a
 * cd mysite/scripts
 * ./vhost_effort.py /path/to/a/log/file

You will get three reports in the current working directory. Note that the URLs in them are approximate: the script does not know which URI scheme was actually used, so it guesses "http://".


 * musecs, an unsorted table where column 1 is microseconds of server time and column 2 is the URL
 * hitcount, an unsorted table where column 1 is the number of hits and column 2 is the URL
 * bytes, an unsorted table where column 1 is the number of bytes sent out and column 2 is the URL
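
Because the reports are unsorted, a small post-processing step helps show which URLs dominate. The following is a minimal Python sketch, assuming each report line is a number, then whitespace, then the URL. The exact separator is not specified on this page, and top_rows is a hypothetical helper, not part of vhost_effort.py.

```python
from typing import Iterable, List, Tuple


def top_rows(lines: Iterable[str], limit: int = 10) -> List[Tuple[int, str]]:
    """Return the `limit` rows with the largest column 1, biggest first."""
    rows = []
    for line in lines:
        parts = line.split(None, 1)  # column 1, then the rest as the URL
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        count, url = parts
        rows.append((int(count), url.strip()))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


# Made-up report contents, for illustration only.
example_report = [
    "1200 http://openhatch.org/search/",
    "90000 http://openhatch.org/",
    "300 http://openhatch.org/people/",
]
ranked = top_rows(example_report, limit=2)
```

The same function should work on a real report by passing an open file instead of a list, provided the report follows the assumed layout.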