Monitoring (moved)

The basics

 * is the main OpenHatch box, which runs the website.
 * is the secondary server for OpenHatch. It hosts the Hudson continuous integration server, as well as Nagios!
 * is a third server, hosted at GPLHost, that runs Jenkins.
 * The Nagios configuration is owned by a user called nagios on linode2.openhatch.org.

Access

 * We use ssh keys for login.
 * If you want SSH access to that account, file a bug requesting it, and attach an SSH key. You should hear back within 2 days; if you don't hear back by then, try to find paulproteus or jesstess on IRC.
 * Then you can do:

ssh nagios@linode2.openhatch.org


 * You'll know it's working if you are logged in without being asked for a password. If you see a "Password:" prompt, your key was not accepted.
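
The check described above can also be run non-interactively. The following is a minimal sketch, not part of the documented OpenHatch process: the helper name check_key_login is hypothetical, and the 10-second timeout is an arbitrary choice. It relies on the standard OpenSSH BatchMode option, which makes ssh fail rather than fall back to a password prompt.

```shell
# Hypothetical helper: report whether key-based login works for a target.
# BatchMode=yes tells ssh never to prompt, so a rejected key shows up
# as a non-zero exit status instead of a "Password:" prompt.
check_key_login() {
    target="$1"
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$target" true; then
        echo "key login OK for $target"
    else
        echo "key login FAILED for $target"
        return 1
    fi
}
```

It would be called as: check_key_login nagios@linode2.openhatch.org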

Notifications

 * Nagios notifications go to monitoring@lists.openhatch.org. Anyone can subscribe to this list or read its archives.

Making changes

In brief, here's what you need to know:

 * Edit files in ~nagios/
 * Once you know what changes you want to make, create a local branch with those changes:

git checkout -b my_changes

 * As you make changes, make meaningful commits. Also, tell "git commit" to use your identity:

git commit --author="Some Body "

 * After you have made the changes, ask someone to review them and merge the changes to master.
 * Rationale: If you stick to the above process, it is fairly easy to roll back to the "master" branch of the Nagios configuration.
 * History: We came up with this process during issue332.
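
As a rough illustration of why this process makes review and rollback easy, the steps can be wrapped in small shell helpers. This is a sketch under assumptions: the function names start_change, show_pending, and rollback are hypothetical and do not exist on the server, and it assumes the commands are run inside the git repository that holds the Nagios configuration.

```shell
# Hypothetical helpers; run them from inside the configuration repository.

# Start a topic branch for a change (the branch name is up to you).
start_change() {
    git checkout -b "$1"
}

# Show what a reviewer would look at: changes made on the current
# branch since it diverged from master.
show_pending() {
    git diff master...HEAD
}

# Roll back: switch the working tree back to the master branch.
rollback() {
    git checkout master
}
```

After rollback the files on disk match master again, but the running Nagios daemon still needs to be restarted to pick them up.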

Viewing the web interface, and handling the daemon

 * On,   contains the mailman and Nagios web interface passwords.
 * View the Nagios web interface at
 * To restart the Nagios daemon, run:

sudo /etc/init.d/nagios3 restart
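
Before restarting, it can help to have Nagios verify the configuration first, so that a typo does not take monitoring down. The sketch below is an assumption-laden example rather than the established procedure: the helper name safe_restart is hypothetical, and the binary path /usr/sbin/nagios3 and the file /etc/nagios3/nagios.cfg are the usual Debian locations, which may differ on this server. The -v flag is Nagios's own configuration check.

```shell
# Hypothetical helper: restart Nagios only if the configuration check passes.
safe_restart() {
    # Assumed default path; pass a different main config file as $1 if needed.
    cfg="${1:-/etc/nagios3/nagios.cfg}"
    # "nagios3 -v" parses the configuration and exits non-zero on errors.
    if sudo /usr/sbin/nagios3 -v "$cfg" >/dev/null; then
        sudo /etc/init.d/nagios3 restart
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}
```

If the check fails, running the same nagios3 -v command by hand without the output redirect shows which file and line Nagios objected to.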

In case of emergency

 * See Emergency operations for the openhatch server. People with ssh keys set up for the Linode Shell (Lish) can reboot the box and have other limited emergency capabilities.

TODOs

 * 1) Send Nagios notifications to IRC ( ?)?
 * 2) Make the Nagios web interface world-viewable.
 * 3) Version the monitoring configurations.
 * 4) Send SMS alerts to people who want them.
 * 5) Add historical trending (Munin)?

Related

 * See also Emergency operations for the openhatch server
 * See also the page about the Login team