Monitoring (moved): Difference between revisions

From OpenHatch wiki
imported>Paulproteus
imported>Jesstess

Revision as of 23:32, 21 January 2011

This is a page about improving or modifying OpenHatch.

We call that "Hacking OpenHatch," and there is a whole category of pages about that.


Monitoring

The basics:

  • linode.openhatch.org is the main OpenHatch box, which runs the website.
  • Nagios is running on linode2.openhatch.org. This is also the machine where OpenHatch runs Hudson.

Access:

  • There is a nagios user on linode2, and login is by SSH key. To get access, send paulproteus your public SSH key. Once he has granted access, you should be able to run ssh nagios@linode2.openhatch.org.
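
The access steps above can be sketched as two small shell helpers. This is only an illustration: the key path and the function names are assumptions, not something the wiki prescribes, and any existing key pair works just as well.

```shell
# Hypothetical key path (an assumption); point KEY_FILE at an existing key if you have one.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_openhatch}"
NAGIOS_LOGIN="nagios@linode2.openhatch.org"

# Create the key pair if it is missing, then print the public half.
# The printed line is what paulproteus needs from you.
make_key() {
    [ -f "$KEY_FILE" ] || ssh-keygen -t rsa -b 4096 -f "$KEY_FILE"
    cat "$KEY_FILE.pub"
}

# Log in as the nagios user once access has been granted.
login_nagios() {
    ssh -i "$KEY_FILE" "$NAGIOS_LOGIN"
}
```

Run make_key first and send its output along; login_nagios will only succeed after the key has been installed on linode2.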

Notifications:

  • Nagios notifications go to monitoring@lists.openhatch.org. Anyone can subscribe to this list.

Viewing the web interface, and handling the daemon:

  • On linode2, ~/nagios/secrets/ contains the mailman and Nagios web interface passwords.
  • View the Nagios web interface at http://linode2.openhatch.org/nagios3/
  • To restart the Nagios daemon, run sudo /etc/init.d/nagios3 restart
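
A possible way to wrap the restart and a quick check of the web interface, to be run from the nagios account on linode2. The configuration path and the nagios3 -v pre-flight check are assumptions based on a stock Debian nagios3 install, and the function names and the user-name argument are hypothetical; only the restart command and the URL come from this page.

```shell
NAGIOS_URL="http://linode2.openhatch.org/nagios3/"
# Assumed Debian default location of the main config file; override if it differs.
NAGIOS_CFG="${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}"

# Validate the configuration first, and restart the daemon only if validation passes.
restart_nagios() {
    sudo nagios3 -v "$NAGIOS_CFG" && sudo /etc/init.d/nagios3 restart
}

# Print the HTTP status code returned by the web interface.
# $1 is the web interface user name (hypothetical argument); curl prompts for the password.
check_web() {
    curl -s -o /dev/null -w '%{http_code}\n' -u "$1" "$NAGIOS_URL"
}
```

Validating before restarting avoids taking monitoring down because of a typo in a config file.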

TODOs

  1. Send Nagios notifications to IRC (perhaps to #openhatch-auto)?
  2. Make the Nagios web interface world-viewable.
  3. Currently, only paulproteus has access to linode, so only he can do things like reboot the machine. We're still working out an access model that makes sense.