Difference between revisions of "Monitoring (moved)"

From OpenHatch wiki
imported>Paulproteus
imported>Jesstess

Revision as of 00:11, 22 January 2011

This is a page about improving or modifying OpenHatch.

We call that "Hacking OpenHatch," and there is a whole category of pages about that.


Monitoring

The basics:

  • linode.openhatch.org is the main OpenHatch box, which runs the website.
  • Nagios is running on linode2.openhatch.org. This is also the machine where OpenHatch runs Hudson.

Access:

  • There is a nagios user on linode2, and login uses ssh keys. To get ssh access to linode2, send paulproteus your public ssh key. Once he has granted access, you should be able to run ssh nagios@linode2.openhatch.org.
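
If you do not have an ssh key pair yet, the following is a minimal sketch of creating one and printing the public half to send along. The function name make_login_key and the key file path are hypothetical, and the key type and size are assumptions rather than OpenHatch requirements.

```shell
#!/bin/sh
# Sketch: create an ssh key pair and print the public key.
# make_login_key is a hypothetical helper, not something on linode2.
# Usage: make_login_key KEYFILE [extra ssh-keygen options]
make_login_key() {
    keyfile="$1"
    shift
    # RSA with 4096 bits is an assumed choice; ssh-keygen prompts for a
    # passphrase unless one is supplied through the extra options.
    ssh-keygen -t rsa -b 4096 -f "$keyfile" "$@" || return 1
    # Only the .pub file is shared; the private key stays on your machine.
    cat "$keyfile.pub"
}
```

For example, make_login_key ~/.ssh/id_rsa_openhatch writes the pair and prints the public key. After access is granted, ssh -i ~/.ssh/id_rsa_openhatch nagios@linode2.openhatch.org should log you in.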

Notifications:

  • Nagios notifications go to monitoring@lists.openhatch.org. Anyone can subscribe to this list.

Viewing the web interface, and handling the daemon:

  • On linode2, ~/nagios/secrets/ contains the mailman and Nagios web interface passwords.
  • View the Nagios web interface at http://linode2.openhatch.org/nagios3/
  • To restart the Nagios daemon, run sudo /etc/init.d/nagios3 restart
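
A restart with a broken configuration can leave monitoring down, so it may help to check the configuration first. The sketch below assumes the usual Debian nagios3 locations for the binary and main config file; these paths are not stated on this page, so confirm them on linode2 before relying on it. The function name safe_restart is hypothetical.

```shell
#!/bin/sh
# Sketch: verify the Nagios configuration, then restart only if it passes.
# The binary and config paths are assumed Debian nagios3 defaults;
# override the variables if linode2 is laid out differently.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/sbin/nagios3}"
NAGIOS_CFG="${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}"
NAGIOS_INIT="${NAGIOS_INIT:-/etc/init.d/nagios3}"
SUDO="${SUDO-sudo}"

safe_restart() {
    # "-v" makes Nagios check the configuration and exit without starting.
    if $SUDO "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        $SUDO "$NAGIOS_INIT" restart
    else
        echo "Configuration check failed; not restarting." >&2
        return 1
    fi
}
```

Calling safe_restart after sourcing the file runs the check and then the same restart command given above. Dropping the >/dev/null redirect shows the full verification report, which is useful when the check fails.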

TODOs

  1. Send Nagios notifications to IRC (perhaps #openhatch-auto)?
  2. Make the Nagios web interface world-viewable.
  3. Currently, only paulproteus has access to linode, so only he can do things like reboot the machine. We're still working out an access model that makes sense.
  4. Version the monitoring configurations.

Related