Difference between revisions of "Monitoring (moved)"

From OpenHatch wiki
imported>Jesstess
Line 3: Line 3:
  == Monitoring ==
- The basics:
+ === The basics===
  * <code>linode.openhatch.org</code> is the main OpenHatch box, which runs the website.
  * Nagios is running on <code>linode2.openhatch.org</code>. This is also the machine where OpenHatch runs Hudson.
- Access:
+ === Access ===
+
+ * There is a <code>nagios</code> user on <code>linode2</code>. We use ssh keys for login.
+ * If you want SSH access to that account, file a bug requesting it, and attach an SSH key. You should hear back within 2 days; if you don't hear back by then, try to find paulproteus or jesstess on IRC.
+ * Then you can do:
- * There is a <code>nagios</code> user on <code>linode2</code>. We use ssh keys for login. To get ssh access to <code>linode2</code>, paulproteus will need a public ssh key from you. After he's granted ssh access, you should be able to <code>ssh nagios@linode2.openhatch.org</code>.
+ ssh nagios@linode2.openhatch.org
- Notifications:
+ === Notifications ===
  * Nagios notifications go to <code>monitoring@lists.openhatch.org</code>. Anyone can subscribe to this list.
- Viewing the web interface, and handling the daemon:
+ === Viewing the web interface, and handling the daemon ===
  * On <code>linode2</code>, <code>~/nagios/secrets/</code> contains the mailman and Nagios web interface passwords.
Line 22: Line 25:
  * To restart the Nagios daemon, run <code>sudo /etc/init.d/nagios3 restart</code>
- In case of emergency:
+ ===In case of emergency===
  * See [[Emergency operations for the openhatch server]]. People with ssh keys set up for the Linode Shell (Lish) can reboot the box and have other limited emergency capabilities.

Revision as of 06:06, 17 March 2011

This is a page about improving or modifying OpenHatch.

We call that "Hacking OpenHatch," and there is a whole category of pages about that.


Monitoring

The basics

  • linode.openhatch.org is the main OpenHatch box, which runs the website.
  • Nagios is running on linode2.openhatch.org. This is also the machine where OpenHatch runs Hudson.

Access

  • There is a nagios user on linode2. We use SSH keys for login.
  • If you want SSH access to that account, file a bug requesting it and attach your public SSH key. You should hear back within 2 days; if you don't, try to find paulproteus or jesstess on IRC.
  • Once access has been granted, you can log in with:
ssh nagios@linode2.openhatch.org
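
If you log in often, an entry in your local ~/.ssh/config can shorten the command above. The sketch below only prints such an entry so you can review it before pasting it in. The alias oh-nagios and the key path ~/.ssh/id_openhatch are hypothetical names chosen for illustration; nothing in the OpenHatch setup prescribes them.

```shell
#!/bin/sh
# Print an ssh_config stanza for the nagios account on linode2.
# The alias and key path defaults are illustrative assumptions;
# the host and user names come from the instructions above.
nagios_ssh_stanza() {
    alias_name=${1:-oh-nagios}
    key_path=${2:-}
    # Keep the tilde literal; ssh expands it when it reads the config file.
    [ -n "$key_path" ] || key_path='~/.ssh/id_openhatch'
    cat <<EOF
Host ${alias_name}
    HostName linode2.openhatch.org
    User nagios
    IdentityFile ${key_path}
EOF
}

# Show the stanza with the default (hypothetical) alias and key path.
nagios_ssh_stanza
```

After appending the printed stanza to ~/.ssh/config, ssh oh-nagios should behave like the full command.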

Notifications

  • Nagios notifications go to monitoring@lists.openhatch.org. Anyone can subscribe to this list.

Viewing the web interface, and handling the daemon

  • On linode2, ~/nagios/secrets/ contains the passwords for Mailman and for the Nagios web interface.
  • View the Nagios web interface at http://linode2.openhatch.org/nagios3/
  • To restart the Nagios daemon, run sudo /etc/init.d/nagios3 restart
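
A restart with a broken configuration can leave Nagios down, so it may be worth checking the configuration first. The following is a minimal sketch, not the procedure OpenHatch actually uses: it assumes the usual Debian/Ubuntu nagios3 package layout (/usr/sbin/nagios3 and /etc/nagios3/nagios.cfg), which this page does not confirm for linode2. The function name safe_nagios_restart and the override variables are hypothetical.

```shell
#!/bin/sh
# Verify the Nagios configuration and restart the daemon only if it parses.
# Binary and config paths are assumptions (Debian/Ubuntu nagios3 layout);
# the init script path is the one given above. All can be overridden.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/sbin/nagios3}
NAGIOS_CFG=${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios3}
SUDO=${SUDO-sudo}   # set SUDO to an empty string when already running as root

safe_nagios_restart() {
    # -v makes Nagios check the configuration and exit without starting.
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        $SUDO "$NAGIOS_INIT" restart
    else
        echo "Nagios configuration check failed; not restarting." >&2
        return 1
    fi
}
```

Run safe_nagios_restart on linode2 after sourcing the snippet. If the check fails, running the same nagios3 -v command by hand shows the detailed errors.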

In case of emergency

  • See Emergency operations for the openhatch server. People with SSH keys set up for the Linode Shell (Lish) can reboot the box and have other limited emergency capabilities.

TODOs

  1. Send Nagios notifications to IRC (perhaps #openhatch-auto)?
  2. Make the Nagios web interface world-viewable.
  3. Version the monitoring configurations.
  4. Send SMS alerts to people who want them.
  5. Add historical trending (Munin)?

Related