I feel this is related to the topic of log analysis: the smart handling and escalation of alerts.

In our shop we have a home-grown system we call "escalationbot" that collects alerts and sends them to our oncall Ops person. If that person doesn't cancel the alert within thirty minutes, it escalates to a secondary oncall person, and after another thirty minutes it sends mail to our whole team. It also automatically cancels an escalation if a BigBrother or Mon message comes in reporting that the problem has cleared up (BB "green" or Mon "UPALERT"). Escalationbot is written entirely in bash; it handles only email alerts and sends pages with qpage and a modem.

What I'm wondering is how many other people have grown their own escalation system, and whether these should start being shared on this list as well. The handling of alerts is just as important as generating them in the first place, IMHO. I'd actually be happy to look at other people's implementations, either to improve ours or to ditch it for something better.

Thoughts?

-- 
Nate Campi (415) 276-8678
UNIX Ops, Terra Lycos - WiReD SF
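For anyone comparing notes, here is a rough sketch of the escalation timeline described in the post. It is written in Python for illustration only (escalationbot itself is bash and is not shown here), and every name in it is hypothetical: `Alert`, `current_tier`, `handle_message`, and the sample message text are assumptions, as is the simple substring match on "green" and "UPALERT". It covers only the tier timing and the automatic cancel, not mail parsing or paging through qpage.

```python
from dataclasses import dataclass
from typing import Optional

# Thirty minutes per tier, as described in the post.
ESCALATION_INTERVAL_MIN = 30

# Notification tiers in escalation order.
TIERS = ["primary oncall", "secondary oncall", "whole team"]

# Markers treated as "problem cleared" (BB "green", Mon "UPALERT").
# Assumption: a plain substring match; real messages may need stricter parsing.
RECOVERY_MARKERS = ("green", "UPALERT")


@dataclass
class Alert:
    host: str
    raised_at_min: int       # minute at which the alert first arrived
    cancelled: bool = False  # set by a human cancel or a recovery message


def is_recovery(message: str) -> bool:
    """Return True if the message reports that the problem has cleared."""
    return any(marker in message for marker in RECOVERY_MARKERS)


def handle_message(alert: Alert, message: str) -> None:
    """Cancel the escalation when a recovery message comes in."""
    if is_recovery(message):
        alert.cancelled = True


def current_tier(alert: Alert, now_min: int) -> Optional[str]:
    """Return who should be notified at now_min, or None if cancelled."""
    if alert.cancelled:
        return None
    elapsed = max(0, now_min - alert.raised_at_min)
    # One step up per interval, capped at the last tier.
    index = min(elapsed // ESCALATION_INTERVAL_MIN, len(TIERS) - 1)
    return TIERS[index]


if __name__ == "__main__":
    alert = Alert(host="web1", raised_at_min=0)
    for minute in (0, 30, 60):
        print(minute, current_tier(alert, minute))
    handle_message(alert, "web1 http green")  # hypothetical message text
    print(current_tier(alert, 90))
```

Keeping the tier lookup a pure function of elapsed time means the schedule can be checked without sending any pages; a real system would also need to remember which tiers have already been notified.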
This archive was generated by hypermail 2b30 : Fri Aug 24 2001 - 17:08:07 PDT