While we're discussing XML formats, I thought I'd mention some experience I've had.

I've been working with IBM's Common Base Event (CBE) XML format [1]. It's been morphed into the OASIS standard WSDM Management Using Web Services v1.0 (WSDM-MUWS) [2].

It's been fairly useful to us because the Eclipse client (using Eclipse TPTP) has good support for the format, which made implementing a viewer very easy. We used Visual Basic to generate the XML straight from the source (the source app is in Visual Basic - grrr) and send the message over WebSphere MQ to a J2EE message-driven bean, which writes the event to the database and publishes it to any subscribers using Publish/Subscribe.

The CBE format holds most of the common information you'd expect (creation time, severity, priority, source, etc.) and has placeholders for arbitrary XML data. It doesn't fully solve the problem in this thread, i.e. representing the actual event contents (as opposed to creation time, severity, etc.) in a standard way to allow analysis.

Links:
[1] Search www.ibm.com/developerworks for Common Base Event or start at
    http://www-128.ibm.com/developerworks/webservices/library/specification/ws-cbe/
[2] http://www.oasis-open.org/specs/index.php

Edward Sargisson BSc, BCom
Consultant
IBM Business Consulting Services
Wellington, New Zealand
DDI: + 64-4-462-3586, Mob: + 64-21-254-8927
P O Box 38 993, Wellington, NEW ZEALAND
edward.j.sargisson@private


todd.glassey@private
Sent by: loganalysis-bounces+edward.j.sargisson=nz1.ibm.com@private
08/12/2005 05:37

To: Christina Noren <cfrln@private>, LogAnalysis@private
cc:
Subject: [logs] Re: regex-less parsing of messages

Christina - FYI I am working on a log management practice statement for the use of SPLUNK to address log management issues inside of ITIL, COBIT v4, and the updated ISO17799/20001:20005 documents.
The intent, for my current client, is to be able to use SPLUNK Pro as the basis of a logging management and event detection regimen for their automated and periodic controls. This makes SPLUNK Pro totally good to go for meeting SOX and compliance in 2CFR/211CFR type environments.

Todd

-------------- Original message ----------------------
From: Christina Noren <cfrln@private>
> Speaking from Splunk...
>
> This problem of needing to build and maintain a big library of
> regexes to analyze logs centrally is one we're trying to end run, so
> thanks Todd for bringing us into the conversation.
>
> We agree with Frank that getting common XML standards is pretty
> unlikely across the broad range of log sources people need to
> correlate.
>
> We've instead built a series of universal processors that find and
> normalize timestamps in any format, then tokenize everything in each
> event, and classify new sources and events based on patterns and
> grammatical structure in the event. We put off all of the semantics
> till search time so we don't need to worry about mapping "deny"
> "reject" and other variants of the same action to a common value. I'm
> oversimplifying a more complex set of algorithms for the sake of a
> short message.
>
> Users are able to put in log sources we've never seen before and have
> them handled by the same algorithms as everything else.
>
> Then, instead of a structured relational db, we put everything into a
> rich, dense search index behind a simple search interface that
> provides results to most searches in seconds. This has the nice side
> effect of making ad hoc access to the logs a lot easier than needing
> to form a SQL style query.
>
> This works pretty well for use cases like tracing an email message
> through different sendmail, antispam and other events and other
> investigative/troubleshooting scenarios.
> There's really no reason to write a regex to parse sendmail's
> different message formats into a structured schema if you're going to
> search for an email address and time, then follow that event based on
> message id and other content of that event. We have some interesting
> accelerators for following the correlation, like a "related" feature
> that looks for the connections based on time and value.
>
> - Christina
>
> p.s. you can download Splunk free at www.splunk.com
>
>
> On Dec 6, 2005, at 8:13 AM, todd.glassey@private wrote:
>
> > We use SPLUNK for exactly this.
> >
> > Todd
> > -------------- Original message ----------------------
> > From: "Solomon, Frank" <frank@private>
> >
> >> Jason, your example certainly struck a chord. We haven't even begun
> >> to put our mail logs into our central log server because of the
> >> technical challenges that would pose. And yet, we get asked the
> >> same sort of questions which require a highly trained person to
> >> probe through the heterogeneous mail log files and trace the path
> >> of some errant envelope that may or may not actually exist. It is
> >> not pretty; part of the price we pay for having to accommodate
> >> multiple mail systems, vendors and standards.
> >>
> >> Our standing joke is: "That's the nice thing about standards, there
> >> are so many to choose from and everyone can have their own." So,
> >> "sendmail" has its "standard" log format and "Exchange" has its
> >> "standard" log format, and "Novell" has its "standard" log format,
> >> etc. I saw an article recently describing the new "logging
> >> standard" that Microsoft was about to introduce in their latest OS.
> >> Well that will certainly clear things up! I'm sure all their
> >> competitors will rush to implement compatible systems. Don't get me
> >> wrong, I laud Microsoft's attempt to enforce programmer discipline.
> >>
> >> In case you're interested in the MS stuff:
> >> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wes/wes/about_the_windows_event_log.asp
> >>
> >> <dreaming>
> >> Certainly, the first challenge in being able to analyze data is
> >> getting it into a common format with a common symbolic
> >> representation of the underlying information. Since we cannot count
> >> on the energy and discipline of the programmers that write the
> >> log-generating programs, that energy must be invested in and
> >> discipline must be enforced by the log collection mechanism. It's
> >> becoming obvious to me that the blanket approach of collecting
> >> everything on the off chance that some auditor or forensic
> >> specialist in the future might be able to make sense of it, is a
> >> waste of resources. That implies that the requirements for what
> >> needs to be logged could be set at the collecting end and that
> >> somehow those requirements need to be communicated to the source of
> >> the messages to make sure that the required messages exist and are
> >> coded appropriately (which they won't be).
> >> </dreaming>
> >>
> >> I know, I'm dreaming: there's no choice but to continue to collect
> >> tons of ore and hope to glean an ounce of silver from it every once
> >> in a while. And besides, those old log CD's make nifty tree
> >> ornaments.
> >>
> >> John Moehrke mentioned that his organization was making the attempt
> >> to define the standards for the events at the beginning. To quote:
> >> "We thus will be sending the experts in log analysis an already
> >> manageable format." That's a great idea, but it suffers from the
> >> same standards problem I've mentioned: everybody's likely to have
> >> their own (maybe someday the only industry will be healthcare, but
> >> not yet).
> >> And after looking at the RFC, I can't imagine that good things will
> >> come of the burden this will place on the infrastructure if the
> >> logging rate is very high. Can you imagine the "sendmail" guys
> >> wrapping xml around the mail logs? Or, all the mail system vendors
> >> agreeing on a common xml schema for their mail logs? Yeah, it might
> >> happen.
> >>
> >> Personally, I'm glad that syslog uses udp.
> >>
> >> Sorry, I've rambled entirely too long, I'll go back to merely
> >> listening.
> >>
> >> Frank Solomon
> >> University of Kentucky
> >> Lead Systems Programmer, Enterprise Systems
> >> http://www.franksolomon.net
> >> "If you give someone a program, you will frustrate them for a day;
> >> if you teach them how to program, you will frustrate them for a
> >> lifetime."
> >> --Anonymous
> >>
> >>
> >> -----Original Message-----
> >> [mailto:loganalysis-bounces+sysfrank=uky.edu@private] On Behalf Of
> >> Jason Haar
> >> Sent: Monday, December 05, 2005 3:15 PM
> >>
> >> . . .snip. . .
> >>
> >> Boring, everyday example: These days (due to the horrors of
> >> antispam systems) internal users routinely ring the helpdesk and
> >> ask "Customer YY sent me an email and I never got it. What
> >> happened?". To figure that out involves converting what you can
> >> learn about customer YY into DNS records and IP addresses, then
> >> tracking any related connections as they hit the edge of our
> >> Internet link. Where it first meets our RBL checks, then flows
> >> through AV and antispam systems, then through a couple more
> >> internal mail relays before hitting our end mail servers. We have
> >> logs all merged together from all those systems, but frankly, I am
> >> still the only one who can link all those events together. And my
> >> attempts at turning that eyeballing into a program have failed so
> >> far. And that's only one example.
> >>
> >> . . .
_______________________________________________ LogAnalysis mailing list LogAnalysis@private http://lists.shmoo.com/mailman/listinfo/loganalysis
This archive was generated by hypermail 2.1.3 : Wed Dec 07 2005 - 18:17:57 PST