Jason Royes wrote:
>Databases (w/ good schema) excel when complex
>analysis is required.
Databases also require index inserts, support for transaction
rollback, and all kinda crazy stuff that makes them completely
unsuitable as logging systems. We (the collective unconscious "we")
keep using them, though, because they're available and can be
made to suit the purpose by throwing a bunch of hardware at the
problem - which is cheaper, really, than understanding the problem
or even thinking about it. There are a lot of techniques that
make more sense than using a generic SQL database - storing
records in raw syslog files indexed only by offset into the
file would save a _HUGE_ amount of space over what a database
uses - a time_t and an off_t is all you need. Primary indexes
can/should be created on the fly at query time (like with a
glimpse database) rather than updated at insert time like they
have to be with a commercial database - doing a sorted insert
into a b+tree is orders of magnitude faster and more space
efficient than a random-ordered insert/query, etc. There's a
lot of simplifying assumptions you can make about logs:
- they are inserted in event-sequence
- they are approximately clustered by time
- you seldom (if ever) will need to seek back 20 minutes
  and delete a single log record
- the fields you'll want to search on are either bounded
  fairly tightly (priority, source, time) or are
  free-form (regexp or string fragment) - so you'll
  want a very compact primary index for the bounded
  values and a patricia tree or inverted index for
  the strings
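
To make the "a time_t and an off_t is all you need" idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any existing tool: the function names, the 16-byte entry layout (two signed 64-bit integers standing in for time_t and off_t), and the requirement that timestamps arrive in non-decreasing order are all hypothetical choices made for the example. Logs that are only approximately clustered by time would need a small scan window around the binary-search result.

```python
import bisect
import struct

# Hypothetical fixed-width index entry: an 8-byte timestamp (standing in
# for time_t) followed by an 8-byte file offset (standing in for off_t).
ENTRY = struct.Struct("<qq")


def append_record(log, index, timestamp, line):
    """Append one log line and its (timestamp, offset) index entry."""
    offset = log.seek(0, 2)  # logs only grow, so always write at the end
    log.write(line.rstrip(b"\n") + b"\n")
    index.seek(0, 2)
    index.write(ENTRY.pack(timestamp, offset))


def load_index(index):
    """Read the whole index into two parallel lists: times and offsets."""
    index.seek(0)
    entries = list(ENTRY.iter_unpack(index.read()))
    return [t for t, _ in entries], [o for _, o in entries]


def records_between(log, times, offsets, start, end):
    """Return log lines with start <= timestamp < end.

    Assumes records were appended in time order, so `times` is sorted:
    a binary search finds the first match and the rest is one
    sequential read of the raw log file.
    """
    lo = bisect.bisect_left(times, start)
    hi = bisect.bisect_left(times, end)
    if lo == hi:
        return []
    log.seek(offsets[lo])
    return [log.readline().rstrip(b"\n") for _ in range(hi - lo)]
```

Both `log` and `index` are expected to be seekable binary file objects (for example files opened with mode "r+b"). The index costs 16 bytes per record regardless of record length, and nothing is ever updated in place.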
I'm not saying that current approaches won't work, because they
will. But only 'cuz Moore's law overcomes a lot of the need to
understand the problem. ;) If you're thinking of implementing
a database solution for searching logs, humor an old curmudgeon
by researching text-retrieval systems - keyword-in-context,
inverted indexes, and how to do bulk loads of search tables. ;)
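
As a starting point for that research, here is a rough sketch of an inverted index built in one bulk pass at query time. It is an assumption-laden toy, not a recommendation of a particular design: the tokenizer (runs of letters, digits, and underscores, lowercased), the in-memory dict, and the function names are all made up for the example. Because the file is scanned front to back, each posting list comes out already sorted by offset, which is the property that makes bulk loading cheap.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: runs of letters, digits, and underscores.
TOKEN = re.compile(rb"[A-Za-z0-9_]+")


def build_inverted_index(log):
    """Scan the log once; map each lowercased token to the offsets of
    the lines that contain it. Offsets are appended in file order, so
    every posting list is sorted without any extra work."""
    postings = defaultdict(list)
    log.seek(0)
    while True:
        offset = log.tell()
        line = log.readline()
        if not line:
            break
        for token in set(TOKEN.findall(line.lower())):
            postings[token].append(offset)
    return postings


def search(log, postings, *terms):
    """Return the lines containing every term (AND query), in file order."""
    if not terms:
        return []
    hits = set(postings.get(terms[0].lower(), ()))
    for term in terms[1:]:
        hits &= set(postings.get(term.lower(), ()))
    lines = []
    for offset in sorted(hits):
        log.seek(offset)
        lines.append(log.readline().rstrip(b"\n"))
    return lines
```

Terms are passed as bytes, e.g. `search(log, postings, b"sshd", b"failed")`. The index stores only offsets into the raw log, so the log file itself stays the single copy of the data.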
mjr. ("once a database guy - always a database guy.")
---
Marcus J. Ranum http://www.ranum.com
Computer and Communications Security mjr@ranum.com
_______________________________________________
LogAnalysis mailing list
LogAnalysis@lists.shmoo.com
http://lists.shmoo.com/mailman/listinfo/loganalysis
This archive was generated by hypermail 2b30 : Tue Dec 03 2002 - 09:47:50 PST