Re: [logs] Secure Central Log Host

From: Marcus J. Ranum (mjrat_private)
Date: Tue Dec 03 2002 - 09:37:43 PST


    Jason Royes wrote:
    >Databases (w/ good schema) excel when complex
    >analysis is required.
    
    Databases also require index inserts, support for transaction
    rollback, and all kinda crazy stuff that makes them completely
    unsuitable as logging systems. We (the collective unconscious "we")
    keep using them, though, because they're available and can be
    made to suit the purpose by throwing a bunch of hardware at the
    problem - which is cheaper, really, than understanding the problem
    or even thinking about it. There are a lot of techniques that
    make more sense than using a generic SQL database. Storing
    records in raw syslog files indexed only by offset into the
    file would save a _HUGE_ amount of space over what a database
    uses - a time_t and an off_t per record is all you need. Primary
    indexes can/should be created on the fly at query time (like with
    a Glimpse database) rather than updated at insert time like they
    have to be with a commercial database - doing a sorted insert
    into a b+tree is orders of magnitude faster and more space
    efficient than a random-ordered insert/query, etc. There are a
    lot of simplifying assumptions you can make about logs:
            - they are inserted in event-sequence
            - they are approximately clustered by time
            - you seldom (if ever) will need to seek back 20 minutes
                    and delete a single log record
            - the fields you'll want to search on are either bounded
                    fairly tightly (priority, source, time) or are
                    free-form (regexp or string fragment) - so you'll
                    want either a very compact primary index for
                    the bounded values or a patricia tree or inverted
                    index for the strings
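
    To make the time_t/off_t idea concrete, here is a minimal sketch of
    an index built over a raw log in one pass and then binary-searched
    by time. It leans on the assumption above that records arrive in
    event sequence. The record format (a leading integer epoch time)
    and the names build_time_index, read_range and epoch_prefix are
    made up for illustration; real syslog timestamps need a real parser.

```python
import bisect
import io


def build_time_index(log, parse_time):
    """One pass over a binary log stream. Returns parallel lists of
    timestamps (the time_t) and byte offsets (the off_t)."""
    times, offsets = [], []
    log.seek(0)
    while True:
        pos = log.tell()          # offset of the record about to be read
        line = log.readline()
        if not line:
            break
        times.append(parse_time(line))
        offsets.append(pos)
    return times, offsets


def read_range(log, times, offsets, start, end):
    """Return the records whose timestamp lies in [start, end).
    Assumes `times` is sorted, i.e. records were appended in time order."""
    lo = bisect.bisect_left(times, start)
    hi = bisect.bisect_left(times, end)
    if lo == hi:
        return []
    log.seek(offsets[lo])
    if hi < len(offsets):
        data = log.read(offsets[hi] - offsets[lo])
    else:
        data = log.read()         # range runs to the end of the file
    return data.splitlines()


def epoch_prefix(line):
    # Hypothetical format: each record starts with an integer epoch time.
    return int(line.split(b" ", 1)[0])


sample = io.BytesIO(
    b"1038936000 host1 sshd: accepted\n"
    b"1038936005 host2 su: failed\n"
    b"1038936060 host1 cron: job started\n"
)
times, offsets = build_time_index(sample, epoch_prefix)
hits = read_range(sample, times, offsets, 1038936001, 1038936061)
```

    The log itself is never rewritten; the index is just two integers
    per record and can be thrown away and rebuilt whenever it is needed.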
    
    I'm not saying that current approaches won't work, because they
    will. But only 'cuz Moore's law overcomes a lot of the need to
    understand the problem. ;)  If you're thinking of implementing
    a database solution for searching logs, humor an old curmudgeon
    by researching text-retrieval systems - keyword-in-context,
    inverted indexes, and how to do bulk loads of search tables. ;)
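
    As a starting point for that reading, here is a rough sketch of an
    inverted index bulk-built over a raw log: each token maps to the
    byte offsets of the records that contain it, and a query intersects
    the posting lists. The tokenizer, the sample records and the names
    build_inverted_index and search are assumptions for illustration,
    not taken from any particular text-retrieval system.

```python
import io
import re
from collections import defaultdict

# Assumed tokenizer: runs of letters, digits, underscores and dots.
TOKEN = re.compile(rb"[A-Za-z0-9_.]+")


def build_inverted_index(log):
    """Bulk-build token -> list of record offsets in a single pass.
    Offsets are appended in file order, so each list is already sorted."""
    postings = defaultdict(list)
    log.seek(0)
    while True:
        pos = log.tell()
        line = log.readline()
        if not line:
            break
        for tok in set(TOKEN.findall(line.lower())):
            postings[tok].append(pos)
    return postings


def search(log, postings, *words):
    """Return the records that contain every one of `words`."""
    sets = [set(postings.get(w.lower().encode(), ())) for w in words]
    if not sets:
        return []
    records = []
    for pos in sorted(set.intersection(*sets)):
        log.seek(pos)
        records.append(log.readline().rstrip(b"\n"))
    return records


sample = io.BytesIO(
    b"1038936000 host1 sshd: accepted password for alice\n"
    b"1038936005 host2 su: failed for bob\n"
    b"1038936060 host1 sshd: failed password for bob\n"
)
index = build_inverted_index(sample)
failed_bob = search(sample, index, "failed", "bob")
sshd_failed = search(sample, index, "sshd", "failed")
```

    Since the index is built after the fact from an append-only file,
    nothing has to be updated at insert time.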
    
    mjr. ("once a database guy - always a database guy.")
    ---
    Marcus J. Ranum				http://www.ranum.com
    Computer and Communications Security	mjrat_private
    
    _______________________________________________
    LogAnalysis mailing list
    LogAnalysisat_private
    http://lists.shmoo.com/mailman/listinfo/loganalysis
    


