Re: [logs] SDSC Secure Syslog

From: Tom Perrine (tepat_private)
Date: Thu Dec 05 2002 - 17:12:50 PST


    >>>>> On Fri, 6 Dec 2002 10:33:15 +1100 (Australia/ACT), Darren Reed <avalonat_private> said:
    
        DR> Well, the big problem with lots of data, to me, is not how to collect
        DR> it all in a reliable fashion but what do you do with it all ?  Do you
        DR> just archive it to CD or DVD on a regular basis in case someone with
        DR> a warrant comes knocking or do you generate load graphs or something
        DR> else from it ?  Who's going to look at all the log output from 20k
        DR> nodes when they're all screaming and sending a message every second ?
        DR> In dealing with so much data, there are logistical problems as much as
        DR> technical ones to solve that make the technical ones seem trivial.
    
    Well, the problem boils down to mechanisms to collect the data,
    systems to put it somewhere safe (transport), and systems to do big
    analysis.
    
    We're trying to address the audit transport problem.  We think we've
    helped with the collection problem, but that's really a configuration
    and policy issue.  Our next step will be analysis.
    
    But we realized years ago that the existing transport, e.g. classic
    syslog, was crap.  It's the protocol, not the implementations.  When
    syslog-reliable finally came out, for better or worse, that's what
    we've got.  And it probably sucks less than UDP :-)
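
    (To make the transport difference concrete: classic BSD syslog is a
    single fire-and-forget UDP datagram, so a dropped packet is a silently
    lost record, while a connection-oriented transport at least tells the
    sender whether the collector was reachable.  Here's a rough sketch in
    Python with a placeholder loghost name, nothing SDSC-specific; real
    syslog-reliable (RFC 3195) layers acknowledged delivery over BEEP,
    which plain sockets don't show.)

        import socket

        # A syslog record with an RFC 3164 PRI of <134> (local0.info).
        MSG = b"<134>Dec  5 17:12:50 pitofdespair tep: test message"

        # Classic syslog: one UDP datagram to port 514.  No acknowledgment,
        # no retransmission; if the packet is dropped, the record is gone.
        udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        udp.sendto(MSG, ("loghost.example.org", 514))
        udp.close()

        # A connection-oriented transport (plain TCP to the syslog-conn
        # port, 601, here) at least fails loudly if the collector is
        # unreachable; RFC 3195 adds acknowledged delivery on top of BEEP.
        tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        tcp.connect(("loghost.example.org", 601))
        tcp.sendall(MSG + b"\n")
        tcp.close()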
    
    We have groups here and around campus that specialize in data mining.
    We are trying to figure out how to put our security experience
    together with their data mining magic to get useful information from
    the raw data.
    
    But without data, there's nothing to mine.  Without enough data,
    mining gives you not-so-useful results.  If the data is low-integrity,
    you get low-integrity results.
    
    The logistics of big data, we're familiar with.  We have users who
    think that a terabyte is a good size for a small data set :-)  We're
    currently spinning about 120 Tbytes of disk, and expect to hit a
    petabyte of disk sometime late next year.  We've got HPSS with about 6
    petabytes in it, and we'll probably grow that to about 50 or 60 Pbytes
    in the next 2 years.
    
    I'm a packrat.  I've saved every syslog record since 1996 or so:
    
        5007 pitofdespair:/scratch/slocal/tep-test/logs % ls
        1994  1996  1997  1998  1999  2000  2001  2002
        5008 pitofdespair:/scratch/slocal/tep-test/logs % df -lkh
        Filesystem            Size  Used Avail Use% Mounted on
        /dev/sda1             5.9G  2.7G  2.9G  47% /
        none                  243M     0  243M   0% /dev/shm
        /dev/sda3             520G  373G  120G  76% /scratch/slocal
    
    So, see, that's only 373G so far.  That's 2,825,305,174 lines as of
    the end of October.  That's pretty manageable.  Our supercomputer
    users think that this dataset is "cute", and might be interesting if
    it ever grows up :-)
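
    (If you're curious how you'd tally a pile like that, a minimal sketch,
    assuming plain uncompressed text files laid out one directory per year
    as above, is just a walk and a newline count; compressed archives
    would need gzip/bzip2 handling.)

        #!/usr/bin/env python
        # Tally log lines per year under the layout shown above.
        # Assumes plain-text files; compressed logs need extra handling.
        import os

        LOGROOT = "/scratch/slocal/tep-test/logs"

        total = 0
        for year in sorted(os.listdir(LOGROOT)):
            yeardir = os.path.join(LOGROOT, year)
            if not os.path.isdir(yeardir):
                continue
            count = 0
            for dirpath, dirnames, filenames in os.walk(yeardir):
                for name in filenames:
                    with open(os.path.join(dirpath, name), "rb") as f:
                        for line in f:
                            count += 1
            print("%s: %12d lines" % (year, count))
            total += count

        print("total: %d lines" % total)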
    
        DR> So, in a sense, what it comes down to is you only spend serious effort
        DR> logging data securely that you care about and the rest goes to /dev/null,
        DR> whether directly or indirectly.  If you're doing that and the messages
        DR> you are interested in only make up a minority of those being generated,
        DR> why do you need such high performance as opposed to good filtering on
        DR> the sender(s) ?
    
    Because I never know what I'm going to be looking for.  It's like
    astronomy, or climate modeling.  If you are looking at a particular
    star, and throw away the images because you are done studying the
    star, then people can't use your images to find Earth-crossing
    asteroids, or new planets, 10 years later.  The same goes for
    climate modeling.  Sometimes you need a long-baseline, wide-spectrum
    data set to see long-term trends, or to find out just when some
    significant event *really* began, when it was very, very small.
    
    A Cray cycle wasted is lost forever.  A byte that wasn't collected
    and saved can never be collected in the future.  It's gone.  Space
    is cheap; too bad if you need that byte now and it was never saved.
    
    Also, we're a research place, so perhaps we just have a warped sense
    of packratism.
    
        DR> The point I was trying to make was when you're trying to get really
        DR> high performance out of standard hardware, you need to tune lots of
        DR> corner cases.
    
    Agreed.  Sometimes Moore's Law just isn't enough.  Sometimes you just
    have to get clever and actually write some slick code instead of just
    throwing hardware at it.
    
    --tep
    _______________________________________________
    LogAnalysis mailing list
    LogAnalysisat_private
    http://lists.shmoo.com/mailman/listinfo/loganalysis
    


