> > MJR> I've left the whole "...and stick it in a database" part out
> > MJR> because that's a HARD problem to tackle right and I think that
> > MJR> will be the bulk of your pain. Hint: before you think about
> > MJR> putting it into a database, ask yourself "what queries will I
> > MJR> want to make?" and see if it's even possible to make a data
> > MJR> model that will allow them...
>
> Yeah, again, everyone wants to "put it all in a database", but often
> without having a reason to do so. Databases trade performance and
> scalability for superior data handling (e.g. searching and
> correlation). What kind of database engine (HW and SW) is it going to
> take to handle transaction streams from a few hundred machines that
> are each generating 10K log records/sec (think e-commerce web
> servers)? (Hint: Your Sun salesperson will *love* that new E-server
> sale...)
>
> --
> Tom E. Perrine <tepat_private> | San Diego Supercomputer Center
> http://www.sdsc.edu/~tep/     |

Hi all, I had to jump in on this. I run systems consulting at Addamark,
and economically writing logs to a queryable database is exactly the
problem we deal with. Most of our customers generate tens to hundreds of
GB of logs per day, both security and web traffic, and writing that
volume to a conventional database is impractical and often impossible.
Few relational database installations out there exceed 1 TB in size,
and those that do demand very expensive SMP hardware. Yet security
analysts, security applications, and managers often want to do a lot of
ad hoc querying, so running custom scripts against flat logfiles for
each new analysis quickly becomes impractical, and log reporting
packages tend to offer only a few canned summaries rather than
supporting ad hoc investigations. Indeed, as Marcus points out, this is
the hard part, even though a lot of people start out saying "I'm just
going to dump it to a database."

The Addamark Log Management System turns the filesystem itself into a
database. It stores log data as compressed files segregated in a b-tree
index on time, with an SQL front end for querying. Because we parse the
logs for each time segment into columns, and a column of similar values
has far more data redundancy than interleaved full rows (so it
compresses better), we actually use less disk than gzipping flat files.
Our engine runs on clusters of Linux PCs, with full parallelism in load
processing, query processing, and distribution of data storage. So you
get even better storage efficiency and write performance than flat
files, with the queryability of a relational database.

And to the question of whether a database model will support the
queries: in addition to standard SQL, we let customers develop
user-defined functions and aggregates in Perl. For example, if you want
to match a signature pattern for a long-term attack reflected in logs
across firewalls, routers, servers, and applications, you can write a
Perl function that evaluates each user's activity and returns the users
that match the pattern. That's pretty hard to do in SQL alone.

Hope this helps,
Christina

Christina Noren
director, systems consulting
Addamark Technologies Inc.
cfrlnat_private
www.addamark.com
415-613-5441 cell
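
P.S. To make the columnar storage point concrete, here is a toy Perl
sketch of the idea. It is just an illustration, not our actual engine
or on-disk format; the hourly segmenting and the four-field record
layout are assumptions made up for the example:

    #!/usr/bin/perl
    # Toy sketch: split a log read from stdin into hourly time
    # segments and store each field as its own gzipped column file.
    use strict;
    use warnings;
    use IO::Compress::Gzip qw(gzip $GzipError);
    use POSIX qw(strftime);

    my %columns;    # {hour}{field} => list of values
    while (my $line = <STDIN>) {
        chomp $line;
        # assumed record layout: epoch-seconds, client IP, URL, status
        my ($ts, $ip, $url, $status) = split ' ', $line, 4;
        my $hour = strftime('%Y%m%d%H', gmtime($ts));  # segment key
        push @{ $columns{$hour}{time}   }, $ts;
        push @{ $columns{$hour}{ip}     }, $ip;
        push @{ $columns{$hour}{url}    }, $url;
        push @{ $columns{$hour}{status} }, $status;
    }

    # One gzipped file per (segment, column). Values within a column
    # repeat heavily (same URLs, same status codes), so they compress
    # much better than interleaved full rows do.
    for my $hour (sort keys %columns) {
        mkdir $hour unless -d $hour;
        for my $field (keys %{ $columns{$hour} }) {
            my $data = join("\n", @{ $columns{$hour}{$field} }) . "\n";
            gzip \$data => "$hour/$field.gz"
                or die "gzip failed: $GzipError";
        }
    }

A query that touches only, say, the status column then reads and
decompresses one small file per time segment instead of scanning every
full row.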
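
P.P.S. And a toy version of the kind of per-user Perl function I
mentioned. The function name, event format, and pattern here are
invented for the example and are not our UDF API:

    use strict;
    use warnings;

    # Takes one user's events, oldest first, each as
    # [epoch-seconds, source, message]. Returns true if the user was
    # denied at a firewall and then hit a server authentication
    # failure more than a day later: a slow, cross-device pattern
    # that is awkward to state in SQL alone.
    sub matches_slow_probe {
        my (@events) = @_;
        my $probe_time;
        for my $e (@events) {
            my ($ts, $source, $msg) = @$e;
            $probe_time = $ts
                if $source eq 'firewall' && $msg =~ /denied/i;
            return 1
                if defined $probe_time
                && $source eq 'server'
                && $msg =~ /authentication failure/i
                && $ts - $probe_time > 86_400;    # more than one day
        }
        return 0;
    }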