Marcus J. Ranum writes:

>Let me point out that:
>a) you're right
>b) signatures have their place

Your first point is very good. But I certainly don't disagree with the second. Hell, I've even committed the heinous act of unleashing yet another open source signature-based IDS on the world.

Everybody knows the aphorism, `If your only tool is a hammer, all your problems look like nails,' right? Signature-based analysis is the hammer of the security industry. Vendors have been pimping quote solutions unquote that bundle fifteen hundred nearly identical hammers, often to customers who don't even know what a nail looks like. In the end, we discover we've developed a lot of really sexy hammer technology, and we have plush custom hammers available for hundreds of distinct kinds of nails. I'm not saying that I don't want a hammer in my toolbox; I'm just saying that not all of our problems are nails (I'm pretty sure there are a nonzero number of screws loose as well).

>True signatureless systems generate results like: "the ratio of SYN to FIN
>packets is 2 standard deviations from the norm for this time of the day."
>They leave it entirely up to you to figure out the significance.

Well, yes and no. Imagine a system that uses some hideously byzantine algorithm to profile network traffic. It coughs out a summary which is isomorphic to a BPF filter that will match all normal traffic (for some sufficiently specified definition of `normal'). A perl script translates this into its inverse and runs, oh, tcpdump(8) on passing traffic with the resulting filter. The internal mechanisms are certainly the same as those of a vanilla signature-based system, but I think it does some violence to the term if we call the system as a whole `signature based'.
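A minimal sketch of that inversion step, assuming the byzantine profiler has already coughed out a BPF expression for `normal' traffic (the filter text, interface name, and helper names here are all invented for illustration):

```python
# Hypothetical sketch: negate a BPF expression describing "normal"
# traffic and hand the result to tcpdump(8).  The profiling step that
# produces normal_filter is the hard part and is simply assumed here.
import subprocess

def invert_filter(normal_filter):
    """Wrap a BPF expression so it matches everything *except* normal traffic."""
    return "not (%s)" % normal_filter

def watch_abnormal(normal_filter, interface="em0", run=subprocess.call):
    """Run tcpdump on whatever the profile says is not normal."""
    argv = ["tcpdump", "-n", "-i", interface, invert_filter(normal_filter)]
    return run(argv)

# A toy 'normal' profile: web and DNS traffic only.
normal = "tcp port 80 or udp port 53"
print(invert_filter(normal))   # not (tcp port 80 or udp port 53)
```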
If we're not actually enumerating characteristics and associating them with some tag, then we're not writing signatures---any more than we're writing strings of ANDs and NOTs if we're coding in C (even though the end result is provably isomorphic to a collection of ANDs and NOTs).

So, bringing this back to the context of my comments, my complaint is not that we use signatures (we should---we'd be nuts not to). My complaint is that the narrow focus on signature-based methods reinforces a lot of bad habits in data analysis and collection (in the same way that firewalls are useful gadgets, but reliance on them results in a lot of bad network design decisions---or, indeed, in networks being built without being designed at all).

To clarify my point and put this more firmly into the context of log analysis, here's an example. Take a log file. Come up with a list of regexen for things that you consider interesting which may appear in the logfile, and actions you want taken when they happen (e.g., send mail or put a blinking red light on the web page).

Conventionally, you would improve this system by:

 -Enumerating more and more interesting things
 -Making more and more elaborate regexen
 -Writing more elaborate response actions

...and so on. What is the limiting factor going to be? Without dealing with all the cases and specifics, my contention is that the limiting factor is that the match-some-characteristic mechanism is inherently merely a lexical categorisation. In other words, your pattern will never contain more information than the pattern itself. This is (obviously) tautological: if you're throwing a flag when you see a log line that matches some regex, all the flag means is that some log line matched the regex. We might -assume- that this corresponds to some underlying condition (the web server just got hit by sasser, someone just logged in via ssh(1), or whatever), but that's not what your signature is actually telling you.
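The conventional setup above reduces to a very short sketch (the patterns and actions here are invented for illustration; a real rule list would run to hundreds of entries):

```python
# A minimal version of the regex-and-action log watcher described above.
import re

RULES = [
    (re.compile(r"Failed password for (\S+)"), "mail the admin"),
    (re.compile(r"segfault"),                  "blink the red light"),
]

def scan(logfile_lines):
    """Return (action, line) pairs for every line matching some regex."""
    hits = []
    for line in logfile_lines:
        for pattern, action in RULES:
            if pattern.search(line):
                hits.append((action, line))
    return hits

log = ["Failed password for root from 10.0.0.1",
       "CRON: session opened for user spb"]
print(scan(log))
```

Note that everything this sketch can ever report is already spelled out, character for character, in RULES---which is the tautology in question.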
This is because the simple lexical analysis that your signatures are doing cannot convey semantic content. In other words, they are only testing for the presence or absence of certain characteristics in the data, not evaluating the `meaning' of those characteristics or the data. This is why signature-based systems are lousy at enunciating things like risk analyses, or even at reporting anomalies---the expressive power is simply absent from the -underlying structure of the system-. The example I like to use is that using a signature system to evaluate the meaning of some event is like trying to figure out what some C code will do by grepping for keywords in the source.

So instead, imagine that we understand the tags we associate with our logfile-searching regexen to be lexical tokens---like the contents of a lex(1) input file. We can then construct a grammar which expresses the relationships between these tokens---analogous to a yacc(1) input. Then we have -enormously- greater expressive power with which to search for things, evaluate the presence or absence of interesting conditions, or (importantly) make statements about the condition of systems or networks.

Note that this is -not- merely a system for aggregating conditions (i.e., reporting that three regexen (rather than one) have been matched)---although it certainly encompasses this sort of thing. If, indeed, we were to use lex(1) (or flex(1)) and yacc(1) to construct our grammar (which is what I've been doing), then the resulting system is capable of exactly as much expressive power as any LALR grammar.

Now, at the heart of the system, we're still using signatures---we're still playing match-the-regex. But the result can, I think, be meaningfully called a non-signature-based system. Or at least it is only signature-based in the sense that, say, C is. There's nothing magic about this formulation, mind you.
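A toy illustration of the lexer-plus-grammar idea, in Python rather than actual lex(1)/yacc(1), and with the tokens, log lines, and the probes-then-a-login rule all invented: the regex matches only emit tokens, and it is the pass over the token -sequence- that says anything interesting.

```python
# Lexical pass: regexes emit tokens.  "Grammar" pass: a rule over the
# token sequence.  Both token names and the rule are illustrative only.
import re

LEXEMES = [
    (re.compile(r"refused connect"),    "PROBE"),
    (re.compile(r"Accepted publickey"), "LOGIN"),
]

def tokenize(lines):
    """Lexical pass: turn raw log lines into a token stream."""
    tokens = []
    for line in lines:
        for pattern, token in LEXEMES:
            if pattern.search(line):
                tokens.append(token)
    return tokens

def evaluate(tokens):
    """Sequence pass: several PROBEs followed by a LOGIN means more than
    any one token does in isolation."""
    probes = 0
    for tok in tokens:
        if tok == "PROBE":
            probes += 1
        elif tok == "LOGIN" and probes >= 3:
            return "reconnaissance followed by a login: worth a human's time"
        else:
            probes = 0
    return "nothing sequence-worthy"

log = ["refused connect from 10.0.0.1"] * 3 + ["Accepted publickey for spb"]
print(evaluate(tokenize(log)))
```

No single regex here could express "probes, then a login"; that statement lives entirely in the structure over the tokens, which is the point.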
I think it is substantially different from a vanilla signature-matching system, and I think it's substantially different from the aggregation/correlation systems I've seen. The reason I bring it up is that this model highlights the limitations of the signature model (by explicitly drawing the parallel to a compiler's lexical analyzer).

>That's 1/2 of the problem!! The OTHER 1/2 of the problem is how to encode
>ignorance (anti-knowledge) into our security systems!!!!!!!
>Nobody has tried this, yet. But what if someone tried to do "artificial
>ignorance" in an IDS: model what everything that's OK looks like and alert
>whenever traffic occurs that doesn't fire an "ignore this" signature. Note
>to readers; I hereby disclose this as prior art so if some idiot patents
>the idea, we can all point to this posting. ;)

I think there's art prior to your prior art. I suggest this as the default mode of operation in the shoki documentation, and I know I wasn't the first one to come up with the idea. Isn't it in Denning and Neumann's model for IDES?

>Put another way: it's easier to know who your friends are, than to keep
>track of all your enemies IF and ONLY IF you have fewer friends than
>enemies. ;)

Everything you need to know about information security you can learn from the Mafia[0].

-spb

-----
0	Well, not -everything-. But why screw up a perfectly good aphorism
	with a qualification?

_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis
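For what it's worth, the `artificial ignorance' mode of operation discussed above is also a very short sketch (the ignore-patterns are invented for illustration): enumerate the known-boring and report the remainder, rather than enumerating the bad.

```python
# "Artificial ignorance": instead of signatures for what is bad,
# patterns for what is known-boring.  Anything neither list accounts
# for gets surfaced.  In practice the ignore list grows as you review
# output and add "ignore this" entries.
import re

IGNORE = [
    re.compile(r"session (opened|closed) for user"),
    re.compile(r"CRON"),
]

def interesting(lines):
    """Return every line that no ignore-pattern accounts for."""
    return [l for l in lines
            if not any(p.search(l) for p in IGNORE)]

log = ["CRON: job started",
       "session opened for user spb",
       "kernel: disk error on wd0"]
print(interesting(log))   # only the disk error survives
```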
This archive was generated by hypermail 2.1.3 : Fri Aug 20 2004 - 14:22:34 PDT