Re: More 2.4.4 benchmarks

From: Valdis.Kletnieksat_private
Date: Wed May 09 2001 - 07:45:37 PDT


    On Wed, 09 May 2001 00:33:49 PDT, Crispin Cowan said:
    > For the most part, they show very little degradation due to LSM.  Good.
    > The curious metric is main memory latency:  it shows considerable degradation,
    > when this is one metric I would expect to be unaffected by LSM.  I don't have a
    > decent conjecture of what would cause this.
    OK.. Speculating on the *weird* causes first. ;)
    I have to wonder if there's some sort of second-order effect here. For instance,
    if without the patch, the benchmark exhibits "nice" cache behavior, but
    adding the patch causes an extra 15 or 20 cache lines to be used, so instead
    of using (for example) 255K of a 256K cache, it now uses 257K, causing
    additional cache misses.
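
    A minimal sketch of that threshold effect, under loudly stated assumptions:
    the 256K cache size comes from the example above, but the 32-byte line
    size, the idealized "everything fits until it doesn't" cache model, and
    the footprint sitting 16 lines short of capacity are all hypothetical
    choices made for illustration. Real caches also suffer conflict misses
    that this ignores.

```python
# Hypothetical figures: the 256K cache is from the example in the text;
# the line size is an assumption and is not stated in the original post.
CACHE_BYTES = 256 * 1024
LINE_BYTES = 32


def overflow_lines(footprint_bytes, extra_lines,
                   cache_bytes=CACHE_BYTES, line_bytes=LINE_BYTES):
    """Return how many cache lines no longer fit once extra lines are added.

    Idealized model: the cache holds the whole working set until the
    working set exceeds capacity; associativity is ignored.
    """
    footprint_lines = -(-footprint_bytes // line_bytes)  # ceiling division
    capacity_lines = cache_bytes // line_bytes
    return max(0, footprint_lines + extra_lines - capacity_lines)


# A footprint that leaves only 16 free lines (an invented starting point).
nearly_full = CACHE_BYTES - 16 * LINE_BYTES

print(overflow_lines(nearly_full, 15))  # 15 extra lines still fit
print(overflow_lines(nearly_full, 20))  # 20 extra lines spill over
```

    The point is only that a handful of extra lines changes nothing until
    the working set crosses capacity, after which every spilled line turns
    into recurring misses.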
    I've also seen cases (on older machines with smaller caches) where odd timing
    constraints would cause odd results - there was a tight loop that took
    almost exactly one timer tick per iteration. For one version of a program, it
    would end up doing this:
         <syscall just before entering the loop - caused lockstep to timer>
    Loop: <read a big chunk of data, flushing the L1/L2 cache>
         <crunch numbers>
         <syscall, and take a timer interrupt while in there - cache flush>
    So effectively, the cache got totally wiped once per iteration.  A very
    small change removed the initial syscall, which caused the timer interrupt
    to pop at a different point in the cycle, causing an effective 2 flushes
    per iteration...
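
    A toy model of that phase effect, offered as a sketch only. The function
    name and the phase boundaries (30% of the iteration reading, 60%
    crunching, 10% in the syscall) are invented for illustration; the
    original program's real proportions are not known.

```python
def wipes_per_iteration(tick_phase, read_end=0.3, crunch_end=0.9):
    """Count effective cache wipes in one loop iteration.

    tick_phase: point in the iteration (0.0 <= phase < 1.0) at which the
                timer interrupt lands.
    read_end, crunch_end: assumed phase boundaries; the iteration is
                read [0, read_end), crunch [read_end, crunch_end),
                syscall [crunch_end, 1.0).
    """
    if not 0.0 <= tick_phase < 1.0:
        raise ValueError("tick_phase must be in [0.0, 1.0)")
    # The big read always replaces the cache contents: one wipe.
    # An interrupt during the read or the trailing syscall hits a cache
    # that is being (or is about to be) replaced anyway, so it adds nothing.
    # An interrupt during the crunch evicts data still in use: a second wipe.
    in_crunch = read_end <= tick_phase < crunch_end
    return 2 if in_crunch else 1


print(wipes_per_iteration(0.95))  # interrupt lands inside the syscall
print(wipes_per_iteration(0.50))  # interrupt lands mid-crunch
```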
    I could even bring up Prof. Eytan Baruch's problem with the Cornell Theory
    Center's IBM 3090-600J supercomputer, where he got good results when
    running at night but program crashes during the day.  During the day,
    more interrupts happened - and although the machine saved the vector registers,
    it truncated the guard digits, and the resulting loss of precision from
    (effectively) 132 bit down to 128 bit caused numerical instability....
    Or it might just be poor benchmarking methodology ;)
    It might be productive to walk through the code and do a 'digits of precision'
    analysis - I strongly suspect that although a lot of places in the
    benchmark output show 4 or 5 digits, there's only 1 or 2 digits of real
    accuracy (for instance, look at the 'Page Fault' column - 3.0000 all the
    way down, *immediately* suspect as a 1-digit value).
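
    One rough way to do a 'digits of precision' check, as a sketch: compare
    each reported value against the resolution of whatever clock or counter
    produced it. The function name is hypothetical, and the resolution of 1
    unit used for the 'Page Fault' case is an assumption; the actual
    resolution behind that column would have to come from reading the
    benchmark source.

```python
import math


def significant_digits(measured, resolution):
    """Estimate how many decimal digits of a measurement are meaningful.

    measured:   the reported value.
    resolution: the smallest step the underlying clock or counter can
                distinguish, in the same units (an input the caller must
                determine; it is not printed in the benchmark output).
    """
    if measured <= 0 or resolution <= 0:
        raise ValueError("measured and resolution must be positive")
    steps = measured / resolution  # how many resolvable steps were counted
    if steps < 1:
        return 0  # the value is below what the clock can resolve
    return math.floor(math.log10(steps)) + 1


# A value printed as 3.0000 but measured in whole units (assumed):
print(significant_digits(3.0, 1.0))
# The same clock measuring a longer interval supports more digits:
print(significant_digits(123.4, 1.0))
```

    Printing four decimal places does not add information; only a finer
    resolution or a longer measured interval does.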
    Hand-checking the .0 and .4 data files, it looks like the differing
    memory latency values are the root cause of it - for all strides, the two
    files show near-identical numbers up to the 1.0 value, and then at 1.5 and
    higher there's a sudden dip for the -lsm kernel.  I *really* have to question
    whether there are really as many significant digits in the data as it's reporting.
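
    The hand check can be mechanized along these lines. This is a sketch,
    not a parser for the real result files: it assumes the two runs have
    already been read into lists of (size, latency) pairs for one stride,
    and the sample numbers below are invented purely to exercise the
    function. The 5% tolerance is likewise an arbitrary choice.

```python
def first_divergence(base, other, rel_tol=0.05):
    """Return the first array size at which two latency curves diverge.

    base, other: lists of (size, latency) pairs for the same stride,
                 in the same order.
    rel_tol:     relative difference treated as "near-identical"
                 (assumed threshold).
    Returns the size where the curves first differ by more than rel_tol,
    or None if they never do.
    """
    if len(base) != len(other):
        raise ValueError("curves have different lengths")
    for (size_a, lat_a), (size_b, lat_b) in zip(base, other):
        if size_a != size_b:
            raise ValueError("size columns do not line up")
        if abs(lat_a - lat_b) > rel_tol * abs(lat_a):
            return size_a
    return None


# Invented sample data, NOT taken from the actual .0 and .4 files.
base_sample = [(0.5, 150.0), (1.0, 151.0), (1.5, 152.0), (2.0, 152.0)]
lsm_sample = [(0.5, 150.2), (1.0, 150.9), (1.5, 140.0), (2.0, 139.5)]

print(first_divergence(base_sample, lsm_sample))
```

    Running this per stride would show whether every stride breaks at the
    same size, which is what the hand check suggests.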
    Time to go can-opener the lmbench code and start looking, I guess....
    				Valdis Kletnieks
    				Operating Systems Analyst
    				Virginia Tech
