On Wed, 09 May 2001 00:33:49 PDT, Crispin Cowan said:

> For the most part, they show very little degradation due to LSM.

Good.

> The curious metric is main memory latency: it shows considerable degradation,
> when this is one metric I would expect to be unaffected by LSM. I don't have a
> decent conjecture of what would cause this.

OK.. Speculating on the *weird* causes first. ;)

I have to wonder if there's some sort of second-order effect here. For
instance, without the patch the benchmark might exhibit "nice" cache
behavior, but adding the patch uses an extra 15 or 20 cache lines, so
instead of touching (for example) 255K of a 256K cache, it now touches
257K, causing additional cache misses.

I've also seen (on older machines with smaller caches) cases where odd
timing constraints would cause odd results - there was a tight loop that
took almost exactly one timer tick per iteration. One version of the
program would end up doing this:

    <syscall just before entering the loop - caused lockstep to timer>
    Loop:
        <read a big chunk of data, flushing the L1/L2 cache>
        <crunch numbers>
        <syscall, and take a timer interrupt while in there - cache flush>
    repeat

So effectively, the cache got totally wiped once per iteration. A very
small change removed the initial syscall, which caused the timer interrupt
to pop at a different point in the cycle, causing an effective 2 flushes
per iteration...

I could even bring up Prof. Eytan Baruch's problem with the Cornell Theory
Center's IBM 3090-600J supercomputer, where he got good results when
running at night but program crashes during the day. During the day, more
interrupts happened - and although the machine saved the vector registers,
it truncated the guard digits, and the resulting loss of precision from
(effectively) 132 bits down to 128 bits caused numerical instability....
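The "255K fits, 257K doesn't" conjecture above can be sketched with a toy
direct-mapped cache simulator (hypothetical sizes, not a model of any
particular CPU or of lmbench itself): nudging the working set just 1K past
the cache size turns a zero-miss steady state into conflict misses on every
pass.

```python
# Toy direct-mapped cache: same sequential scan, working set nudged just
# past the cache size. Sizes are illustrative assumptions.

LINE = 32                 # bytes per cache line
CACHE = 256 * 1024        # 256K direct-mapped cache
NSLOTS = CACHE // LINE

def misses_per_pass(working_set, passes=3):
    """Scan `working_set` bytes sequentially `passes` times; return the
    miss count of the final pass (i.e., after warm-up)."""
    tags = [None] * NSLOTS            # which line each cache slot holds
    misses = 0
    for _ in range(passes):
        misses = 0
        for addr in range(0, working_set, LINE):
            line = addr // LINE
            slot = line % NSLOTS      # direct-mapped: one slot per line
            if tags[slot] != line:    # cold or conflict miss
                tags[slot] = line
                misses += 1
    return misses

print(misses_per_pass(255 * 1024))  # 0  - fits, no steady-state misses
print(misses_per_pass(257 * 1024))  # 64 - the last 1K evicts the first 1K
```

The 257K case misses even though it exceeds the cache by under 0.4%: the
tail of the scan maps onto the same slots as the head, so both ends thrash
each other on every pass - exactly the kind of cliff that could make a
small patch look like a memory-latency regression.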
Or it might just be poor benchmarking methodology ;)

It might be productive to walk through the code and do a 'digits of
precision' analysis - I strongly suspect that although a lot of places in
the benchmark output show 4 or 5 digits, there's only 1 or 2 digits of
real accuracy (for instance, look at the 'Page Fault' column - 3.0000 all
the way down. *Immediately* suspect as a 1-digit value).

Hand-checking the .0 and .4 data files, it looks like the differing memory
latency values are the root cause - for all strides, the two files show
near-identical numbers up to the 1.0 value, and then at 1.5 and higher
there's a sudden dip for the -lsm kernel. I *really* have to question
whether there are as many significant digits in the data as it's
reporting.

Time to go can-opener the lmbench code and start looking, I guess....

-- 
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech
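[The 'digits of precision' check above could be mechanized along these
lines - a sketch only; the paired readings below are made-up illustrative
numbers, not values from the actual lmbench output files:]

```python
# Given the same metric from two runs, count how many leading significant
# digits actually agree - a rough proxy for real measurement accuracy.
import math

def agreeing_digits(a, b):
    """Significant digits on which a and b agree (capped at double
    precision when the values are identical)."""
    if a == b:
        return 15
    rel = abs(a - b) / max(abs(a), abs(b))
    return max(0, int(-math.log10(rel)))

# Hypothetical memory-latency readings printed to 4-5 digits:
print(agreeing_digits(179.3412, 179.3399))  # 5 - the digits are real
print(agreeing_digits(180.1023, 184.7710))  # 1 - mostly noise past digit 1
```

If run-to-run pairs look like the second case, the benchmark is printing
several digits of noise, and a "considerable degradation" in the fourth
digit means nothing.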
This archive was generated by hypermail 2b30 : Wed May 09 2001 - 07:46:50 PDT