Re: LTT micro-benchmarks

From: Karim Yaghmour (karymat_private)
Date: Wed Apr 18 2001 - 16:17:48 PDT


    I finally got some time to answer ... :)
    
    "Anil B. Somayaji" wrote:
    > 
    > -----BEGIN PGP SIGNED MESSAGE-----
    > Hash: SHA1
    > 
    > Well, since I've been harping on micro-benchmarks, I thought I'd go
    > ahead and run a few.  The basic news is that the Linux Trace Toolkit
    > does impose significant overhead on simple operations; however, the
    > overhead is pretty much in the noise for anything real, at least on my
    > system.  Each test seemed to indicate a 1 microsecond overhead for
    > simple operations, roughly.  Of course, this affects fast operations
    > more than slow ones.  A "null" call (getppid) was 60% slower, and
    > minimal basic file operations (stat, open, close) were 11-12% slower -
    > but the overhead was only around 2-3% for fork and execve.  There also
    > seems to be some significant impact on local communication latency,
    > but I only ran the fast versions of the tests, so that may need more
    > work.
    
    I expected this type of result. Here's what you are measuring for
    a "null" call:
    
    1) Before the entry into the real system call, the following assembly
    code gets to run:
    	movl %esp, %eax                 # copy the stack pointer
    	pushl %eax                      # pass the stack pointer copy
    	call SYMBOL_NAME(trace_real_syscall_entry)
    	addl $4,%esp                    # return stack to state before pass
    	movl ORIG_EAX(%esp),%eax	# restore eax to its original contents
    
    2) trace_real_syscall_entry is in traps.c and runs the following when
    no trace driver is loaded:
            /* Set the syscall ID */
            trace_syscall_event.syscall_id = (uint8_t) regs->orig_eax;
    
    	/* Set the address in any case */
    	trace_syscall_event.address  = regs->eip;
    
    	/* Are we in the kernel (This is a kernel thread)? */
    	if(!(regs->xcs & 3))
    	  /* Don't go digging anywhere */
    	  goto trace_syscall_end;
    
    	/* Get the trace configuration */
    	if(trace_get_config(&use_depth,
    			    &use_bounds,
    			    &seek_depth,
    			    (void*)&lower_bound,
    			    (void*)&upper_bound) < 0)
    	  goto trace_syscall_end;
    
    	...
    
    trace_syscall_end:
    	/* Trace the event */
    	trace_event(TRACE_EV_SYSCALL_ENTRY, &trace_syscall_event);
    
    Both trace_get_config() and trace_event() are part of kernel/trace.c.
    This is trace_get_config():
      /* Is there a tracer already registered */
      if(tracer_registered == 0)
        return -ENOMEDIUM;
    
      /* Get the configuration */
      *pmFetchSyscallUseDepth  = tracer->fetch_syscall_eip_use_depth;
      *pmFetchSyscallUseBounds = tracer->fetch_syscall_eip_use_bounds;
      *pmSyscallEipDepth = tracer->syscall_eip_depth;
      *pmSyscallLowerBound = tracer->syscall_lower_eip_bound;
      *pmSyscallUpperBound = tracer->syscall_upper_eip_bound;
    
      /* Tell the caller that everything was OK */
      return 0;
    
    And this is trace_event():
      /* Is there a tracer registered */
      if(tracer_registered != 1)
        lRetValue = -ENOMEDIUM;
      else
        /* Call the tracer */
        lRetValue = tracer->trace(pmEventID, pmEventStruct);
    
      /* Are there any callbacks to call */
      if(trace_callback_table[pmEventID - 1].Next != NULL)
        {
        /* Call all the callbacks linked to this event */
        for(pTCTEntry = trace_callback_table[pmEventID - 1].Next;
    	pTCTEntry != NULL;
    	pTCTEntry = pTCTEntry->Next)
          pTCTEntry->Callback(pmEventID, pmEventStruct);
        }
    
      /* Give the return value */
      return lRetValue;
    
    I agree this isn't the most efficient way to do things, but micro-benchmarks
    weren't an issue until now.
    
    3) The cost of the system call itself
    
    4) Once the syscall is made, the following assembly runs:
    #if (CONFIG_TRACE || CONFIG_TRACE_MODULE)
    	call SYMBOL_NAME(trace_real_syscall_exit)
    #endif
    
    5) trace_real_syscall_exit is also in traps.c and does only one thing:
            trace_event(TRACE_EV_SYSCALL_EXIT, NULL);
    
    Your results do make sense given the code path traversed on a "null"
    call, but there are several ways to reduce that overhead.
    
    Typically, apart from the "null" system call path above, here's what an
    LTT macro generates:
    #define TRACE_SCHEDCHANGE(OUT, IN, OUT_STATE) \
               do \
               {\
               trace_schedchange sched_event;\
               sched_event.out       = OUT;\
               sched_event.in        = IN;\
               sched_event.out_state = OUT_STATE; \
               trace_event(TRACE_EV_SCHEDCHANGE, &sched_event);\
               } while(0);
    
    Here, the call to trace_event() is made whether a hook has been inserted
    or not.
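
    To make this concrete, here is a minimal sketch of how such a macro ends
    up being used at an instrumentation point (the exact spot and arguments
    used by the real sched.c patch may differ slightly):

    /* In the scheduler, once the task switch has been decided, the LTT
       patch drops in something along these lines: */
    TRACE_SCHEDCHANGE(prev->pid, next->pid, prev->state);

    Even with no tracer registered, trace_event() is still entered; it
    checks tracer_registered and then walks the callback table for that
    event, as shown above.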
    
    The following are possibilities to accelerate the "null" case:
    1) The use of conditional calls:
    if(hook_X_active == 1)
    	call_hook_X();
    There are many variations on this theme: the "if" could test an entry
    in a table of hooks, call_hook_X() could be a function pointer,
    etc.
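
    As a rough sketch of the table variation (the names hook_table,
    MAX_HOOK_IDS and HOOK_SYSCALL_ENTRY are invented here purely for
    illustration):

    struct kernel_hook {
            int    active;                            /* is this hook armed? */
            void (*fn)(unsigned int id, void *data);  /* handler to call     */
    };

    /* One entry per hook point, all inactive by default. */
    static struct kernel_hook hook_table[MAX_HOOK_IDS];

    /* At the instrumentation point, the cost with tracing off is a load
       and a (well predicted) branch: */
    if (hook_table[HOOK_SYSCALL_ENTRY].active)
            hook_table[HOOK_SYSCALL_ENTRY].fn(TRACE_EV_SYSCALL_ENTRY,
                                              &trace_syscall_event);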
    
    2) The use of NOPs (as I had suggested earlier) which are dynamically
    replaced by the hooking code. Practically, this means inserting a
    couple of NOPs at the hooking spot. To ease things, a macro could
    be used that would both generate the NOPs and add an entry in a
    hook table.
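
    As a sketch, assuming gcc inline assembly on x86 (the section name
    .hook_sites and the 5-byte window are assumptions; a near call is 5
    bytes on x86, and real patching code would also have to deal with SMP
    and instruction-cache coherency):

    /* Reserve a 5-byte NOP window at the hooking spot and record its
       address in a dedicated section; the patcher can later walk that
       table and overwrite the NOPs with a "call hook" instruction. */
    #define HOOK_SITE()                                       \
            asm volatile("661:\n\t"                           \
                         ".byte 0x90,0x90,0x90,0x90,0x90\n\t" \
                         ".section .hook_sites,\"a\"\n\t"     \
                         ".long 661b\n\t"                     \
                         ".previous")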
    
    3) The use of default jmps over the hook code (this may be simpler
    to implement than the NOP scheme). Using this, the default behavior
    compiled within the kernel would be a jump over the code implementing
    the hook. Something like:
    A()
    B()
    HOOK()
    C()
    
    Would yield:
    call A
    call B
    jump label1234
    call hook
    label1234:
    call C
    
    To activate the hook, one would only need to overwrite the "jump label1234"
    with NOPs. Modern CPUs are quite good at handling unconditional branches
    and, since the branch distances would be quite short, results may be even
    better than adding a bunch of NOPs.
    
    This solution also makes it easier to code hooks that actually pass
    parameters to the called function. With NOPs, things are a little
    more complicated, since the hook being called then has to figure
    out the stack locations of the variables it needs.
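
    To make the activation step concrete, here is a sketch of the patching
    side, assuming the address of the compiled-in 2-byte short jmp was
    recorded at build time (for instance in a table, as in the NOP scheme);
    real code would also have to handle writing to kernel text, SMP and
    instruction-cache coherency:

    /* 0xEB is the short-jmp opcode, followed by an 8-bit displacement. */
    static void activate_jmp_hook(unsigned char *site)
    {
            site[0] = 0x90;          /* NOP over the jmp opcode         */
            site[1] = 0x90;          /* NOP over the 8-bit displacement */
    }

    static void deactivate_jmp_hook(unsigned char *site, signed char disp)
    {
            site[0] = 0xEB;                  /* restore the short jmp   */
            site[1] = (unsigned char) disp;  /* skip over the hook code */
    }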
    
    > 
    > Note that these results were for the LTT built-in to the kernel.  I
    > wanted to test it as a module (both loaded and unloaded), but the
    > unofficial 2.4.2 patch leaves unresolved symbols with my
    > configuration.
    
    Hmm... I knew I'd have to check the submitted patches before making
    them "official"; this is just one more reason to do so :)
    
    > 
    > What does this mean for the LSM project?  I'm not sure.  Ideally, I
    > think we should shoot for at most a 20% impact on null calls, but
    > really that's a judgement call - maybe someone should check with the
    > kernel gurus about what would be acceptable impact.  My main thought
    > is that if some bit of code is going to be accepted, it should be
    > minimal, and be thoroughly benchmarked.  I would hate to see a
    > beautiful design and implementation get rejected because it was too
    > expensive for people who didn't care about security.
    
    I agree. Rather than pushing for everyone to use the LTT patch as is,
    maybe the better approach would be to help draft a better hooking
    method for the kernel. In the long run, this would be useful not only
    for LSM and LTT, but for all the other projects that may need, for one
    reason or another, to hook into the kernel.
    
    >   edited output from lmbench-2beta1, three runs each
    >   (removed hostname and clarified kernel version, deleted
    >   parts that didn't seem to indicate difference, or that weren't
    >   completely run)
    
    I'll admit that I would have liked to see those tests that
    "didn't seem to indicate difference", if not for this list, then for
    my own analysis. Could you send me the complete results off-list?
    
    I am pleased, though, to see that LTT makes no difference whatsoever
    to communication bandwidth.
    
    Thanks for the feedback and let me know what you think about the
    alternative hooking mechanisms I suggested above.
    
    Cheers,
    
    Karim
    
    ===================================================
                     Karim Yaghmour
                   karymat_private
          Embedded and Real-Time Linux Expert
    ===================================================
    
    


