Re: LTT micro-benchmarks

From: Karim Yaghmour (karymat_private)
Date: Wed Apr 18 2001 - 16:17:48 PDT


    I finally got some time to answer ... :)
    
    "Anil B. Somayaji" wrote:
    > 
    > -----BEGIN PGP SIGNED MESSAGE-----
    > Hash: SHA1
    > 
    > Well, since I've been harping on micro-benchmarks, I thought I'd go
    > ahead and run a few.  The basic news is that the Linux Trace Toolkit
    > does impose significant overhead on simple operations; however, the
    > overhead is pretty much in the noise for anything real, at least on my
    > system.  Each test seemed to indicate a 1 microsecond overhead for
    > simple operations, roughly.  Of course, this affects fast operations
    > more than slow ones.  A "null" call (getppid) was 60% slower, and
    > minimal basic file operations (stat, open, close) were 11-12% slower -
    > but the overhead was only around 2-3% for fork and execve.  There also
    > seems to be some significant impact on local communication latency,
    > but I only ran the fast versions of the tests, so that may need more
    > work.
    
    I expected this type of result. Here's what you are measuring for
    a "null" call:
    
    1) Before the entry into the real system call, the following assembly
    code gets to run:
    	movl %esp, %eax                 # copy the stack pointer
    	pushl %eax                      # pass the stack pointer copy
    	call SYMBOL_NAME(trace_real_syscall_entry)
    	addl $4,%esp                    # return stack to state before pass
    	movl ORIG_EAX(%esp),%eax	# restore eax to its original contents
    
    2) trace_real_syscall_entry is in traps.c and runs the following when
    no trace driver is loaded:
            /* Set the syscall ID */
            trace_syscall_event.syscall_id = (uint8_t) regs->orig_eax;
    
    	/* Set the address in any case */
    	trace_syscall_event.address  = regs->eip;
    
    	/* Are we in the kernel (This is a kernel thread)? */
    	if(!(regs->xcs & 3))
    	  /* Don't go digging anywhere */
    	  goto trace_syscall_end;
    
    	/* Get the trace configuration */
    	if(trace_get_config(&use_depth,
    			    &use_bounds,
    			    &seek_depth,
    			    (void*)&lower_bound,
    			    (void*)&upper_bound) < 0)
    	  goto trace_syscall_end;
    
    	...
    
    trace_syscall_end:
    	/* Trace the event */
    	trace_event(TRACE_EV_SYSCALL_ENTRY, &trace_syscall_event);
    
    Both trace_get_config() and trace_event() are part of kernel/trace.c.
    This is trace_get_config():
      /* Is there a tracer already registered */
      if(tracer_registered == 0)
        return -ENOMEDIUM;
    
      /* Get the configuration */
      *pmFetchSyscallUseDepth  = tracer->fetch_syscall_eip_use_depth;
      *pmFetchSyscallUseBounds = tracer->fetch_syscall_eip_use_bounds;
      *pmSyscallEipDepth = tracer->syscall_eip_depth;
      *pmSyscallLowerBound = tracer->syscall_lower_eip_bound;
      *pmSyscallUpperBound = tracer->syscall_upper_eip_bound;
    
      /* Tell the caller that everything was OK */
      return 0;
    
    And this is trace_event():
      /* Is there a tracer registered */
      if(tracer_registered != 1)
        lRetValue = -ENOMEDIUM;
      else
        /* Call the tracer */
        lRetValue = tracer->trace(pmEventID, pmEventStruct);
    
      /* Are there any callbacks to call */
      if(trace_callback_table[pmEventID - 1].Next != NULL)
        {
        /* Call all the callbacks linked to this event */
        for(pTCTEntry = trace_callback_table[pmEventID - 1].Next;
    	pTCTEntry != NULL;
    	pTCTEntry = pTCTEntry->Next)
          pTCTEntry->Callback(pmEventID, pmEventStruct);
        }
    
      /* Give the return value */
      return lRetValue;
    
    I agree this isn't the most efficient way to do things, but micro-benchmarks
    weren't an issue until now.
    
    3) The cost of the system call itself
    
    4) Once the syscall is made, the following assembly runs:
    #if (CONFIG_TRACE || CONFIG_TRACE_MODULE)
    	call SYMBOL_NAME(trace_real_syscall_exit)
    #endif
    
    5) trace_real_syscall_exit is also in traps.c and does only one thing:
            trace_event(TRACE_EV_SYSCALL_EXIT, NULL);
    
    Your results do make sense given the code path traversed on a "null"
    call, but there are several ways to reduce that overhead.
    
    Typically, apart from the "null" system call path above, here's what an
    LTT macro generates:
    #define TRACE_SCHEDCHANGE(OUT, IN, OUT_STATE) \
               do \
               {\
               trace_schedchange sched_event;\
               sched_event.out       = OUT;\
               sched_event.in        = IN;\
               sched_event.out_state = OUT_STATE; \
               trace_event(TRACE_EV_SCHEDCHANGE, &sched_event);\
               } while(0);
    
    Here, the call to trace_event() is made whether a hook has been inserted
    or not.
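
    To make this concrete, here is a minimal sketch of how such a macro ends
    up being used at an instrumentation point (the exact spot and arguments
    used by the real sched.c patch may differ slightly):

    /* In the scheduler, once the task switch has been decided, the LTT
       patch drops in something along these lines: */
    TRACE_SCHEDCHANGE(prev->pid, next->pid, prev->state);

    Even with no tracer registered, trace_event() is still entered; it
    checks tracer_registered and then walks the callback table for that
    event, as shown above.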
    
    The following are possibilities to accelerate the "null" case:
    1) The use of conditional calls:
    if(hook_X_active == 1)
    	call_hook_X();
    There are many variations on this theme: the "if" could test an entry
    in a table of hooks, call_hook_X() could be a function pointer,
    etc.
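
    As a rough sketch of the table variation (the names hook_table,
    MAX_HOOK_IDS and HOOK_SYSCALL_ENTRY are invented here purely for
    illustration):

    struct kernel_hook {
            int    active;                            /* is this hook armed? */
            void (*fn)(unsigned int id, void *data);  /* handler to call     */
    };

    /* One entry per hook point, all inactive by default. */
    static struct kernel_hook hook_table[MAX_HOOK_IDS];

    /* At the instrumentation point, the cost with tracing off is a load
       and a (well predicted) branch: */
    if (hook_table[HOOK_SYSCALL_ENTRY].active)
            hook_table[HOOK_SYSCALL_ENTRY].fn(TRACE_EV_SYSCALL_ENTRY,
                                              &trace_syscall_event);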
    
    2) The use of NOPs (as I had suggested earlier) which are dynamically
    replaced by the hooking code. Practically, this means inserting a
    couple of NOPs at the hooking spot. To ease things, a macro could
    be used that would both generate the NOPs and add an entry in a
    hook table.
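
    As a sketch, assuming gcc inline assembly on x86 (the section name
    .hook_sites and the 5-byte window are assumptions; a near call is 5
    bytes on x86, and real patching code would also have to deal with SMP
    and instruction-cache coherency):

    /* Reserve a 5-byte NOP window at the hooking spot and record its
       address in a dedicated section; the patcher can later walk that
       table and overwrite the NOPs with a "call hook" instruction. */
    #define HOOK_SITE()                                       \
            asm volatile("661:\n\t"                           \
                         ".byte 0x90,0x90,0x90,0x90,0x90\n\t" \
                         ".section .hook_sites,\"a\"\n\t"     \
                         ".long 661b\n\t"                     \
                         ".previous")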
    
    3) The use of default jmps over the hook code (this may be simpler
    to implement than the NOP scheme). Using this, the default behavior
    compiled within the kernel would be a jump over the code implementing
    the hook. Something like:
    A()
    B()
    HOOK()
    C()
    
    Would yield:
    call A
    call B
    jump label1234
    call hook
    label1234:
    call C
    
    To activate the hook, one would only need to overwrite the "jump label1234"
    with NOPs. Modern CPUs are quite good at handling unconditional branches
    and, since the branch distances would be quite short, results may be even
    better than adding a bunch of NOPs.
    
    This solution also makes it easier to code hooks that actually pass
    parameters to the called function. With NOPs, things are a little
    more complicated, since the hook being called then has to figure
    out the stack locations of the variables it needs.
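
    To make the activation step concrete, here is a sketch of the patching
    side, assuming the address of the compiled-in 2-byte short jmp was
    recorded at build time (for instance in a table, as in the NOP scheme);
    real code would also have to handle writing to kernel text, SMP and
    instruction-cache coherency:

    /* 0xEB is the short-jmp opcode, followed by an 8-bit displacement. */
    static void activate_jmp_hook(unsigned char *site)
    {
            site[0] = 0x90;          /* NOP over the jmp opcode         */
            site[1] = 0x90;          /* NOP over the 8-bit displacement */
    }

    static void deactivate_jmp_hook(unsigned char *site, signed char disp)
    {
            site[0] = 0xEB;                  /* restore the short jmp   */
            site[1] = (unsigned char) disp;  /* skip over the hook code */
    }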
    
    > 
    > Note that these results were for the LTT built-in to the kernel.  I
    > wanted to test it as a module (both loaded and unloaded), but the
    > unofficial 2.4.2 patch leaves unresolved symbols with my
    > configuration.
    
    Hmm... I knew I'd have to check the submitted patches before making
    them "official"; this is just one more reason to do so :)
    
    > 
    > What does this mean for the LSM project?  I'm not sure.  Ideally, I
    > think we should shoot for at most a 20% impact on null calls, but
    > really that's a judgement call - maybe someone should check with the
    > kernel gurus about what would be acceptable impact.  My main thought
    > is that if some bit of code is going to be accepted, it should be
    > minimal, and be thoroughly benchmarked.  I would hate to see a
    > beautiful design and implementation get rejected because it was too
    > expensive for people who didn't care about security.
    
    I agree. Rather than pushing for everyone to use the LTT patch as is,
    maybe the better approach would be to help draft a better hooking
    method for the kernel. In the long run, this would be useful not only
    for LSM and LTT, but for all the other projects that may need, for one
    reason or another, to hook into the kernel.
    
    >   edited output from lmbench-2beta1, three runs each
    >   (removed hostname and clarified kernel version, deleted
    >   parts that didn't seem to indicate difference, or that weren't
    >   completely run)
    
    I'll admit that I would have liked to see those tests that
    "didn't seem to indicate difference", if not for this list, then for
    my own analysis. Could you send me the complete results off-list?
    
    I am pleased, though, to see that LTT makes no difference whatsoever
    to communication bandwidth.
    
    Thanks for the feedback and let me know what you think about the
    alternative hooking mechanisms I suggested above.
    
    Cheers,
    
    Karim
    
    ===================================================
                     Karim Yaghmour
                   karymat_private
          Embedded and Real-Time Linux Expert
    ===================================================
    
    


