[ISN] Attacking multicore CPUs

From: InfoSec News (alerts@private)
Date: Sun Sep 16 2007 - 22:16:22 PDT


http://www.theregister.co.uk/2007/09/14/system_call_sploits/

By Federico Biancuzzi
14th September 2007

The world of multi-core CPUs we have just entered is facing a serious 
threat.

A security researcher at Cambridge disclosed a new class of 
vulnerabilities that takes advantage of concurrency to bypass security 
protections such as antivirus software.

The attack exploits the assumption that software interacting with the 
kernel can do so without interference. The researcher, Robert Watson, 
showed that a carefully written exploit can strike in the brief window 
during which this interaction takes place, and literally change the 
"words" being exchanged.

Even if some of these dark aspects of concurrency were already known, 
Watson proved that real attacks can be developed, and showed that 
developers have to fix their code. Fast.

Watson presented the results of his research, entitled "Exploiting 
Concurrency Vulnerabilities in System Call Wrappers", at WOOT07, the 
USENIX Workshop on Offensive Technologies.

During the talk he showed how concurrency can be used to bypass security 
protections applied by so-called syscall wrappers.

A system call, or syscall for short, is a basic function in the kernel 
that is called by a program. For example, when you open a file it's 
highly probable that the software you are using called the open() 
syscall to open it.

A syscall wrapper sits between the kernel and the program itself, and 
analyzes which syscalls are called and their arguments. A security 
wrapper might be configured to block access to some files, so in the 
previous example, when a program tries to open() the file 
"secrets.txt", the wrapper may stop the operation and return an error 
to the application.
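
As a rough illustration of the kind of check described above, here is a 
minimal user-space sketch in C. The names wrapped_open and policy_allows 
are hypothetical and do not come from any of the wrapper systems 
discussed; a real wrapper interposes on the system call path rather than 
living inside the program it constrains.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>

/* Hypothetical policy: deny any path whose final component is "secrets.txt". */
static int policy_allows(const char *path)
{
    assert(path != NULL);
    const char *base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;
    return strcmp(base, "secrets.txt") != 0;
}

/* Hypothetical wrapper: inspect the argument, then pass the call through. */
int wrapped_open(const char *path, int flags)
{
    if (!policy_allows(path)) {
        errno = EACCES;   /* report "permission denied" to the caller */
        return -1;
    }
    return open(path, flags);   /* the real system call */
}
```

Calling wrapped_open("/tmp/secrets.txt", O_RDONLY) returns -1 with errno 
set to EACCES, while any other path is handed to open() unchanged.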

We contacted Robert to learn more...


How does the attack work?

System call wrapping is a widely-used technique for extending kernel 
security, found in anti-virus systems and security policy enhancement 
frameworks such as the GSWTK, Systrace, and CerbNG systems I examine in 
the paper. System call interposition allows code running in the kernel 
address space to "wrap" system calls, adding new security checks, 
replacing the values of arguments to virtualize name spaces, or to audit 
arguments for the purposes of logging or intrusion detection. It's a 
very flexible technique, and appealing to software authors because it 
doesn't require changing existing kernel code, and allows control at the 
very well-understood system call interface.

This attack targets a weakness in the system call wrapper architecture, 
in which system call arguments are separately copied by the system call 
wrapper and the kernel, allowing the attacker to "race" to replace the 
argument values between copies.
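
The double copy can be sketched as a deterministic, single-threaded 
simulation in C. This is not Watson's exploit code: user_arg, 
copy_from_user_demo, attacker_swap, and the paths are all hypothetical 
stand-ins, and the "race" is modelled as a callback that runs between 
the wrapper's copy and the kernel's copy.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_MAX_DEMO 64

/* Simulated user memory holding the system call argument. */
static char user_arg[PATH_MAX_DEMO];

/* Stand-in for a copy from user space; every call re-reads user memory. */
static void copy_from_user_demo(char *dst, const char *user_src)
{
    snprintf(dst, PATH_MAX_DEMO, "%s", user_src);
}

/* The attacker's move: overwrite the argument after it has been checked. */
static void attacker_swap(void)
{
    snprintf(user_arg, sizeof user_arg, "%s", "/etc/secrets.txt");
}

/*
 * Returns -1 if the wrapper denies the call, 0 if the kernel used the path
 * the wrapper checked, and 1 if the kernel used a different path.
 * race_hook, if non-NULL, runs in the window between the two copies.
 */
int wrapped_syscall_demo(void (*race_hook)(void), char *path_used)
{
    char checked[PATH_MAX_DEMO];

    assert(path_used != NULL);
    copy_from_user_demo(checked, user_arg);      /* copy 1: the wrapper */
    if (strcmp(checked, "/etc/secrets.txt") == 0)
        return -1;

    if (race_hook != NULL)
        race_hook();                             /* the race window */

    copy_from_user_demo(path_used, user_arg);    /* copy 2: the kernel */
    return strcmp(checked, path_used) != 0;
}

/* Start with a harmless path and let the attacker swap it mid-call. */
int run_race_demo(void)
{
    char used[PATH_MAX_DEMO];

    snprintf(user_arg, sizeof user_arg, "%s", "/tmp/harmless.txt");
    return wrapped_syscall_demo(attacker_swap, used);
}
```

In this model run_race_demo() returns 1: the wrapper approved 
"/tmp/harmless.txt", but the second copy picked up the swapped path.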

I was able to successfully bypass security in many system call wrappers 
by creating unmanaged concurrency between the attacking processes and 
the wrapper/kernel. This was possible on both uniprocessor systems and 
multiprocessor systems.

The existence of some of these vulnerabilities has been known for years 
(Ghormley 1998, Garfinkel 2003, Watson 2003), and I approached the 
authors of many of these wrapper systems as early as 2002 to report the 
problems. The contribution of this paper is in analyzing the 
vulnerability class, thoroughly exploring the attack space (I identify 
two previously undiscussed classes of race conditions, one of which is 
more broadly applicable), and exploring exploit strategies, allowing us 
to reason about the effectiveness of this attack approach. It turns out 
that the approach is very effective indeed.

The paper [PDF] provides both a detailed discussion of the general class 
of concurrency vulnerabilities, and more concrete discussion of these 
specific vulnerabilities. I'd refer readers especially to the pictures 
and code in the slides [PDF] associated with the talk, which should make 
both the attack approach and simplicity of the exploits clear. In less 
than 20 lines of C code, and using only standard OS calls for memory 
access and management, the wrapper protections were completely disabled.


What is needed to succeed?

When I started working on this project, I was sure that the 
vulnerabilities could be exploited easily on multiprocessor systems, but 
didn't know to what extent uniprocessor systems would be susceptible. I 
was also unsure of the software requirements -- were threads required, 
etc. As it turns out, the attacks are broadly applicable, working on 
uniprocessor OSes without threading. The attacker needs to be able to run 
code in a local process constrained by a system call wrapper, which he 
(or she) will then be able to bypass with relative ease.

On multiprocessor systems, we measure the size of the race window in 
cycles, and I found that the width of the race varied enormously by 
wrapper system. Most of the wrapper systems I looked at were 
kernel-only, so 30,000 cycles might not be an unusual length. However, 
Systrace performs control in user space, leading to race conditions of 
500,000 cycles or more due to context switching. In the end, the size in 
cycles doesn't make much difference, as both of those numbers are very 
large compared to the cost of local memory access.
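
To put those window sizes in perspective, the arithmetic can be sketched 
in a few lines of C. The per-write cost passed in below is an assumed, 
illustrative figure, not a measurement from the paper, and 
writes_per_window is a hypothetical helper.

```c
#include <assert.h>

/*
 * How many memory writes of the given cost fit into a race window.
 * Both values are in CPU cycles; the cost is supplied by the caller
 * because it is an assumption, not a measured constant.
 */
unsigned long writes_per_window(unsigned long window_cycles,
                                unsigned long write_cost_cycles)
{
    assert(write_cost_cycles > 0);
    return window_cycles / write_cost_cycles;
}
```

Assuming a pessimistic 200 cycles per write, a 30,000-cycle window 
leaves room for 150 writes and a 500,000-cycle window for 2,500, while 
the attacker only needs one to land.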

On uniprocessor systems, creating concurrency between the kernel and 
user space may be done using page faults, introduced where the kernel 
accesses user memory that has been paged to disk due to memory pressure. 
They can also be introduced through network delays or other IPC, which 
cause the kernel to yield. The key is that the user process is able to 
execute during critical windows between access to a system call argument 
by a wrapper and the kernel -- this turns out to be quite 
straightforward.
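
One plausible way to set up the paging approach is to lay out the 
argument so that it straddles a page boundary, so that a fault on the 
second page can interrupt the copy part-way through. The C sketch below 
shows only that layout step. It is an assumption made for illustration: 
place_across_pages is a hypothetical helper, and nothing here forces the 
second page out to disk.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map two anonymous pages and copy `path` so that its final character and
 * terminating NUL sit on the second page, with the rest on the first.
 * Returns a pointer to the string, or NULL on failure. The mapping is
 * intentionally left in place for the caller to use.
 */
char *place_across_pages(const char *path)
{
    assert(path != NULL);

    long page = sysconf(_SC_PAGESIZE);
    size_t len = strlen(path);

    if (page <= 0 || len < 2 || len >= (size_t)page)
        return NULL;

    char *base = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* The first len - 1 characters end exactly at the page boundary. */
    char *dst = base + (size_t)page - (len - 1);
    memcpy(dst, path, len + 1);
    return dst;
}
```

A reader of the returned string touches the first page for most of the 
path and the second page only for its tail, which is where a page fault 
would open the window described above.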


Could it be used in a remote exploit? Or does it require timing too 
short or precise to work with common internet latency?

These specific attacks require the attacker to be able to control a 
process on the system -- either legitimately (perhaps they have an 
unprivileged user account) or less legitimately (they have exploited a 
vulnerability in a service, such as Apache, BIND, MySQL, etc., to gain 
execution privilege). The attacker will then be able to escape from a 
sandbox placed around their user process or vulnerable service, gaining 
access to the remainder of the system.

The details vary based on the intended effects of the wrapper. For one 
GSWTK wrapper, I show how to bypass intrusion detection when exploiting 
a vulnerable IMAP daemon, preventing alarms from firing despite 
accessing files outside the expected execution profile of an IMAP 
daemon. For Sysjail, I show that access control limits on what IP 
address can be bound may be entirely bypassed. For Sudo monitor mode, I 
am able to prevent the arguments to commands from being properly 
audited.


How much does the hardware platform affect the attack?

Multiprocessor systems are marginally easier to exploit since they do 
not require forcing kernel context switches via paging or other 
techniques. However, I was able to successfully bypass the same wrappers 
on uniprocessor systems. I did my experimental work on Intel hardware, 
but the attacks should work across a range of hardware architectures 
and configurations.


And what about the OS?

These attack techniques target an architectural vulnerability in the 
wrapper approach, and readily apply across operating systems and 
hardware platforms. I was able to use the same C language exploits 
across several operating systems, including Linux, FreeBSD, NetBSD, and 
OpenBSD. They should apply equally well on other operating systems.


Is it something that might affect software written in any programming 
language?

The broader class of concurrency vulnerabilities is relevant to all 
concurrent systems, and is something all software developers need to be 
aware of. These specific races require shared memory between the two 
parties (processes and kernel/system call wrapper), so vulnerable 
software would necessarily involve shared memory between two mutually 
untrusting processes. You might find this construction in cases where 
server and client processes share memory in order to optimize 
inter-process communication, such as between databases and clients or in 
windowing systems.
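
For software that shares memory with an untrusted peer, the usual 
defence against this kind of race is to fetch each value once into 
private memory and to validate and use only the private copy. The C 
sketch below illustrates that pattern. The struct layout and the name 
snapshot_request are hypothetical, and production code would also need 
to make sure the compiler cannot re-read the shared fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSG_MAX 64

/* Hypothetical request layout placed in memory shared with an untrusted peer. */
struct shared_request {
    size_t len;
    char data[MSG_MAX];
};

/*
 * Copy the request into private memory once, validating only local values.
 * Returns the payload length, or -1 if the request is rejected.
 */
long snapshot_request(const struct shared_request *shared,
                      char *priv, size_t priv_size)
{
    assert(shared != NULL && priv != NULL);

    size_t len = shared->len;            /* single fetch of the length */
    if (len > MSG_MAX || len >= priv_size)
        return -1;                       /* the check uses the local copy */

    memcpy(priv, shared->data, len);     /* single fetch of the payload */
    priv[len] = '\0';
    return (long)len;
}
```

Once snapshot_request() has returned, later changes to the shared region 
no longer affect the data the caller checks and acts on.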

While richer language systems, such as scripting languages, often 
introduce opacity in memory access, in practice they behave fairly 
predictably and must do so to use shared memory. If languages support 
shared memory, improperly written programs might well be vulnerable. 
Likewise, they might well support attacks against system call wrappers 
using the techniques I've described.


Robert Watson has been actively involved with FreeBSD since 1999 and 
started the TrustedBSD Project in 2000, with the goal of bringing more 
advanced security features to the platform. In October 2005, he 
returned to academia to work on a PhD at the University of Cambridge 
Computer Laboratory, after spending about six years in industry working 
in commercial and government-sponsored operating system and network 
security research and development.


