RFC: jail functionality

From: serue@private
Date: Wed Jun 29 2005 - 09:14:09 PDT


Hi,

I'd still like to see bsdjail/vserver/zone functionality in Linux.  It
seems to me the following pieces are needed:

	filesystem namespaces (mostly there, probably want shared
					subtrees)
	read-only bind mounts (not there yet)
	task separation (i.e. ptrace etc.; can be done by selinux)
	task-hiding ability (see attached patches)
	network jails (see below)
	hostname/domainname per jail?  (is this necessary?)
	resource management - can be done by selinux, ckrm, etc.
	filesystem controls - can be done by selinux, using a simple
			policy (attached), provided jails get their own
			(loopback is fine) filesystem;  otherwise read-only
			bind mounts would also help.
	more?
	Some intuitive script(s) to use all of the above.

Attached are the old task_lookup patch which was used by the bsdjail lsm,
a patch for selinux to use this hook, and a sample jail policy and
.fc, which presumably would eventually be changed to a jail_domain()
policy macro.  Does this seem at all useful by itself, or should it
wait until it is actually needed for a complete Linux jails
implementation?  (Note that access_vectors.diff patches
/etc/selinux/targeted/src/policy/flask/access_vectors, jail2.fc can go
in /etc/selinux/targeted/src/policy/file_contexts/misc/, and jail2.te
can go into /etc/selinux/targeted/src/policy/domains/misc/.)
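
To make the task-hiding idea concrete, here is a minimal user-space
model in Python of the kind of decision a task-lookup hook could make.
This is not the attached patch.  The Task type, the jail_id field, and
the rule that an unjailed task can see every task are all assumptions
made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a kernel task; not a real kernel structure."""
    pid: int
    jail_id: Optional[int] = None  # None means "not jailed" (assumption)


def may_lookup(viewer: Task, target: Task) -> bool:
    """Return True if `viewer` is allowed to see `target`."""
    if viewer.jail_id is None:
        # Assumed policy: tasks outside any jail see everything.
        return True
    # Assumed policy: a jailed task sees only tasks in its own jail.
    return viewer.jail_id == target.jail_id


def visible_tasks(viewer: Task, tasks: List[Task]) -> List[Task]:
    """Filter a task list down to what `viewer` may look up."""
    return [t for t in tasks if may_lookup(viewer, t)]
```

With tasks in jails 1 and 2 plus an unjailed init, a task in jail 1
would see only its jail-mates, while init would still see all of them.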

It seems to me the greatest challenge is network jails.  I don't think
this can be done right with selinux.  I believe you can restrict a
domain's access to remote addresses by IP, but you cannot restrict
which local addresses it may bind to.  Am I wrong in assuming jails
would be useless without this?  (I suppose they could at least be
useful for sandboxes of some sort.)  Does anyone have ideas on a good
way to implement these?
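
For reference, the missing check on local addresses could look roughly
like the following Python model.  It is only a sketch of the policy,
not kernel code.  The function name is hypothetical, the jail is assumed
to own exactly one address, and mapping a wildcard bind onto the jail's
address is an assumption borrowed from how single-address BSD jails are
usually described.

```python
import ipaddress


def effective_bind_address(jail_addr: str, requested: str) -> str:
    """Return the local address a jailed bind() should end up using.

    Raises PermissionError if the request names a local address the
    jail does not own.
    """
    jail = ipaddress.ip_address(jail_addr)
    req = ipaddress.ip_address(requested)

    if req.is_unspecified:
        # Wildcard bind (0.0.0.0): narrow it to the jail's own address
        # instead of letting it cover every local address (assumption).
        return str(jail)
    if req == jail:
        return str(jail)
    raise PermissionError(f"bind to {req} denied: jail is limited to {jail}")
```

A jail assigned 10.0.0.5 that asks for 0.0.0.0 or 10.0.0.5 ends up
bound to 10.0.0.5, and a request for any other local address is
refused.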

Some time ago I sent out an RFC for network namespaces, which allowed a
process to essentially give up its access to a network device.  The
patch only allowed a process to give up access to real network devices,
not IP aliases (i.e. eth0:0).  Without alias support, this seems much
less useful for allowing admins to provide multiple jails.

The linux-vserver team is working on virtual networking which (IIUC)
creates a virtual network device which is then associated with a
virtual address, a real network device, and a jail.  This appears to
be a way to make the simple version of network namespaces I describe
in the paragraph above more useful, since we would not need to deal
with ip aliases.

Is there any interest in seeing the virtual network devices and
network namespaces pushed upstream?

Read-only bind mounts?

The attached task-lookup patches?

thanks,
-serge








